Corpus ORWELL

This corpus was created as part of the EU Multext-East project. It consists of the text of George Orwell's novel 1984 (translated from the English original by Eva Šimečková; Prague: Naše vojsko, 1991). The corpus contains c. 80 thousand words and 20 thousand punctuation marks, that is, approximately 100 thousand corpus positions, and it is morphologically tagged. The relatively small size of the corpus made it possible to hand-correct the errors introduced during automatic morphological analysis, so the tagging is almost flawless.