Evaluating and Automating the Annotation of a Learner Corpus
The paper describes a corpus of texts produced by non-native speakers of Czech. We discuss its annotation scheme, consisting of three interlinked tiers, designed to handle a wide range of error types present in the input. Each tier corrects different types of errors; links between the tiers allow capturing errors in word order and complex discontinuous expressions. Errors are not only corrected but also classified. The annotation scheme was tested on a data set of approx. 175,000 words with fair inter-annotator agreement results. We also explore the possibility of applying automated linguistic annotation tools (taggers, spell checkers and grammar checkers) to the learner text to support or even replace manual annotation.
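As a rough illustration only (the paper does not publish code, and all names here are hypothetical), a multi-tier annotation with typed cross-tier links, as described in the abstract, might be represented along these lines:

```python
from dataclasses import dataclass, field

# Hypothetical sketch: tier 0 holds the original learner tokens;
# higher tiers hold successive corrections. Links connect tokens
# across adjacent tiers and carry an error label, so each
# correction is also classified.

@dataclass
class Token:
    tier: int        # 0 = original learner text; 1, 2 = correction tiers
    index: int       # position within its tier
    form: str        # surface word form

@dataclass
class Link:
    source: Token    # token on the lower tier
    target: Token    # corrected token on the higher tier
    error_type: str  # classification label, e.g. "spelling", "word-order"

@dataclass
class Sentence:
    tiers: list = field(default_factory=lambda: [[], [], []])
    links: list = field(default_factory=list)

    def add_token(self, tier, form):
        tok = Token(tier, len(self.tiers[tier]), form)
        self.tiers[tier].append(tok)
        return tok

# Example: a misspelled word corrected on tier 1, then a word-order
# error on tier 2 recorded as a link to a new position.
s = Sentence()
orig = s.add_token(0, "skola")    # learner's original form
fixed = s.add_token(1, "škola")   # spelling corrected
moved = s.add_token(2, "škola")   # same form, repositioned
s.links.append(Link(orig, fixed, "spelling"))
s.links.append(Link(fixed, moved, "word-order"))

for link in s.links:
    print(f"{link.source.form} -> {link.target.form} ({link.error_type})")
```

A real scheme of this kind would also need many-to-many links (for the complex discontinuous expressions the abstract mentions); the one-to-one links above keep the sketch minimal.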