Detecting syntactic errors in dependency treebanks for morphosyntactically rich languages
Abstract. The paper introduces a new method for detecting and correcting errors in large dependency treebanks with rich morphosyntactic annotation. The technique uses error correction rules automatically extracted from the treebank. The rule extraction procedure is based on a comparison of similar, but not identical, subgraphs of dependency structures. The outcome of applying the method to a 3-million-sentence dependency treebank of Polish is presented and evaluated. The method achieves satisfactory precision in the task of automatic error correction and relatively high precision in the task of error detection.
DepAnn - An Annotation Tool for Dependency Treebanks
DepAnn is an interactive annotation tool for dependency treebanks, providing
both graphical and text-based annotation interfaces. The tool is aimed at
semi-automatic creation of treebanks. It aids the manual inspection and
correction of automatically created parses, making the annotation process
faster and less error-prone. A novel feature of the tool is that it enables the
user to view outputs from several parsers as the basis for creating the final
tree to be saved to the treebank. DepAnn uses TIGER-XML, an XML-based general
encoding format, both for representing the parser outputs and for saving the
annotated treebank. The tool includes an automatic consistency checker for
sentence structures. In addition, the tool enables users to build structures
manually, add comments on the annotations, modify the tagsets, and mark
sentences for further revision.
Irish treebanking and parsing: a preliminary evaluation
Language resources are essential for linguistic research and the development of NLP applications. Low-density languages, such as Irish, therefore lack significant research in this area. This paper describes the early stages in the development of new language resources for Irish, namely the first Irish dependency treebank and the first Irish statistical dependency parser. We present the methodology behind building our new treebank and the steps we take to leverage the few existing resources. We discuss language-specific choices made when defining our dependency labelling scheme, and describe interesting Irish language characteristics such as prepositional attachment, copula and clefting. We manually develop a small treebank of 300 sentences based on an existing POS-tagged corpus and report an inter-annotator agreement of 0.7902. We train MaltParser to achieve preliminary parsing results for Irish and describe a bootstrapping approach for further stages of development.
Active learning and the Irish treebank
We report on our ongoing work in developing the Irish Dependency Treebank, describe the results of two inter-annotator agreement (IAA) studies, demonstrate improvements in annotation consistency which have a knock-on effect on parsing accuracy, and present the final set of dependency labels. We then investigate the extent to which active learning can play a role in treebank and parser development by comparing an active learning bootstrapping approach to a passive approach in which sentences are chosen at random for manual revision. We show that active learning outperforms passive learning, but when annotation effort is taken into account, it is not clear how much of an advantage the active learning approach has. Finally, we present results which suggest that adding automatic parses to the training data along with manually revised parses in an active learning setup does not greatly affect parsing accuracy.