496 research outputs found
The Parallel Meaning Bank: Towards a Multilingual Corpus of Translations Annotated with Compositional Meaning Representations
The Parallel Meaning Bank is a corpus of translations annotated with shared,
formal meaning representations comprising over 11 million words divided over
four languages (English, German, Italian, and Dutch). Our approach is based on
cross-lingual projection: automatically produced (and manually corrected)
semantic annotations for English sentences are mapped onto their word-aligned
translations, assuming that the translations are meaning-preserving. The
semantic annotation consists of five main steps: (i) segmentation of the text
in sentences and lexical items; (ii) syntactic parsing with Combinatory
Categorial Grammar; (iii) universal semantic tagging; (iv) symbolization; and
(v) compositional semantic analysis based on Discourse Representation Theory.
These steps are performed using statistical models trained in a semi-supervised
manner. The employed annotation models are all language-neutral. Our first
results are promising.
Comment: To appear at EACL 201
Are web corpora inferior? The Case of Czech and Slovak
Our paper describes an experiment that assesses lexical coverage in web corpora compared with traditional corpora for two closely related Slavic languages, from the lexicographers' perspective. The preliminary results show that web corpora should not be considered "inferior", but rather "different".
DepAnn - An Annotation Tool for Dependency Treebanks
DepAnn is an interactive annotation tool for dependency treebanks, providing
both graphical and text-based annotation interfaces. The tool is aimed at
semi-automatic creation of treebanks. It aids the manual inspection and
correction of automatically created parses, making the annotation process
faster and less error-prone. A novel feature of the tool is that it enables the
user to view outputs from several parsers as the basis for creating the final
tree to be saved to the treebank. DepAnn uses TIGER-XML, a general XML-based
encoding format, both for representing the parser outputs and for saving the
annotated treebank. The tool includes an automatic consistency checker for
sentence structures. In addition, the tool enables users to build structures
manually, add comments on the annotations, modify the tagsets, and mark
sentences for further revision.