Building a resource for studying translation shifts
This paper describes an interdisciplinary approach which brings together the
fields of corpus linguistics and translation studies. It presents ongoing work
on the creation of a corpus resource in which translation shifts are explicitly
annotated. Translation shifts denote departures from formal correspondence
between source and target text, i.e. deviations that have occurred during the
translation process. A resource in which such shifts are annotated in a
systematic way will make it possible to study those phenomena that need to be
addressed if machine translation output is to resemble human translation. The
resource described in this paper contains English source texts (parliamentary
proceedings) and their German translations. The shift annotation is based on
predicate-argument structures and proceeds in two steps: first, predicates and
their arguments are annotated monolingually in a straightforward manner. Then,
the corresponding English and German predicates and arguments are aligned with
each other. Whenever a shift (mainly grammatical or semantic) has occurred,
the alignment is tagged accordingly.
Comment: 6 pages, 1 figure
Parallel Aligned Treebank Corpora at LDC: Methodology, Annotation and Integration
Proceedings of the Workshop on Annotation and Exploitation of Parallel Corpora AEPC 2010.
Editors: Lars Ahrenberg, Jörg Tiedemann and Martin Volk.
NEALT Proceedings Series, Vol. 10 (2010), 14-23.
© 2010 The editors and contributors.
Published by Northern European Association for Language Technology (NEALT), http://omilia.uio.no/nealt
Electronically published at Tartu University Library (Estonia), http://hdl.handle.net/10062/15893
Annotating a Parallel Monolingual Treebank with Semantic Similarity Relations
Proceedings of the Sixth International Workshop on Treebanks and Linguistic Theories.
Editors: Koenraad De Smedt, Jan Hajič and Sandra Kübler.
NEALT Proceedings Series, Vol. 1 (2007), 85-96.
© 2007 The editors and contributors.
Published by Northern European Association for Language Technology (NEALT), http://omilia.uio.no/nealt
Electronically published at Tartu University Library (Estonia), http://hdl.handle.net/10062/4476
Building and querying parallel treebanks
This paper describes our work on building a trilingual parallel treebank. We have annotated constituent structure trees from three text genres (a philosophy novel, economy reports and a technical user manual). Our parallel treebank includes word and phrase alignments. The alignment information was manually checked using a graphical tool that allows the annotator to view a pair of trees from parallel sentences. This tool comes with a powerful search facility whose expressivity exceeds that of previous popular treebank query engines.
Treebanking in VIT: from Phrase Structure to Dependency Representation
This chapter deals with treebanks and their applications. We describe VIT (Venice Italian Treebank), focusing on the syntactic-semantic features of the treebank, which depend partly on the adopted tagset, partly on the reference linguistic theory, and lastly, as in every treebank, on the chosen language: Italian. By discussing examples taken from treebanks available in other languages, we show the theoretical and practical differences and motivations that underlie our approach. Finally, we discuss the quantitative analysis of the data of our treebank and compare it with other treebanks. More generally, we try to substantiate the claim that treebanking grammars or parsers strongly depend on the chosen treebank; this in turn seems to depend both on the linguistic framework adopted for structural description and, ultimately, on the language described.
XML-based phrase alignment in parallel treebanks
This paper describes the use of XML for representing cross-language phrase alignments in parallel treebanks. We have developed a TreeAligner, a tool for interactively inserting and correcting such alignments as an independent level of treebank annotation.
Proceedings
Proceedings of the Workshop on Annotation and Exploitation of Parallel Corpora AEPC 2010.
Editors: Lars Ahrenberg, Jörg Tiedemann and Martin Volk.
NEALT Proceedings Series, Vol. 10 (2010), 98 pages.
© 2010 The editors and contributors.
Published by Northern European Association for Language Technology (NEALT), http://omilia.uio.no/nealt
Electronically published at Tartu University Library (Estonia), http://hdl.handle.net/10062/15893