Unsupervised generation of parallel treebanks through sub-tree alignment
The need for syntactically annotated data for use in natural language processing has increased dramatically
in recent years. This is true especially for parallel treebanks, of which very few exist. The ones
that exist are mainly hand-crafted and too small for reliable use in data-oriented applications. In this
paper we introduce an open-source system for fast and robust automatic generation of parallel treebanks.
We expect the opening of the presented platform to the scientific community to help boost research
in the field of data-oriented machine translation and lead to advancements in other fields where
parallel treebanks can be employed.
On Tree-Based Neural Sentence Modeling
Neural networks with tree-based sentence encoders have shown better results
on many downstream tasks. Most existing tree-based encoders adopt syntactic
parsing trees as the explicit structure prior. To study the effectiveness of
different tree structures, we replace the parsing trees with trivial trees
(i.e., binary balanced tree, left-branching tree and right-branching tree) in
the encoders. Though trivial trees contain no syntactic information, those
encoders get competitive or even better results on all of the ten downstream
tasks we investigated. This surprising result indicates that explicit syntax
guidance may not be the main contributor to the superior performance of
tree-based neural sentence modeling. Further analysis shows that tree modeling
gives better results when crucial words are closer to the final representation.
Additional experiments give more clues on how to design an effective tree-based
encoder. Our code is open-source and available at
https://github.com/ExplorerFreda/TreeEnc. Comment: To Appear at EMNLP 201
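As a rough illustration of the three trivial structures named in the abstract above, the sketch below builds left-branching, right-branching, and balanced binary trees over a token list and reports how far each word sits from the root. It is not taken from the authors' repository: the function names, the nested-tuple representation, and the midpoint rule for odd-length spans are all assumptions made for this example.

```python
from typing import Dict, List, Tuple, Union

# A tree is either a single token or a pair of subtrees (assumed representation).
Tree = Union[str, Tuple["Tree", "Tree"]]


def left_branching(tokens: List[str]) -> Tree:
    """Combine tokens left to right: (((w1, w2), w3), w4)."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    tree: Tree = tokens[0]
    for tok in tokens[1:]:
        tree = (tree, tok)
    return tree


def right_branching(tokens: List[str]) -> Tree:
    """Combine tokens right to left: (w1, (w2, (w3, w4)))."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    tree: Tree = tokens[-1]
    for tok in reversed(tokens[:-1]):
        tree = (tok, tree)
    return tree


def balanced(tokens: List[str]) -> Tree:
    """Split the span at its midpoint recursively.

    Assumption: for odd lengths the extra token goes to the left half.
    """
    if not tokens:
        raise ValueError("tokens must be non-empty")
    if len(tokens) == 1:
        return tokens[0]
    mid = (len(tokens) + 1) // 2
    return (balanced(tokens[:mid]), balanced(tokens[mid:]))


def leaf_depths(tree: Tree, depth: int = 0) -> Dict[str, int]:
    """Map each token to its distance from the root (tokens assumed unique)."""
    if isinstance(tree, str):
        return {tree: depth}
    left, right = tree
    depths = leaf_depths(left, depth + 1)
    depths.update(leaf_depths(right, depth + 1))
    return depths
```

For a four-word sentence, the left-branching tree places the last word one step from the root and the first word three steps away, while the balanced tree places every word two steps away. This is one way to picture the abstract's remark about crucial words being closer to the final representation.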
F-structure transfer-based statistical machine translation
In this paper, we describe a statistical deep syntactic transfer decoder that is trained fully automatically on parsed bilingual corpora. Deep syntactic transfer rules are induced automatically from the f-structures of an LFG-parsed bitext corpus by automatically aligning local f-structures and inducing all rules consistent with the node alignment. Given a source-language (SL) f-structure as input, the transfer decoder outputs the n-best target-language (TL) f-structures by applying large numbers of transfer rules and searching for the best output, using a log-linear model to combine feature scores. The decoder includes a fully integrated dependency-based trigram language model. We include an experimental evaluation of the decoder using different parsing disambiguation resources for the German data, to compare how the system performs with different German training and test parses.
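The log-linear combination mentioned in the abstract above can be sketched generically as a weighted sum of feature scores, with the n highest-scoring candidates returned. This is a minimal illustration of the general technique, not the paper's decoder: the feature names (`lm`, `rule`), the weights, and the candidate labels are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

# A candidate is a (label, feature-score dict) pair; names are hypothetical.
Candidate = Tuple[str, Dict[str, float]]


def log_linear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of feature scores: sum_i weight_i * h_i(candidate)."""
    return sum(weights[name] * value for name, value in features.items())


def n_best(candidates: List[Candidate], weights: Dict[str, float], n: int) -> List[Candidate]:
    """Return the n candidates with the highest log-linear score, best first."""
    return heapq.nlargest(
        n, candidates, key=lambda cand: log_linear_score(cand[1], weights)
    )


# Hypothetical example: log-probability-style scores from two feature functions.
example_weights = {"lm": 1.0, "rule": 0.5}
example_candidates = [
    ("A", {"lm": -2.0, "rule": -1.0}),
    ("B", {"lm": -1.0, "rule": -4.0}),
    ("C", {"lm": -3.0, "rule": -0.5}),
]
top_two = n_best(example_candidates, example_weights, 2)
```

In a real decoder the candidates would be generated incrementally during search and the weights tuned on held-out data; here they are fixed by hand to keep the example self-contained.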
Korean to English Translation Using Synchronous TAGs
It is often argued that accurate machine translation requires reference to
contextual knowledge for the correct treatment of linguistic phenomena such as
dropped arguments and accurate lexical selection. One of the historical
arguments in favor of the interlingua approach has been that, since it revolves
around a deep semantic representation, it is better able to handle the types of
linguistic phenomena that are seen as requiring a knowledge-based approach. In
this paper we present an alternative approach, exemplified by a prototype
system for machine translation of English and Korean which is implemented in
Synchronous TAGs. This approach is essentially transfer based, and uses
semantic feature unification for accurate lexical selection of polysemous
verbs. The same semantic features, when combined with a discourse model which
stores previously mentioned entities, can also be used for the recovery of
topicalized arguments. In this paper we concentrate on the translation of
Korean to English. Comment: ps file. 8 page