Capturing translational divergences with a statistical tree-to-tree aligner
Parallel treebanks, which comprise paired source-target parse trees aligned at sub-sentential level, could be useful
for many applications, particularly data-driven machine translation. In this paper, we focus on how translational
divergences are captured within a parallel treebank using a fully automatic statistical tree-to-tree aligner. We
observe that while the algorithm performs well at the phrase level, performance on lexical-level alignments
is compromised by an inappropriate bias towards precision rather than coverage. This preference for high precision
rather than broad coverage when expressing translational divergences through tree alignment stands in
direct opposition to the situation for SMT word-alignment models. We suggest that this has implications not only
for tree alignment itself but also for the broader area of induction of syntax-aware models for SMT.
Optimal interval clustering: Application to Bregman clustering and statistical mixture learning
We present a generic dynamic programming method to compute the optimal
clustering of scalar elements into pairwise disjoint intervals. This
case includes 1D Euclidean k-means, k-medoids, k-medians, k-centers,
etc. We extend the method to incorporate cluster size constraints and show how
to choose the appropriate k by model selection. Finally, we illustrate and
refine the method on two case studies: Bregman clustering and statistical
mixture learning maximizing the complete likelihood.
Comment: 10 pages, 3 figures
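The interval-clustering dynamic program can be sketched for the 1D Euclidean k-means case. This is a minimal illustration of the general idea, not the paper's implementation: sort the points, precompute prefix sums so any interval's sum-of-squared-errors cost is O(1), then fill a table where `dp[c][j]` is the best cost of splitting the first `j` points into `c` contiguous intervals.

```python
def interval_kmeans(points, k):
    """Optimal 1D k-means clustering into contiguous intervals via DP.

    Returns (optimal total within-cluster SSE, list of k (start, end)
    index pairs into the sorted point list).
    """
    xs = sorted(points)
    n = len(xs)
    # Prefix sums of x and x^2 give O(1) interval SSE queries.
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def cost(i, j):
        # Sum of squared deviations from the mean over xs[i..j] inclusive.
        m = j - i + 1
        tot = s1[j + 1] - s1[i]
        return (s2[j + 1] - s2[i]) - tot * tot / m

    INF = float("inf")
    # dp[c][j]: best cost of clustering xs[0..j-1] into c intervals.
    dp = [[INF] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):  # last interval is xs[i..j-1]
                cand = dp[c - 1][i] + cost(i, j - 1)
                if cand < dp[c][j]:
                    dp[c][j] = cand
                    cut[c][j] = i
    # Backtrack the interval boundaries.
    intervals, j = [], n
    for c in range(k, 0, -1):
        i = cut[c][j]
        intervals.append((i, j - 1))
        j = i
    intervals.reverse()
    return dp[k][n], intervals
```

The cubic-time table fill above is the naive form; the paper's contribution includes how such DPs extend beyond the Euclidean case (e.g. to Bregman divergences) and accommodate size constraints.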
Identifying Semantic Divergences in Parallel Text without Annotations
Recognizing that even correct translations are not always semantically
equivalent, we automatically detect meaning divergences in parallel sentence
pairs with a deep neural model of bilingual semantic similarity which can be
trained for any parallel corpus without any manual annotation. We show that our
semantic model detects divergences more accurately than models based on surface
features derived from word alignments, and that these divergences matter for
neural machine translation.
Comment: Accepted as a full paper to NAACL 2018
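The paper's detector is a trained deep network, but the underlying idea can be illustrated with a much cruder stand-in: embed each sentence in a shared cross-lingual space and flag pairs whose vectors are far apart. Everything below is a toy sketch under that assumption; the `EMBED` table, its values, and the threshold are hypothetical, not from the paper, which instead learns the similarity function from parallel data.

```python
import math

# Hypothetical toy cross-lingual embedding table (illustration only; a
# real system would use trained bilingual embeddings or a neural encoder).
EMBED = {
    "cat": (0.9, 0.1), "chat": (0.88, 0.12),   # near-equivalent EN/FR pair
    "dog": (0.1, 0.9), "chien": (0.12, 0.88),
    "red": (0.5, 0.5), "vert": (0.2, 0.7),     # "vert" = green: divergent
}

def sent_vec(tokens):
    """Average the embeddings of known tokens into one sentence vector."""
    vecs = [EMBED[t] for t in tokens if t in EMBED]
    return tuple(sum(c) / len(vecs) for c in zip(*vecs))

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def is_divergent(src_tokens, tgt_tokens, threshold=0.95):
    """Flag a sentence pair as semantically divergent when the cosine
    similarity of its averaged sentence vectors falls below the threshold."""
    return cosine(sent_vec(src_tokens), sent_vec(tgt_tokens)) < threshold
```

Surface baselines in the paper work from word-alignment features instead; the abstract's point is that a learned semantic similarity model outperforms both kinds of shallow signal.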