
    Capturing translational divergences with a statistical tree-to-tree aligner

    Parallel treebanks, which comprise paired source-target parse trees aligned at the sub-sentential level, could be useful for many applications, particularly data-driven machine translation. In this paper, we focus on how translational divergences are captured within a parallel treebank using a fully automatic statistical tree-to-tree aligner. We observe that while the algorithm performs well at the phrase level, performance on lexical-level alignments is compromised by an inappropriate bias towards coverage rather than precision. This need for high precision rather than broad coverage when expressing translational divergences through tree-alignment stands in direct opposition to the situation for SMT word-alignment models. We suggest that this has implications not only for tree-alignment itself but also for the broader area of induction of syntax-aware models for SMT.
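
    To make the precision-versus-coverage tradeoff concrete, here is a minimal sketch that scores a set of predicted alignment links against gold links; the link representation and the example links are hypothetical, not drawn from the paper's data.

    # Score predicted sub-sentential alignment links against a gold standard.
    # An alignment is modeled as a set of (source_index, target_index) links.
    def precision_recall(predicted, gold):
        predicted, gold = set(predicted), set(gold)
        correct = predicted & gold
        # Precision rewards proposing only correct links; recall (coverage)
        # rewards proposing many links -- the tradeoff the abstract discusses.
        precision = len(correct) / len(predicted) if predicted else 0.0
        recall = len(correct) / len(gold) if gold else 0.0
        return precision, recall

    # Hypothetical lexical-level links for one sentence pair.
    gold = {(0, 0), (1, 2), (2, 1)}
    predicted = {(0, 0), (1, 2), (2, 2), (3, 1)}  # two correct, two spurious
    print(precision_recall(predicted, gold))      # (0.5, 0.666...)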

    Optimal interval clustering: Application to Bregman clustering and statistical mixture learning

    We present a generic dynamic programming method to compute the optimal clustering of n scalar elements into k pairwise disjoint intervals. This case includes 1D Euclidean k-means, k-medoids, k-medians, k-centers, etc. We extend the method to incorporate cluster size constraints and show how to choose the appropriate k by model selection. Finally, we illustrate and refine the method on two case studies: Bregman clustering and statistical mixture learning maximizing the complete likelihood. Comment: 10 pages, 3 figures
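
    A minimal sketch of the dynamic-programming idea, specialized to 1D k-means: since an optimal 1D clustering consists of contiguous intervals of the sorted data, the best k-interval partition can be found exactly. The function name, the O(n^2 k) formulation, and the toy data are illustrative assumptions, not the paper's exact algorithm, which also covers general Bregman divergences and size constraints.

    import numpy as np

    def interval_kmeans(x, k):
        """Exact k-means clustering of 1D data into k contiguous intervals."""
        x = np.sort(np.asarray(x, dtype=float))
        n = len(x)
        # Prefix sums let us evaluate the SSE of any interval x[i..j] in O(1).
        s1 = np.concatenate(([0.0], np.cumsum(x)))
        s2 = np.concatenate(([0.0], np.cumsum(x * x)))

        def sse(i, j):  # sum of squared errors of x[i..j], 0-indexed inclusive
            m = j - i + 1
            tot = s1[j + 1] - s1[i]
            return (s2[j + 1] - s2[i]) - tot * tot / m

        INF = float("inf")
        # cost[m][j]: optimal cost of clustering x[0..j] into m intervals.
        cost = np.full((k + 1, n), INF)
        split = np.zeros((k + 1, n), dtype=int)
        for j in range(n):
            cost[1][j] = sse(0, j)
        for m in range(2, k + 1):
            for j in range(m - 1, n):
                for i in range(m - 1, j + 1):  # last interval is x[i..j]
                    c = cost[m - 1][i - 1] + sse(i, j)
                    if c < cost[m][j]:
                        cost[m][j], split[m][j] = c, i
        # Backtrack the left endpoints of intervals 2..k.
        bounds, j = [], n - 1
        for m in range(k, 1, -1):
            i = split[m][j]
            bounds.append(i)
            j = i - 1
        return cost[k][n - 1], sorted(bounds)

    # Example: two well-separated groups recover the obvious split at index 3.
    print(interval_kmeans([1.0, 1.1, 0.9, 10.0, 10.2, 9.8], 2))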

    Identifying Semantic Divergences in Parallel Text without Annotations

    Recognizing that even correct translations are not always semantically equivalent, we automatically detect meaning divergences in parallel sentence pairs with a deep neural model of bilingual semantic similarity which can be trained for any parallel corpus without any manual annotation. We show that our semantic model detects divergences more accurately than models based on surface features derived from word alignments, and that these divergences matter for neural machine translation. Comment: Accepted as a full paper to NAACL 2018
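
    A minimal sketch of the detection step only, under the assumption that each sentence of a pair has already been embedded into a shared bilingual space: pairs whose embeddings are dissimilar are flagged as divergent. The toy vectors and threshold are placeholders, not the paper's trained neural model.

    import numpy as np

    def cosine(u, v):
        # Cosine similarity between two embedding vectors.
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def is_divergent(src_vec, tgt_vec, threshold=0.5):
        """Flag a sentence pair as divergent if its similarity is low."""
        return cosine(src_vec, tgt_vec) < threshold

    # Toy vectors standing in for bilingual sentence embeddings.
    equivalent = (np.array([0.9, 0.1, 0.0]), np.array([0.8, 0.2, 0.1]))
    divergent = (np.array([0.9, 0.1, 0.0]), np.array([0.0, 0.2, 0.9]))
    print(is_divergent(*equivalent), is_divergent(*divergent))  # False True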