
    Capturing translational divergences with a statistical tree-to-tree aligner

    Parallel treebanks, which comprise paired source-target parse trees aligned at the sub-sentential level, could be useful for many applications, particularly data-driven machine translation. In this paper, we focus on how translational divergences are captured within a parallel treebank using a fully automatic statistical tree-to-tree aligner. We observe that while the algorithm performs well at the phrase level, performance on lexical-level alignments is compromised by an inappropriate bias towards coverage rather than precision. This need for high precision rather than broad coverage when expressing translational divergences through tree-alignment stands in direct opposition to the situation for SMT word-alignment models. We suggest that this has implications not only for tree-alignment itself but also for the broader area of induction of syntax-aware models for SMT.
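
    To make the precision-versus-coverage tradeoff concrete, here is a minimal sketch that scores a set of predicted alignment links against gold links; the link representation and the example links are hypothetical, not drawn from the paper's data.

    # Score predicted sub-sentential alignment links against a gold standard.
    # An alignment is modeled as a set of (source_index, target_index) links.
    def precision_recall(predicted, gold):
        predicted, gold = set(predicted), set(gold)
        correct = predicted & gold
        # Precision rewards proposing only correct links; recall (coverage)
        # rewards proposing many links -- the tradeoff the abstract discusses.
        precision = len(correct) / len(predicted) if predicted else 0.0
        recall = len(correct) / len(gold) if gold else 0.0
        return precision, recall

    # Hypothetical lexical-level links for one sentence pair.
    gold = {(0, 0), (1, 2), (2, 1)}
    predicted = {(0, 0), (1, 2), (2, 2), (3, 1)}  # two correct, two spurious
    print(precision_recall(predicted, gold))      # (0.5, 0.666...)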

    Optimal interval clustering: Application to Bregman clustering and statistical mixture learning

    We present a generic dynamic programming method to compute the optimal clustering of n scalar elements into k pairwise disjoint intervals. This case includes 1D Euclidean k-means, k-medoids, k-medians, k-centers, etc. We extend the method to incorporate cluster size constraints and show how to choose the appropriate k by model selection. Finally, we illustrate and refine the method on two case studies: Bregman clustering and statistical mixture learning maximizing the complete likelihood. Comment: 10 pages, 3 figures
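
    A minimal sketch of the dynamic-programming idea, specialized to 1D k-means: since an optimal 1D clustering consists of contiguous intervals of the sorted data, the best k-interval partition can be found exactly. The function name, the O(n^2 k) formulation, and the toy data are illustrative assumptions, not the paper's exact algorithm, which also covers general Bregman divergences and size constraints.

    import numpy as np

    def interval_kmeans(x, k):
        """Exact k-means clustering of 1D data into k contiguous intervals."""
        x = np.sort(np.asarray(x, dtype=float))
        n = len(x)
        # Prefix sums let us evaluate the SSE of any interval x[i..j] in O(1).
        s1 = np.concatenate(([0.0], np.cumsum(x)))
        s2 = np.concatenate(([0.0], np.cumsum(x * x)))

        def sse(i, j):  # sum of squared errors of x[i..j], 0-indexed inclusive
            m = j - i + 1
            tot = s1[j + 1] - s1[i]
            return (s2[j + 1] - s2[i]) - tot * tot / m

        INF = float("inf")
        # cost[m][j]: optimal cost of clustering x[0..j] into m intervals.
        cost = np.full((k + 1, n), INF)
        split = np.zeros((k + 1, n), dtype=int)
        for j in range(n):
            cost[1][j] = sse(0, j)
        for m in range(2, k + 1):
            for j in range(m - 1, n):
                for i in range(m - 1, j + 1):  # last interval is x[i..j]
                    c = cost[m - 1][i - 1] + sse(i, j)
                    if c < cost[m][j]:
                        cost[m][j], split[m][j] = c, i
        # Backtrack the left endpoints of intervals 2..k.
        bounds, j = [], n - 1
        for m in range(k, 1, -1):
            i = split[m][j]
            bounds.append(i)
            j = i - 1
        return cost[k][n - 1], sorted(bounds)

    # Example: two well-separated groups recover the obvious split at index 3.
    print(interval_kmeans([1.0, 1.1, 0.9, 10.0, 10.2, 9.8], 2))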

    Identifying Semantic Divergences in Parallel Text without Annotations

    Recognizing that even correct translations are not always semantically equivalent, we automatically detect meaning divergences in parallel sentence pairs with a deep neural model of bilingual semantic similarity which can be trained for any parallel corpus without any manual annotation. We show that our semantic model detects divergences more accurately than models based on surface features derived from word alignments, and that these divergences matter for neural machine translation. Comment: Accepted as a full paper to NAACL 2018
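
    A minimal sketch of the detection step only, under the assumption that each sentence of a pair has already been embedded into a shared bilingual space: pairs whose embeddings are dissimilar are flagged as divergent. The toy vectors and threshold are placeholders, not the paper's trained neural model.

    import numpy as np

    def cosine(u, v):
        # Cosine similarity between two embedding vectors.
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def is_divergent(src_vec, tgt_vec, threshold=0.5):
        """Flag a sentence pair as divergent if its similarity is low."""
        return cosine(src_vec, tgt_vec) < threshold

    # Toy vectors standing in for bilingual sentence embeddings.
    equivalent = (np.array([0.9, 0.1, 0.0]), np.array([0.8, 0.2, 0.1]))
    divergent = (np.array([0.9, 0.1, 0.0]), np.array([0.0, 0.2, 0.9]))
    print(is_divergent(*equivalent), is_divergent(*divergent))  # False True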