5,266 research outputs found
Learning Bilingual Word Representations by Marginalizing Alignments
We present a probabilistic model that simultaneously learns alignments and
distributed representations for bilingual data. By marginalizing over word
alignments the model captures a larger semantic context than prior work relying
on hard alignments. The advantage of this approach is demonstrated in a
cross-lingual classification task, where we outperform the prior published
state of the art.Comment: Proceedings of ACL 2014 (Short Papers
Improving the translation environment for professional translators
When using computer-aided translation systems in a typical, professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view, as well as from a purely technological side.
This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project
Identifying Semantic Divergences in Parallel Text without Annotations
Recognizing that even correct translations are not always semantically
equivalent, we automatically detect meaning divergences in parallel sentence
pairs with a deep neural model of bilingual semantic similarity which can be
trained for any parallel corpus without any manual annotation. We show that our
semantic model detects divergences more accurately than models based on surface
features derived from word alignments, and that these divergences matter for
neural machine translation.Comment: Accepted as a full paper to NAACL 201
Multilingual Models for Compositional Distributed Semantics
We present a novel technique for learning semantic representations, which
extends the distributional hypothesis to multilingual data and joint-space
embeddings. Our models leverage parallel data and learn to strongly align the
embeddings of semantically equivalent sentences, while maintaining sufficient
distance between those of dissimilar sentences. The models do not rely on word
alignments or any syntactic information and are successfully applied to a
number of diverse languages. We extend our approach to learn semantic
representations at the document level, too. We evaluate these models on two
cross-lingual document classification tasks, outperforming the prior state of
the art. Through qualitative analysis and the study of pivoting effects we
demonstrate that our representations are semantically plausible and can capture
semantic relationships across languages without parallel data.Comment: Proceedings of ACL 2014 (Long papers
- …