TermEval: an automatic metric for evaluating terminology translation in MT
Terminology translation plays a crucial role in domain-specific machine translation (MT). Preserving domain knowledge from source to target is arguably the most important concern for customers in the translation industry, especially in critical domains such as the medical, transportation, military, legal and aerospace domains. However, despite its importance to the translation industry, the evaluation of terminology translation has received little attention in MT research. Term translation quality in MT is usually assessed by domain experts, in either academia or industry. To the best of our knowledge, there is as yet no publicly available solution for automatically evaluating terminology translation in MT. In practice, evaluating terminology translation in MT therefore requires manual intervention, which is by nature time-consuming and highly expensive. This is impractical in an industrial setting, where customised MT systems often need to be updated for many reasons (e.g. the availability of new training data or newer MT techniques). Hence, there is a genuine need for a faster and less expensive solution to this problem, one that could help end users instantly identify term translation problems in MT.
In this study, we propose an automatic evaluation metric, TermEval, for evaluating terminology translation in MT. Because, to the best of our knowledge, no gold-standard dataset exists for measuring terminology translation quality in MT, we semi-automatically create one from an English-Hindi judicial-domain parallel corpus.
We train state-of-the-art phrase-based statistical MT (PB-SMT) and neural MT (NMT) models in two translation directions, English-to-Hindi and Hindi-to-English, and use TermEval to evaluate their terminology translation on the gold-standard test set. To measure the correlation between TermEval scores and human judgements, a human evaluator validates the translation of each source term in the gold-standard test set. The high correlation between TermEval and human judgements demonstrates the effectiveness of the proposed metric. We also carry out a comprehensive manual evaluation of terminology translation and present our observations.
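The abstract does not spell out how TermEval scores a translation. As a rough illustration only, the sketch below computes a simple term-match accuracy: for each gold-standard entry it checks whether any accepted target rendering of the source term appears, as whole tokens, in the corresponding MT output. The `TermEntry` record, the `contains_phrase` and `term_match_accuracy` helpers, and the one-hypothesis-per-entry alignment are hypothetical assumptions, not the published metric.

```python
from dataclasses import dataclass


@dataclass
class TermEntry:
    # Hypothetical gold-standard record: a source term plus every target
    # rendering a human evaluator would accept for it.
    source_term: str
    target_variants: list[str]


def contains_phrase(tokens: list[str], phrase: list[str]) -> bool:
    """True if `phrase` occurs in `tokens` as a contiguous run of whole tokens."""
    n = len(phrase)
    return n > 0 and any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def term_match_accuracy(entries: list[TermEntry], hypotheses: list[str]) -> float:
    """Share of gold entries whose MT hypothesis contains an accepted variant.

    Simplified stand-in, not the TermEval formula: entries[i] is assumed to be
    the term in the sentence translated as hypotheses[i], and matching is
    lower-cased whitespace-token matching (punctuation glued to a word, as in
    "Court.", will prevent a match).
    """
    if len(entries) != len(hypotheses):
        raise ValueError("need exactly one hypothesis per gold entry")
    if not entries:
        return 0.0
    hits = 0
    for entry, hyp in zip(entries, hypotheses):
        tokens = hyp.lower().split()
        if any(contains_phrase(tokens, v.lower().split()) for v in entry.target_variants):
            hits += 1
    return hits / len(entries)


if __name__ == "__main__":
    # Invented Hindi-to-English example.
    gold = [
        TermEntry("उच्च न्यायालय", ["High Court"]),
        TermEntry("याचिका", ["petition", "writ petition"]),
    ]
    mt_output = ["The High Court issued an order", "The application was dismissed"]
    print(term_match_accuracy(gold, mt_output))  # 0.5
```

Per-entry binary outcomes from a scorer like this could then be compared with the human evaluator's judgements using a correlation coefficient (for example `statistics.correlation` in Python 3.10+), which is the kind of comparison the abstract describes.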
Word-to-Word Models of Translational Equivalence
Parallel texts (bitexts) have properties that distinguish them from other kinds of parallel data. First, most words translate to only one other word. Second, bitext correspondence is noisy. This article presents methods for biasing statistical translation models to reflect these properties. Analysis of the expected behavior of these biases in the presence of sparse data predicts that they will result in more accurate models. The prediction is confirmed by evaluation with respect to a gold standard: translation models that are biased in this fashion are significantly more accurate than a baseline knowledge-poor model. This article also shows how a statistical translation model can take advantage of various kinds of pre-existing knowledge that might be available about particular language pairs. Even the simplest kinds of language-specific knowledge, such as the distinction between content words and function words, are shown to reliably boost translation model performance on some tasks. Statistical models that are informed by pre-existing knowledge about the model domain combine the best of both the rationalist and empiricist traditions.
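One simple way to impose the first bias (each word translates to at most one other word) while tolerating noisy correspondence is a greedy, competitive-linking-style procedure over the word pairs of one sentence pair. The sketch below is illustrative and not necessarily the article's exact algorithm: the association scores are assumed to be precomputed (for example from co-occurrence statistics), and `competitive_linking` and its `min_score` threshold are hypothetical names.

```python
def competitive_linking(
    source: list[str],
    target: list[str],
    score: dict[tuple[str, str], float],
    min_score: float = 0.0,
) -> list[tuple[int, int]]:
    """Greedy one-to-one word linking for a single sentence pair.

    `score` maps (source_word, target_word) to an association score assumed
    to be computed elsewhere. Links are returned as (source_index,
    target_index) pairs, sorted by source position.
    """
    # Enumerate every position pair so repeated words are kept distinct.
    candidates = [
        (score.get((s, t), 0.0), i, j)
        for i, s in enumerate(source)
        for j, t in enumerate(target)
    ]
    # Strongest association first; ties broken by position for determinism.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))

    used_src: set[int] = set()
    used_tgt: set[int] = set()
    links: list[tuple[int, int]] = []
    for sc, i, j in candidates:
        if sc <= min_score:
            # Remaining pairs are too weak: leave those words unlinked,
            # which models noisy (partial) correspondence.
            break
        if i in used_src or j in used_tgt:
            # One-to-one bias: an already linked word cannot be linked again.
            continue
        used_src.add(i)
        used_tgt.add(j)
        links.append((i, j))
    return sorted(links)


if __name__ == "__main__":
    # Invented scores for illustration.
    scores = {
        ("la", "the"): 0.9,
        ("bleue", "blue"): 0.85,
        ("maison", "house"): 0.8,
        ("bleue", "house"): 0.4,
        ("maison", "blue"): 0.3,
    }
    print(competitive_linking(["la", "maison", "bleue"], ["the", "blue", "house"], scores))
    # [(0, 0), (1, 2), (2, 1)]
```

The greedy order matters: once "bleue" is linked to "blue", its weaker association with "house" can no longer compete, so the weaker pair never becomes a link.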
Integrating Translation Technology in the Specialised Translation Classroom to Contextualise Learning
Recent approaches to translation training have emphasised the need to bring a real working context into the classroom and to promote situational learning (cf. Kelly, 2005: 16-18). For the specialised translator, new technologies and, consequently, the instrumental-professional sub-competence have become as important as linguistic and cultural knowledge. For this reason, this contribution presents a didactic proposal for incorporating new technologies (computer-assisted translation and localisation tools) into the scientific/technical translation classroom, together with a proposal for coordinating different subjects in the curriculum to promote horizontal integration of content.
Modality and type of translation are not mutually exclusive. For this reason, the main objective of this contribution is to merge both concepts in a learning proposal in which new technologies become another essential working tool in the specialised translation classroom. Nowadays, localisation means more than the translation of software, video games and websites, and it has brought about important changes in the translation process and the translation industry. In Spain, Translation and Interpreting curricula must include subjects such as documentation, terminology and computer science. These subjects are normally offered in the first years of the degree, when students have only basic knowledge of translation. If these skills are not put into practice in later years of the degree, students will not understand the operating principles of these tools. It is therefore essential that the different subjects in the curriculum be coordinated to contextualise learning and ensure the employability of future graduates. Accordingly, several activities directly related to computer-assisted translation and localisation are presented to integrate and reinforce the knowledge acquired in previous years and to develop new skills in specialised translation.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
Hand in hand: automatic sign language to English translation
In this paper, we describe the first data-driven automatic sign-language-to-speech translation system. While both sign language (SL) recognition and translation techniques exist, both use an intermediate notation system that is not directly intelligible to untrained users. We combine an SL recognition framework with a state-of-the-art phrase-based machine translation (MT) system, using corpora of both American Sign Language and Irish Sign Language data. In a set of experiments, we present the overall results and illustrate the importance of including a vision-based knowledge source in the development of a complete SL translation system.
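The cascade described above, a recogniser that emits an intermediate gloss notation followed by an MT system that turns glosses into English, can be sketched as follows. Everything here is a hypothetical toy: `recognize` stands in for the vision-based recogniser, and `translate_glosses` is a monotone, greedy longest-match phrase lookup, a drastic simplification of phrase-based MT, which also uses reordering and a language model. The gloss labels and phrase table are invented for illustration.

```python
from typing import Callable, Sequence

# Hypothetical types: a phrase table maps gloss n-grams to English phrases,
# and a recogniser maps a sequence of video frames to gloss tokens.
PhraseTable = dict[tuple[str, ...], str]
Recognizer = Callable[[Sequence[object]], list[str]]


def translate_glosses(glosses: list[str], table: PhraseTable, max_len: int = 3) -> str:
    """Left-to-right, longest-match phrase lookup (no reordering, no LM)."""
    out: list[str] = []
    i = 0
    while i < len(glosses):
        # Try the longest gloss n-gram starting at position i first.
        for n in range(min(max_len, len(glosses) - i), 0, -1):
            key = tuple(glosses[i:i + n])
            if key in table:
                out.append(table[key])
                i += n
                break
        else:
            # No phrase covers this gloss: pass it through lower-cased.
            out.append(glosses[i].lower())
            i += 1
    return " ".join(out)


def sign_to_english(frames: Sequence[object], recognize: Recognizer, table: PhraseTable) -> str:
    """Cascade: frames -> glosses (kept internal) -> English text for the user."""
    glosses = recognize(frames)
    return translate_glosses(glosses, table)


if __name__ == "__main__":
    table: PhraseTable = {
        ("IX-1", "GO"): "I will go",
        ("STORE",): "to the store",
        ("TOMORROW",): "tomorrow",
    }
    # A stub recogniser that ignores its input and returns fixed glosses.
    fake_recognizer = lambda frames: ["IX-1", "GO", "STORE", "TOMORROW"]
    print(sign_to_english([], fake_recognizer, table))
    # I will go to the store tomorrow
```

The point of the cascade is that the gloss sequence never reaches the user: only the English output does, which is what makes the system usable without knowledge of the notation.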
Translating Law into a Dictionary. A Terminographic Model
First, the paper outlines the methodological approach adopted to create the model, which is based on the combined specialist knowledge of three disciplines: terminography, translation studies and law. Next, it presents the translation dictionary as a separate type of terminological dictionary, with particular emphasis on the unit of translation and translation equivalence. The following part characterises the translation of legal texts and its implications for the needs of the translator and for the role of the dictionary in the translation process. Finally, the paper proposes a model dictionary, constructed according to the methodological rules set out at the beginning and in light of the conclusions drawn from this analysis.
- …
