
    A Linguistic Evaluation of Rule-Based, Phrase-Based, and Neural MT Engines

    In this paper, we report an analysis of the strengths and weaknesses of several Machine Translation (MT) engines implementing the three most widely used paradigms. The analysis is based on a manually built test suite that covers a wide range of linguistic phenomena. Two main observations are, on the one hand, the striking improvement of a commercial online system when switching from a phrase-based to a neural engine and, on the other hand, that the successful translations of neural MT systems sometimes resemble the translations of a rule-based MT system.

    Why Catalan-Spanish Neural Machine Translation? Analysis, comparison and combination with standard Rule and Phrase-based technologies

    Catalan and Spanish are two related languages, both derived from Latin. They share similarities at several linguistic levels, including morphology, syntax, and semantics. This makes them particularly interesting for the MT task. Given the recent appearance and popularity of neural MT, this paper analyzes the performance of this new approach compared to the well-established rule-based and phrase-based MT systems. Experiments are reported on a large database of 180 million words. Results, in terms of standard automatic measures, show that neural MT clearly outperforms the rule-based and phrase-based MT systems on the in-domain test set, but performs worse on the out-of-domain test set. A naive system combination works especially well for the latter. In-domain manual analysis shows that neural MT tends to improve both adequacy and fluency, for example, by generating more natural translations instead of literal ones, choosing the adequate target word when the source word has several translations, and improving gender agreement. However, out-of-domain manual analysis shows that neural MT is more affected by unknown words or contexts.

    Improving the translation environment for professional translators

    When using computer-aided translation systems in a typical, professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view and from a purely technological side. This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.

    Machine translation evaluation resources and methods: a survey

    We introduce a survey of Machine Translation (MT) evaluation covering both manual and automatic evaluation methods. The traditional human evaluation criteria mainly include intelligibility, fidelity, fluency, adequacy, comprehension, and informativeness. The advanced human assessments include task-oriented measures, post-editing, segment ranking, and extended criteria. We classify the automatic evaluation methods into two categories: the lexical similarity scenario and the application of linguistic features. The lexical similarity methods cover edit distance, precision, recall, F-measure, and word order. The linguistic features can be divided into syntactic and semantic features. The syntactic features include part-of-speech tags, phrase types, and sentence structures, and the semantic features include named entities, synonyms, textual entailment, paraphrase, semantic roles, and language models. Deep learning models for evaluation have been proposed only very recently. Subsequently, we also introduce methods for evaluating MT evaluation itself, including different correlation scores, and the recent quality estimation (QE) tasks for MT. This paper differs from existing works [GALEprogram2009, EuroMatrixProject2007] in several respects: it introduces recent developments in MT evaluation measures, a classification spanning manual to automatic evaluation measures, the recent QE tasks for MT, and a concise organisation of the content.
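The lexical-similarity metrics listed in this abstract (precision, recall, F-measure over word overlap) are simple to state concretely. The following is a minimal, self-contained sketch of a clipped bag-of-words precision/recall/F1 between an MT hypothesis and a reference; it is an illustration of the metric family, not any specific metric from the survey.

```python
from collections import Counter

def overlap_f1(hypothesis: str, reference: str) -> dict:
    """Word-level precision, recall, and F1 between an MT hypothesis
    and a reference translation, using clipped bag-of-words overlap."""
    hyp = Counter(hypothesis.lower().split())
    ref = Counter(reference.lower().split())
    # Each hypothesis word is counted at most as often as it occurs in the reference.
    match = sum(min(count, ref[word]) for word, count in hyp.items())
    precision = match / max(sum(hyp.values()), 1)
    recall = match / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if match else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

scores = overlap_f1("the cat sat on the mat", "the cat is on the mat")
print(scores)  # precision = recall = f1 = 5/6 here
```

Edit-distance and word-order metrics mentioned in the same passage operate on sequences rather than bags of words, but follow the same hypothesis-versus-reference comparison pattern.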

    TermEval: an automatic metric for evaluating terminology translation in MT

    Terminology translation plays a crucial role in domain-specific machine translation (MT). Preservation of domain knowledge from source to target is arguably the most important factor for customers in the translation industry, especially in critical domains such as medical, transportation, military, legal, and aerospace. However, the evaluation of terminology translation, despite its huge importance in the translation industry, has been a less examined area in MT research. Term translation quality in MT is usually assessed by domain experts, either in academia or industry. To the best of our knowledge, as of yet there is no publicly available solution to automatically evaluate terminology translation in MT. In particular, manual intervention is often needed to evaluate terminology translation in MT, which, by nature, is a time-consuming and highly expensive task. In fact, this is unworkable in an industrial setting where customised MT systems often need to be updated for many reasons (e.g. availability of new training data or leading MT techniques). Hence, there is a genuine need for a faster and less expensive solution to this problem, which could help end-users instantly identify term translation problems in MT. In this study, we propose an automatic evaluation metric, TermEval, for evaluating terminology translation in MT. To the best of our knowledge, there is no gold-standard dataset available for measuring terminology translation quality in MT. In the absence of a gold-standard evaluation test set, we semi-automatically create one from an English--Hindi judicial-domain parallel corpus. We trained state-of-the-art phrase-based SMT (PB-SMT) and neural MT (NMT) models on two translation directions, English-to-Hindi and Hindi-to-English, and use TermEval to evaluate their performance on terminology translation over the created gold-standard test set. To measure the correlation between TermEval scores and human judgements, the translation of each source term in the gold-standard test set is validated by human evaluators. The high correlation between TermEval and human judgements demonstrates the effectiveness of the proposed terminology translation evaluation metric. We also carry out a comprehensive manual evaluation of terminology translation and present our observations.
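The abstract does not spell out how TermEval is computed, so the following is only a hypothetical stand-in that illustrates the general idea of an automatic terminology metric: the fraction of gold-standard source terms whose expected target-side translation appears in the corresponding MT output. All names and example data below are invented for illustration.

```python
# Hypothetical term-match metric (NOT the actual TermEval formula):
# fraction of gold-standard source terms whose expected target-side
# translation appears in the MT output for that sentence.
def term_translation_accuracy(mt_outputs, term_pairs):
    """mt_outputs: one MT output sentence per test item.
    term_pairs: one (source_term, expected_target_term) pair per test item."""
    if not term_pairs:
        return 0.0
    hits = sum(
        1
        for output, (_, target_term) in zip(mt_outputs, term_pairs)
        if target_term.lower() in output.lower()
    )
    return hits / len(term_pairs)

mt_outputs = [
    "The tribunal dismissed the appeal.",
    "The judge released the accused.",  # expected term "bail" is missing
]
term_pairs = [("apiila", "appeal"), ("zamaanat", "bail")]
print(term_translation_accuracy(mt_outputs, term_pairs))  # 0.5
```

A real metric would need to handle morphological variants and multiple valid term translations, which is presumably part of what makes TermEval non-trivial.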

    An investigation of challenges in machine translation of literary texts: the case of the English–Chinese language pair

    Given the lack of focus on literary text translation in studies of machine translation (MT), this study aims to investigate some challenges of this application of the technology. First, the most commonly used types of MT are reviewed in chronological order of their development, and, to identify challenges for MT in literary text translation, the challenges human translators face in literary translation are linked to corresponding aspects of MT. A qualitative method is used to investigate the research questions: what challenges MT systems face in literary text translation, and whether equivalence can be established by MT in literary texts. Areas such as the challenges for MT in establishing corpora, achieving equivalence, and realising creativity in literary texts are examined in order to reveal some of the potential contributing factors to the difficulties MT faces in literary text translation. Through text analysis of sample literary texts on three online MT platforms (Google Translate, DeepL and Youdao Translate), all based on advanced neural machine translation engines, this study offers a pragmatic view of some challenging areas in literary text translation using these widely used online platforms, and offers insights into potential research opportunities in studies of literary text translation using MT.

    Corpus-Based Machine Translation: A Study Case for the e-Government of Costa Rica

    This research surveys the state of the art in machine translation technologies. It explores the fundamental theory of phrase-based statistical (PB-SMT) and neural (NMT) systems: their architecture and how they work. It then focuses on a case study that tests the translator's ability to exploit the full potential of these technologies, particularly that of PB-SMT. The case study requires the translator to draw on all of his or her professional knowledge and skills, applying best practices to improve the translation output through data preparation, engine training, evaluation, and refinement.

    Suitability of Neural Machine Translation for Different Types of Texts: A Study on Potential Predictors

    This thesis examines the suitability of different types of texts for neural machine translation (NMT). The study seeks linguistic indicators that can be used to predict whether a given text is suitable for NMT or not. Because the topic has not yet been studied extensively, the thesis also proposes several research approaches for investigating it. The theoretical background draws on text-type research and neural machine translation. Based on the literature, Biber's five dimensions emerge as the most suitable text-type classification; they are used in selecting the material, and their relationship to translation quality is examined in the analysis. On the NMT side, the thesis briefly presents how neural engines differ from earlier MT engines, the basic architecture of neural systems, and the linguistic elements that are typically difficult for them. The material consists of three corpora: fiction, official letters, and official documents. Each corpus comprises an original English source text, a human-produced Finnish reference translation, and the output of two NMT engines. The corpora are analysed with automatic evaluation, and a smaller sample from each corpus undergoes manual error categorisation. The study thus compares the machine translation quality of different types of texts and examines whether there are significant differences in translation errors between the text types and between the two engines. In addition to text types, the study examines the relationship between sentence length and translation quality, one of the textual features the literature identifies as affecting quality.
Based on the three corpora used in the thesis, it appears that, in terms of Biber's dimensions, narrative texts are less suitable for NMT than non-narrative ones, and context-dependent texts less suitable than explicit ones. The error distribution of the fiction corpus differs most from the other two, but the material used is found to be potentially problematic. Some differences are observed between the two MT engines, but their causes are hard to assess without closer knowledge of the engines' architectures. The sentence-length analysis shows that, within a single corpus, shorter sentences can be used to predict quality, but comparison across corpora is not possible, and extremely short sentences may be problematic for other reasons. Based on the analysis, it is concluded that a text-type classification grounded in linguistic features, such as Biber's, can to some extent be used to predict the suitability of different texts for NMT, although further research would be needed to map the question comprehensively. The methods used are found to be mostly well suited to studying the topic, though a small refinement to the error classification is proposed.
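The sentence-length analysis this abstract describes amounts to checking whether sentence length correlates with per-sentence translation quality within a corpus. A minimal sketch, using invented (hypothetical) data and a plain Pearson correlation, looks like this:

```python
# Pearson correlation between sentence length and an automatic per-sentence
# quality score, as a way to probe sentence length as a predictor of MT quality.
def pearson(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical data: token counts and per-sentence quality scores for one corpus.
lengths = [5, 12, 20, 33, 41]
scores = [0.71, 0.65, 0.58, 0.44, 0.39]
print(round(pearson(lengths, scores), 3))  # strongly negative for this toy data
```

As the abstract notes, such a correlation only supports comparisons within one corpus; absolute score levels are not comparable across corpora.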