8 research outputs found

    Referential translation machines for predicting translation quality

    We use referential translation machines (RTMs) for quality estimation of translation outputs. RTMs are a computational model for identifying the translation acts between any two data sets with respect to interpretants selected in the same domain, and they are effective when making monolingual and bilingual similarity judgments. RTMs achieve top performance in automatic, accurate, and language-independent prediction of sentence-level and word-level statistical machine translation (SMT) quality. RTMs remove the need to access any SMT-system-specific information or prior knowledge of the training data or models used when generating the translations, and they achieved the top performance in the WMT13 quality estimation task (QET13). We improve our RTM models with the Parallel FDA5 instance selection model, with additional features for predicting translation performance, and with improved learning models. We develop RTM models for each WMT14 QET (QET14) subtask, obtain improvements over QET13 results, and rank 1st in all of the tasks and subtasks of QET14.

    Referential translation machines for predicting translation quality and related statistics

    We use referential translation machines (RTMs) for predicting translation performance. RTMs pioneer a language-independent approach to all similarity tasks and remove the need to access any task- or domain-specific information or resource. We improve our RTM models with the ParFDA instance selection model (Bicici et al., 2015), with additional features for predicting translation performance, and with improved learning models. We develop RTM models for each WMT15 QET (QET15) subtask and obtain improvements over QET14 results. RTMs achieve top performance in QET15, ranking 1st in the document- and sentence-level prediction tasks and 2nd in the word-level prediction task.

    Correlations of perceived post-editing effort with measurements of actual effort

    Human rating of predicted post-editing effort is a common activity and has been used to train confidence estimation models. However, the correlation between human ratings and actual post-editing effort is under-measured. Moreover, the impact of presenting effort indicators in a post-editing user interface on actual post-editing effort has hardly been researched. In this study, ratings of perceived post-editing effort are tested for correlations with actual temporal, technical and cognitive post-editing effort. In addition, the impact on post-editing effort of the presentation of post-editing effort indicators in the user interface is also tested. The language pair involved in this study is English-Brazilian Portuguese. Our findings, based on a small sample, suggest that there is little agreement between raters for predicted post-editing effort and that the correlations between actual post-editing effort and predicted effort are only moderate, and thus an inefficient basis for MT confidence estimation. Moreover, the presentation of post-editing effort indicators in the user interface appears not to impact on actual post-editing effort
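The kind of test described in this abstract, checking whether perceived effort ratings track measured effort, can be illustrated with a rank correlation. The sketch below is only an assumption about how such a check might look: the abstract does not say which correlation coefficient the study used, Spearman's rho is chosen here because ratings are ordinal, and the function names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks for values, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of items tied with the item at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(xs), average_ranks(ys))
```

In use, `xs` would hold the perceived effort ratings and `ys` one measure of actual effort (for example, post-editing time per segment), with one pair per segment.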

    Comparative Quality Estimation for Machine Translation. An Application of Artificial Intelligence on Language Technology using Machine Learning of Human Preferences

    In this thesis we focus on Comparative Quality Estimation, the automatic process of analysing two or more translations produced by a Machine Translation (MT) system and expressing a judgment about their comparison. We approach the problem from a supervised machine learning perspective, with the aim of learning from human preferences. As a result, we create the ranking mechanism, a pipeline that includes the necessary tasks for ordering several MT outputs of a given source sentence in terms of relative quality. Quality Estimation models are trained to statistically associate the judgments with some qualitative features. For this purpose, we design a broad set of features with a particular focus on the ones with a grammatical background. Through an iterative feature engineering process, we investigate several feature sets, settle on the ones that achieve the best performance, and proceed to linguistically intuitive observations about the contribution of individual features. Additionally, we employ several feature selection and machine learning methods to take advantage of these features. We suggest the usage of binary classifiers after decomposing the ranking into pairwise decisions. In order to reduce the number of uncertain decisions (ties), we weight the pairwise decisions with their classification probability. Through a set of experiments, we show that the ranking mechanism can learn and reproduce rankings that correlate with the ones given by humans. Most importantly, it compares successfully with state-of-the-art reference-aware metrics and other known ranking methods for several language pairs. We also apply this method to a hybrid MT system combination and show that it is able to improve the overall translation performance. Finally, we examine the correlation between common MT errors and decoding events of phrase-based statistical MT systems. Through evidence from the decoding process, we identify some cases where long-distance grammatical phenomena cannot be captured properly. An additional outcome of this thesis is the open source software Qualitative, which implements the full pipeline of the ranking mechanism and the system combination task. It integrates a multitude of state-of-the-art natural language processing tools and can support the development of new models. Apart from its usage in experiment pipelines, it can serve as an application back-end for web applications in real-use scenarios.
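The pairwise decomposition mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation or the Qualitative software's API: `prob_better` is a hypothetical stand-in for a trained binary classifier, the scoring rule (summing win probabilities) is one plausible way to weight pairwise decisions, and the system names and probabilities are made-up toy values.

```python
from itertools import combinations


def hard_vote_scores(candidates, prob_better):
    """Count unweighted pairwise wins; ties between candidates are common."""
    scores = {c: 0 for c in candidates}
    for a, b in combinations(candidates, 2):
        winner = a if prob_better(a, b) >= 0.5 else b
        scores[winner] += 1
    return scores


def rank_by_weighted_pairwise(candidates, prob_better):
    """Rank candidates by summing the probability mass of each pairwise decision.

    prob_better(a, b) is assumed to return the classifier's estimated
    probability that translation a is better than translation b.
    """
    scores = {c: 0.0 for c in candidates}
    for a, b in combinations(candidates, 2):
        p = prob_better(a, b)
        scores[a] += p          # confidence that a wins this pair
        scores[b] += 1.0 - p    # remaining confidence goes to b
    ordered = sorted(candidates, key=lambda c: scores[c], reverse=True)
    return ordered, scores


# Toy, illustrative probabilities only (not results from the thesis).
TOY_PROBS = {
    ("sys_a", "sys_b"): 0.9,
    ("sys_b", "sys_c"): 0.6,
    ("sys_a", "sys_c"): 0.4,
}


def toy_prob_better(a, b):
    """Look up a toy probability, mirroring it when the pair is reversed."""
    if (a, b) in TOY_PROBS:
        return TOY_PROBS[(a, b)]
    return 1.0 - TOY_PROBS[(b, a)]


SYSTEMS = ["sys_a", "sys_b", "sys_c"]
ranking, weighted = rank_by_weighted_pairwise(SYSTEMS, toy_prob_better)
print(hard_vote_scores(SYSTEMS, toy_prob_better))
print(ranking)
```

With these toy values the unweighted votes form a cycle, so every system gets exactly one win and the ranking is a three-way tie. Weighting by probability breaks the tie and orders the systems as sys_a, sys_c, sys_b.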

    The role of syntax and semantics in machine translation and quality estimation of machine-translated user-generated content

    The availability of the Internet has led to a steady increase in the volume of online user-generated content, the majority of which is in English. Machine-translating this content to other languages can help disseminate the information contained in it to a broader audience. However, reliably publishing these translations requires a prior estimate of their quality. This thesis is concerned with the statistical machine translation of Symantec's Norton forum content, focusing in particular on its quality estimation (QE) using syntactic and semantic information. We compare the output of phrase-based and syntax-based English-to-French and English-to-German machine translation (MT) systems automatically and manually, and find that the syntax-based methods do not necessarily handle grammar-related phenomena in translation better than the phrase-based methods. Although these systems generate sufficiently different outputs, the apparent lack of a systematic difference between these outputs impedes its utilisation in a combination framework. To investigate the role of syntax and semantics in quality estimation of machine translation, we create SymForum, a data set containing French machine translations of English sentences from Norton forum content, their post-edits and their adequacy and fluency scores. We use syntax in quality estimation via tree kernels, hand-crafted features and their combination, and find it useful both alone and in combination with surface-driven features. Our analyses show that neither the accuracy of the syntactic parses used by these systems nor the parsing quality of the MT output affect QE performance. We also find that adding more structure to French Treebank parse trees can be useful for syntax-based QE. We use semantic role labelling (SRL) for our semantic-based QE experiments. We experiment with the limited resources that are available for French and find that a small manually annotated training set is substantially more useful than a much larger artificially created set. We use SRL in quality estimation using tree kernels, hand-crafted features and their combination. Additionally, we introduce PAM, a QE metric based on the predicate-argument structure match between source and target. We find that the SRL quality, especially on the target side, is the major factor negatively affecting the performance of the semantic-based QE. Finally, we annotate English and French Norton forum sentences with their phrase structure syntax using an annotation strategy adapted for user-generated text. We find that user errors occur in only a small fraction of the data, but their correction does improve parsing performance. These treebanks (Foreebank) prove to be useful as supplementary training data in adapting the parsers to the forum text. The improved parses ultimately increase the performance of the semantic-based QE. However, a reliable semantic-based QE system requires further improvements in the quality of the underlying semantic role labelling.

    Ti plasmids


    Referential Translation Machines for Predicting Translation Quality and Related Statistics

    Abstract: We use referential translation machines (RTMs) for predicting translation performance. RTMs pioneer a language-independent approach to all similarity tasks and remove the need to access any task- or domain-specific information or resource. We improve our RTM models with the ParFDA instance selection model, with additional features for predicting translation performance, and with improved learning models. We develop RTM models for each WMT15 QET (QET15) subtask and obtain improvements over QET14 results. RTMs achieve top performance in QET15, ranking 1st in the document- and sentence-level prediction tasks and 2nd in the word-level prediction task.

    Referential Translation Machine (RTM): Referential translation machines are a computational model that effectively judges monolingual and bilingual similarity while identifying translation acts between any two data sets with respect to interpretants. RTMs achieve top performance in automatic, accurate, and language-independent prediction of machine translation performance and reduce our dependence on any task-dependent resource. Prediction of translation performance can help in estimating the effort required for correcting the translations during post-editing by human translators.

    We improve our RTM models (Biçici and Way, 2014):
    • by using an improved ParFDA instance selection model, which allows better language models (LMs), in which similarity judgments are made, to be built with improved optimization and selection of the LM data,
    • by selecting TreeF features over source and translation data jointly instead of taking their intersection,
    • with extended learning models including Bayesian ridge regression.

    We present top results with Referential Translation Machines. Every act of communication is an act of translation (Bliss, 2012). Figure 1 depicts RTM. Our encouraging results in QET provide a greater understanding of the acts of translation we ubiquitously use and how they can be used to predict the performance of translation. RTMs are powerful enough to be applicable in different domains and tasks while achieving top performance.