3,345 research outputs found

    MLQE-PE: a multilingual quality estimation and post-editing dataset

    Get PDF
    We present MLQE-PE, a new dataset for Machine Translation (MT) Quality Estimation (QE) and Automatic Post-Editing (APE). The dataset contains seven language pairs, with human labels for 9,000 translations per language pair in the following formats: sentence-level direct assessments and post-editing effort, and word-level good/bad labels. It also contains the post-edited sentences, as well as the titles of the articles from which the sentences were extracted, and the neural MT models used to translate the texts.
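    As a concrete illustration of the annotation formats listed above, here is a minimal, hypothetical sketch of how one MLQE-PE record could be represented in code; the field names and values are assumptions for illustration and may not match the released column names.

```python
# Hypothetical sketch of a single MLQE-PE record; field names are illustrative
# and may not match the released column names.
from dataclasses import dataclass
from typing import List

@dataclass
class MLQEPESegment:
    lang_pair: str          # e.g. "en-de"
    source: str             # source sentence
    mt_output: str          # raw NMT translation
    post_edit: str          # human post-edited translation
    da_score: float         # sentence-level direct assessment
    pe_effort: float        # sentence-level post-editing effort (e.g. HTER)
    word_tags: List[str]    # word-level "OK"/"BAD" labels for the MT output
    article_title: str      # title of the article the sentence was extracted from

# Toy example record (all values made up):
example = MLQEPESegment(
    lang_pair="en-de",
    source="The cat sat on the mat.",
    mt_output="Die Katze saß auf der Matte.",
    post_edit="Die Katze saß auf der Matte.",
    da_score=92.0,
    pe_effort=0.0,
    word_tags=["OK"] * 6,
    article_title="Cat",
)
print(example.lang_pair, example.da_score)
```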

    Quality Estimation Using Attention (Kvaliteedi hindamine tähelepanu abil)

    Get PDF
    The electronic version of this dissertation does not contain the publications. Machine translation has become a part of the life of not only linguists and professional translators, but of almost everyone. Most people who have used machine translation have come across funny and sometimes completely incorrect translations that turn the meaning of a sentence upside down. Thus, besides the machine translation model itself, we need a scoring mechanism that informs people about the quality of translations. Of course, professional translators can assess and, if necessary, edit the machine translation output. However, using human annotations to evaluate the translations of online machine translation systems is extremely expensive and impractical. That is why automated systems for measuring translation quality are a crucial part of the machine translation pipeline. Quality Estimation aims to predict the quality of machine translation output at run time, without using any gold-standard human annotations. In this work, we focused on Quality Estimation methods and explored the distributions predicted by the attention mechanism, one of the internal parameters of modern neural machine translation (NMT) systems, as an indicator of translation quality. We first applied this approach to machine translation models based on recurrent neural networks (RNNs) and analyzed the performance of the proposed methods on unsupervised and supervised tasks. As RNN-based MT systems have now been supplanted by transformers, which became the dominant state-of-the-art architecture, we also adapted our approach to the attention extracted from transformers. We demonstrated that attention-based methods are suitable for both supervised and unsupervised tasks, albeit with some limitations. Finally, since obtaining annotation labels is quite expensive, we investigated how much annotated data is needed to train a quality estimation model.
    https://www.ester.ee/record=b549935
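    The abstract describes the attention-based signal only in general terms; below is a minimal, hypothetical sketch of one such glass-box indicator, the mean entropy of the per-target-token attention distributions, where peaked (low-entropy) attention is read as higher confidence. The function and the toy matrices are illustrative assumptions, not the thesis implementation.

```python
# Minimal, hypothetical sketch of an attention-based glass-box QE signal:
# the mean entropy of the per-target-token attention distributions, where
# peaked (low-entropy) attention is read as higher translation confidence.
import numpy as np

def attention_entropy_score(attention: np.ndarray) -> float:
    """attention: array of shape (target_len, source_len), rows summing to 1.
    Returns the mean entropy of the rows; lower is taken as better quality."""
    eps = 1e-12
    row_entropies = -(attention * np.log(attention + eps)).sum(axis=1)
    return float(row_entropies.mean())

# Toy check: a sharply peaked attention matrix scores lower (better)
# than a uniformly diffuse one.
peaked = np.eye(4)               # each target token attends to one source token
diffuse = np.full((4, 4), 0.25)  # attention spread evenly over four source tokens
print(attention_entropy_score(peaked))   # ~0.0
print(attention_entropy_score(diffuse))  # ~1.386 (= ln 4)
```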

    An Analysis of Source-Side Grammatical Errors in NMT

    Full text link
    The quality of Neural Machine Translation (NMT) has been shown to significantly degrade when confronted with source-side noise. We present the first large-scale study of state-of-the-art English-to-German NMT on real grammatical noise, by evaluating on several Grammar Correction corpora. We present methods for evaluating NMT robustness without true references, and we use them for an extensive analysis of the effects that different grammatical errors have on the NMT output. We also introduce a technique for visualizing the divergence distribution caused by a source-side error, which allows for additional insights. Comment: Accepted and to be presented at BlackboxNLP 2019.
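    As a rough illustration of reference-free robustness probing in this spirit (not the paper's actual method), one can translate both the noisy source and its grammar-corrected counterpart and measure how far the two outputs diverge; the `translate` stub below is a placeholder for a real English-to-German NMT system.

```python
# Illustrative sketch (not the paper's implementation) of reference-free
# robustness probing: translate both the noisy source and its grammar-corrected
# counterpart, then measure how much the two outputs diverge.
from difflib import SequenceMatcher

def translate(sentence: str) -> str:
    # Placeholder: plug in a real English-to-German NMT system here.
    return sentence

def output_divergence(noisy_src: str, corrected_src: str) -> float:
    """1 - string similarity between the two MT outputs; higher values mean
    the source-side error perturbed the translation more."""
    hyp_noisy = translate(noisy_src)
    hyp_clean = translate(corrected_src)
    return 1.0 - SequenceMatcher(None, hyp_noisy, hyp_clean).ratio()

print(output_divergence("He go to school yesterday .", "He went to school yesterday ."))
```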

    Unsupervised quality estimation for neural machine translation

    Get PDF
    Quality Estimation (QE) is an important component in making Machine Translation (MT) useful in real-world applications, as it aims to inform the user about the quality of the MT output at test time. Existing approaches require large amounts of expert-annotated data, computation and time for training. As an alternative, we devise an unsupervised approach to QE where no training or access to additional resources besides the MT system itself is required. Unlike most current work, which treats the MT system as a black box, we explore useful information that can be extracted from the MT system as a by-product of translation. By employing methods for uncertainty quantification, we achieve very good correlation with human judgments of quality, rivalling state-of-the-art supervised QE models. To evaluate our approach, we collect the first dataset that enables work on both black-box and glass-box approaches to QE.
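    To make the glass-box idea concrete, here is a minimal sketch of two by-product signals that could serve as unsupervised quality scores: the mean log-probability of the emitted tokens, and the agreement between stochastic (e.g. dropout-enabled) samples. The function names and the sampling interface are assumptions for illustration, not the paper's actual code.

```python
# Minimal sketch of two glass-box signals that could serve as unsupervised
# quality scores; names and the sampling interface are assumptions, not the
# paper's actual code.
from difflib import SequenceMatcher
from statistics import mean
from typing import List

def avg_token_logprob(token_logprobs: List[float]) -> float:
    """Sentence-level confidence: mean log-probability of the emitted tokens."""
    return mean(token_logprobs)

def sample_agreement(sampled_hypotheses: List[str]) -> float:
    """Lexical agreement between stochastic (e.g. dropout-enabled) samples:
    1.0 means all samples identical, lower means more model uncertainty."""
    pairs = [(a, b) for i, a in enumerate(sampled_hypotheses)
                    for b in sampled_hypotheses[i + 1:]]
    if not pairs:
        return 1.0
    return mean(SequenceMatcher(None, a, b).ratio() for a, b in pairs)

# Toy usage with made-up values:
print(avg_token_logprob([-0.1, -0.3, -0.05]))                      # close to 0 -> confident
print(sample_agreement(["die Katze", "die Katze", "eine Katze"]))  # < 1.0 -> some disagreement
```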

    Computational Models of Concept Similarity for the Estonian Language

    Get PDF
    The purpose of this thesis is to test and compare different computational models of similarity for the Estonian language, based on how well they estimate the similarity between concepts and words. The models' similarity predictions are compared against human judgments. To make such comparisons between model estimates and human scores, a human-annotated data set had to be created for the Estonian language; the SimLex-999 data set was chosen for translation into Estonian. This resource is used to test three families of computational models of similarity: distributional models, semantic networks and computer vision models. The results of this thesis can be used to evaluate future similarity models.
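    As an illustration of the evaluation protocol described above, the sketch below scores word pairs with a toy distributional model (cosine similarity over made-up vectors) and correlates the scores with hypothetical human ratings using Spearman's rho; a real run would use the Estonian SimLex-999 pairs and trained embeddings.

```python
# Sketch of the evaluation protocol: score each word pair with a model (here,
# cosine similarity over made-up vectors) and correlate the model scores with
# human ratings using Spearman's rho. A real run would use the Estonian
# SimLex-999 pairs and trained embeddings; all values below are illustrative.
import numpy as np
from scipy.stats import spearmanr

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

vectors = {
    "kass": np.array([0.9, 0.1, 0.0]),  # "cat"
    "koer": np.array([0.8, 0.2, 0.1]),  # "dog"
    "auto": np.array([0.0, 0.1, 0.9]),  # "car"
}
pairs = [("kass", "koer"), ("kass", "auto"), ("koer", "auto")]
human_ratings = [7.5, 1.0, 1.5]  # hypothetical 0-10 similarity judgments

model_scores = [cosine(vectors[a], vectors[b]) for a, b in pairs]
rho, _ = spearmanr(model_scores, human_ratings)
print(f"Spearman correlation with human judgments: {rho:.2f}")
```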

    Multi-hypothesis machine translation evaluation

    Get PDF
    Reliably evaluating Machine Translation (MT) through automated metrics is a long-standing problem. One of the main challenges is the fact that multiple outputs can be equally valid. Attempts to minimise this issue include metrics that relax the matching of MT output and reference strings, and the use of multiple references. The latter has been shown to significantly improve the performance of evaluation metrics. However, collecting multiple references is expensive, and in practice a single reference is generally used. In this paper, we propose an alternative approach: instead of modelling linguistic variation in human references, we exploit the MT model's uncertainty to generate multiple diverse translations and use these (i) as surrogates for reference translations, (ii) to obtain a quantification of translation variability that can complement existing metric scores, or (iii) to replace references altogether. We show that for a number of popular evaluation metrics our variability estimates lead to substantial improvements in correlation with human judgements of quality, by up to 15%.
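    A hedged sketch of the multi-hypothesis idea follows: diverse translations sampled from the MT model stand in as pseudo-references, the evaluated output is scored against each one, and the spread of those scores serves as a variability estimate. The hand-written hypotheses are placeholders for actual model samples, and the `sacrebleu` package is assumed to be installed.

```python
# Hedged sketch of the multi-hypothesis idea: diverse translations sampled
# from the MT model act as pseudo-references, the evaluated output is scored
# against each, and the spread of the scores is reported as a variability
# estimate. Requires the sacrebleu package; the hypotheses are hand-written
# stand-ins for actual model samples.
from statistics import mean, pstdev
from typing import Dict, List

from sacrebleu import sentence_bleu

def multi_hypothesis_scores(mt_output: str, sampled_hyps: List[str]) -> Dict[str, float]:
    scores = [sentence_bleu(mt_output, [hyp]).score for hyp in sampled_hyps]
    return {
        "mean_score": mean(scores),     # score against surrogate references
        "variability": pstdev(scores),  # disagreement among the pseudo-references
    }

print(multi_hypothesis_scores(
    "the cat sat on the mat",
    ["the cat sat on the mat", "a cat was sitting on the mat", "the cat sat on a rug"],
))
```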