1,989 research outputs found

    As Little as Possible, as Much as Necessary: Detecting Over- and Undertranslations with Contrastive Conditioning

    Omission and addition of content are typical issues in neural machine translation. We propose a method for detecting such phenomena with off-the-shelf translation models. Using contrastive conditioning, we compare the likelihood of a full sequence under a translation model to the likelihood of its parts, given the corresponding source or target sequence. This makes it possible to pinpoint superfluous words in the translation and untranslated words in the source, even in the absence of a reference translation. The accuracy of our method is comparable to that of a supervised method that requires a custom quality estimation model. Comment: ACL 202
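    The comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_score`, `TOY_LEXICON`, `flag_unsupported_tokens`, and the `margin` parameter are hypothetical names, and the toy scorer stands in for the off-the-shelf translation model that would supply the likelihoods. A token is flagged when the sequence without it is scored as more likely, given the other side, than the full sequence.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical toy lexicon; a real setup would query a translation model instead.
TOY_LEXICON = {"the": "der", "dog": "Hund", "runs": "rennt", "quickly": "schnell"}


def toy_score(sequence: Sequence[str], condition: Sequence[str]) -> float:
    """Length-normalized log-likelihood of `sequence` given `condition` (toy stand-in)."""
    if not sequence:
        return float("-inf")
    condition_set = set(condition)
    total = 0.0
    for token in sequence:
        # A token counts as supported if its lexicon counterpart appears in the condition.
        supported = TOY_LEXICON.get(token) in condition_set
        total += math.log(0.9 if supported else 0.05)
    return total / len(sequence)


def flag_unsupported_tokens(
    full: Sequence[str],
    condition: Sequence[str],
    score: Callable[[Sequence[str], Sequence[str]], float] = toy_score,
    margin: float = 1e-6,
) -> List[str]:
    """Return tokens whose deletion makes the sequence more likely given `condition`."""
    full_score = score(full, condition)
    flagged = []
    for i, token in enumerate(full):
        partial = list(full[:i]) + list(full[i + 1:])
        # Contrast the partial sequence against the full one under the same condition.
        if score(partial, condition) > full_score + margin:
            flagged.append(token)
    return flagged


if __name__ == "__main__":
    print(flag_unsupported_tokens(["the", "dog", "runs", "quickly"], ["der", "Hund", "rennt"]))
```

    In this sketch, passing the translation as `full` and the source as `condition` would flag candidate superfluous words, while swapping the two would flag candidate untranslated source words. Deleting single tokens is a simplification chosen for brevity.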

    Mašininio vertimo kokybė vertimo programėlėse su integruotu vaizdo atpažinimu (Machine Translation Quality in Translation Apps with Integrated Image Recognition)

    With the advancement of mobile applications, it is now possible to perform instant text translation using a smartphone's camera. Because text translation within images is still a relatively new field of research, it is not surprising that the translation quality of these mobile applications is under-researched. This study aims to determine the image-to-text translation quality in the English-to-Lithuanian language direction using popular machine translation apps. To classify errors and evaluate the quality of translation, the present study adopts and customizes the Multidimensional Quality Metrics (MQM) framework (Lommel 2014). The obtained results indicate that image-to-text machine translation apps produce exceptionally low-quality translations for the English-Lithuanian language pair. Therefore, the quality of machine translation for low-resource languages such as Lithuanian remains an issue.
    Today, translation apps with integrated image recognition, built on the latest technologies, make it possible to detect text in an image with a smartphone and quickly translate it into the desired foreign language. Translating text in images is still a very new research direction, so the translation quality of these mobile apps has not been sufficiently studied. The object of this work is the translation quality of texts translated with popular apps that integrate image recognition. An adapted Multidimensional Quality Metrics classification was chosen for the error analysis of translations from English into Lithuanian produced by machine translation apps that integrate image recognition. Summarizing the results, the quality of translation from English into Lithuanian by the examined apps was extremely poor.

    Improving the translation environment for professional translators

    When using computer-aided translation systems in a typical professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view and from a purely technological one. This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.

    Improving the Caenorhabditis elegans Genome Annotation Using Machine Learning

    For modern biology, precise genome annotations are of prime importance, as they allow the accurate definition of genic regions. We employ state-of-the-art machine learning methods to assay and improve the accuracy of the genome annotation of the nematode Caenorhabditis elegans. The proposed machine learning system is trained to recognize exons and introns on the unspliced mRNA, utilizing recent advances in support vector machines and label sequence learning. In 87% (coding and untranslated regions) and 95% (coding regions only) of all genes tested in several out-of-sample evaluations, our method correctly identified all exons and introns. Notably, only 37% and 50%, respectively, of the presently unconfirmed genes in the C. elegans genome annotation agree with our predictions; we therefore hypothesize that a sizable fraction of those genes are not correctly annotated. A retrospective evaluation of the Wormbase WS120 annotation [1] of C. elegans reveals that splice form predictions on unconfirmed genes in WS120 are inaccurate in about 18% of the considered cases, while our predictions deviate from the truth in only 10%–13%. We experimentally analyzed 20 controversial genes on which our system and the annotation disagree, confirming the superiority of our predictions. While our method correctly predicted 75% of those cases, the standard annotation was never completely correct. The accuracy of our system is further corroborated by a comparison with two other recently proposed systems that can be used for splice form prediction: SNAP and ExonHunter. We conclude that the genome annotation of C. elegans and other organisms can be greatly enhanced using modern machine learning technology.