100 research outputs found

    Language modeling and transcription of the TED corpus lectures

    Transcribing lectures is a challenging task, both in acoustic and in language modeling. In this work, we present our first results on the automatic transcription of lectures from the TED corpus, recently released by ELRA and LDC. In particular, we concentrated our effort on language modeling. Baseline acoustic and language models were developed using, respectively, 8 hours of TED transcripts and various types of text: conference proceedings, lecture transcripts, and conversational speech transcripts. Then, adaptation of the language model to single speakers was investigated by exploiting different kinds of information: automatic transcripts of the talk, the title of the talk, the abstract and, finally, the paper. In the last case, a 39.2% WER was achieved.

    Optimized MT Online Learning in Computer Assisted Translation

    In this paper we propose a cascading framework for optimizing online learning in machine translation for the computer-assisted translation scenario. Online learning introduces several hyperparameters associated with the learning algorithm. The number of online learning iterations can also affect translation quality. We discuss these issues and propose a few approaches that can be used to optimize the hyperparameters and to find the number of iterations required for online learning. We show experimentally that using the optimal number of iterations in online learning is useful and yields consistent improvement over baseline results.

    Adattamento al Progetto dei Modelli di Traduzione Automatica nella Traduzione Assistita

    The integration of machine translation into computer-assisted translation systems is a challenge for both academic and industrial research. Indeed, professional translators regard the ability of automatic systems to adapt to their style and to their corrections as crucial. In this article we propose a scheme for adapting machine translation systems to a specific document on the basis of a limited amount of manually corrected text, equal to the amount produced daily by a single translator.

    Bootstrapping Arabic-Italian SMT through Comparable Texts and Pivot Translation

    This paper describes efforts towards the development of an Arabic-to-Italian SMT system for the news domain. Since very little parallel data is available for this language pair, we investigated both the exploitation of comparable corpora and pivot translation. Experimental evaluation was conducted on a new benchmark developed by extending two Arabic-to-English NIST evaluation sets with Italian and French translations, produced from the source language by experts. Preliminary results show the potential of both approaches with respect to the performance achieved by a popular state-of-the-art Web-based translation service.

    Project Adaptation for MT-Enhanced Computer Assisted Translation

    The effective integration of MT technology into CAT tools is a challenging topic both for academic research and for the translation industry. In particular, professional translators consider the ability of MT systems to adapt to their feedback crucial. In this paper, we propose an adaptation scheme to tune a statistical MT system to a translation project using small amounts of post-edited texts. In field tests on two domains with 8 professional translators working with a CAT tool, productivity gains of up to over 20% were measured after applying MT project adaptation.

    Cache-based Online Adaptation for Machine Translation Enhanced Computer Assisted Translation

    The integration of machine translation into the human translation workflow raises intriguing and challenging research issues. One of them, addressed in this work, is how to dynamically adapt phrase-based statistical MT from user post-editing. By casting the problem in the online machine learning paradigm, we propose a cache-based adaptation technique that dynamically stores target n-gram and phrase-pair features used by the translator. For the sake of adaptation, decoding rewards not only the recency of the features stored in the cache but also their occurrence in similar, already translated sentences in the document. Our experimental results show the effectiveness of the devised method both on standard benchmarks and on documents post-edited by professional translators through the real use of the MateCat tool.