104 research outputs found
Language modeling and transcription of the TED corpus lectures
Transcribing lectures is a challenging task in both acoustic and language modeling. In this work, we present our first results on the automatic transcription of lectures from the TED corpus, recently released by ELRA and LDC. In particular, we concentrated our effort on language modeling. Baseline acoustic and language models were developed using, respectively, 8 hours of TED transcripts and various types of texts: conference proceedings, lecture transcripts, and conversational speech transcripts. Then, adaptation of the language model to single speakers was investigated by exploiting different kinds of information: automatic transcripts of the talk, the title of the talk, the abstract and, finally, the paper. In the last case, a 39.2% WER was achieved.
Optimized MT Online Learning in Computer Assisted Translation
In this paper we propose a cascading framework for optimizing online learning in machine translation for the computer-assisted translation scenario. Online learning introduces several hyperparameters associated with the learning algorithm. The number of online learning iterations can also affect translation quality. We discuss these issues and propose a few approaches that can be used to optimize the hyperparameters and to find the number of iterations required for online learning. We show experimentally that using the optimal number of iterations in online learning is useful and yields consistent improvements over baseline results.
Adattamento al Progetto dei Modelli di Traduzione Automatica nella Traduzione Assistita [Project Adaptation of Machine Translation Models in Computer-Assisted Translation]
The integration of machine translation into computer-assisted translation systems is a challenge for both academic and industrial research. Indeed, professional translators regard as crucial the ability of automatic systems to adapt to their style and their corrections. In this article we propose a scheme for adapting machine translation systems to a specific document on the basis of a limited amount of manually corrected text, equal to the amount produced daily by a single translator.
Project Adaptation for MT-Enhanced Computer Assisted Translation
The effective integration of MT technology into CAT tools is a challenging topic for both academic research and the translation industry. In particular, professional translators consider the ability of MT systems to adapt to their feedback to be crucial. In this paper, we propose an adaptation scheme to tune a statistical MT system to a translation project using small amounts of post-edited texts. In field tests on two domains with 8 professional translators working with a CAT tool, productivity gains reaching over 20% were measured after applying MT project adaptation.
Bootstrapping Arabic-Italian SMT through Comparable Texts and Pivot Translation
This paper describes efforts towards the development of an Arabic-to-Italian SMT system for the news domain. Since very little parallel data is available for this language pair, we investigated both the exploitation of comparable corpora and pivot translation. Experimental evaluation was conducted on a new benchmark developed by extending two Arabic-to-English NIST evaluation sets with Italian and French translations, produced from the source language by experts. Preliminary results show the potential of both approaches with respect to the performance achieved by a popular state-of-the-art Web-based translation service.
- …