UPM system for the translation task

Author: Lopez Ludeña Veronica
San Segundo Hernández Rubén
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/2011

This paper describes the UPM system for the translation task at the EMNLP 2011 Workshop on Statistical Machine Translation (http://www.statmt.org/wmt11/), used in both directions: Spanish-English and English-Spanish. The system is based on Moses, with two new modules for pre- and post-processing the sentences. The main contribution is the proposed method, based on similarity to the source-language test set, for selecting the sentences used to train the models and tune the weights. With this system, we obtained 23.2 BLEU for Spanish-English and 21.7 BLEU for English-Spanish.

Archivo Digital UPM

Sentence selection for improving the tuning process of a statistical machine translation system

Author: Lopez Ludeña Veronica
Lorenzo Trueba Jaime
Montero Martínez Juan Manuel
San Segundo Hernández Rubén
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/2012

This article describes a sentence selection strategy for tuning a statistical machine translation system, based on the Moses decoder, that translates from Spanish into English. In this work we propose two approaches for selecting the sentences of the validation corpus that are most similar to the sentences we want to translate (the source-language test sentences). This selection yields better model weights for the subsequent translation process and therefore improves the results. Specifically, with the selection method based on the similarity measure proposed in this article, BLEU improves from 27.17% with the full validation corpus to 27.27% when the tuning sentences are selected. These results approach those of the ORACLE experiment, in which the test sentences themselves are used to tune the weights; in that case, the BLEU obtained is 27.51%.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

Findings of the 2011 Workshop on Statistical Machine Translation

Author: Callison-Burch Chris
Koehn Philipp
Monz Christof
Zaidan Omar
Publication date: 01/01/2011

This paper presents the results of the WMT11 shared tasks, which included a translation task, a system combination task, and a task for machine translation evaluation metrics. We conducted a large-scale manual evaluation of 148 machine translation systems and 41 system combination entries. We used the ranking of these systems to measure how strongly automatic metrics correlate with human judgments of translation quality for 21 evaluation metrics. This year featured a Haitian Creole to English task translating SMS messages sent to an emergency response service in the aftermath of the Haitian earthquake. We also conducted a pilot 'tunable metrics' task to test whether optimizing a fixed system to different metrics would result in perceptibly different translation quality.

Edinburgh Research Explorer

International Migration, Integration and Social Cohesion online publications

Extracting Parallel Corpora from Wikipedia on the basis of Phrase Level Bilingual Alignment

Author: Barrón Cedeño Luis Alberto
Civera Saiz Jorge
Garcia Martinez Maria Mercedes
Rosso Paolo
Silvestre Cerdà Joan Albert
Publication venue: CEUR Workshop Proceedings
Publication date: 01/01/2011

This paper presents a proposal for extracting parallel corpora from Wikipedia on the basis of statistical machine translation techniques. We have used IBM word-level alignment models to obtain phrase-level bilingual alignments between document pairs. We have manually annotated a test set of comparable English-Spanish documents in order to evaluate the model. The results obtained are encouraging.

This work was carried out within the framework of the VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems, partially funded by the EC (FEDER/FSE; WIQEI IRSES no. 269180 / FP 7 Marie Curie People), by MICINN as part of the Text-Enterprise 2.0 project (TIN2009-13391-C04-03) in the Plan I+D+i, and by CONACyT grant 192021. It has also received support from the EC (FEDER/FSE) and MEC/MICINN under the MIPRCV "Consolider Ingenio 2010" programme (CSD2007-00018) and the iTrans2 project (TIN2009-14511), from MITyC within the erudito.com project (TSI-020110-2009-439), from the Generalitat Valenciana through grants Prometeo/2009/014 and GV/2010/067, and from the "Vicerrectorado de Investigación de la UPV" through grant 20091027.

Silvestre Cerdà, J.A.; Garcia Martinez, M.M.; Barrón Cedeño, L.A.; Civera Saiz, J.; Rosso, P. (2011). Extracción de Corpus Paralelos de la Wikipedia basada en la Obtención de Alineamientos Bilingües a Nivel de Frase. CEUR Workshop Proceedings, 824:14-21. http://hdl.handle.net/10251/27930
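The abstract names IBM word-level alignment models without saying which one. As a hedged illustration only, the sketch below trains IBM Model 1, the simplest of them, with expectation-maximisation on toy tokenised sentence pairs and then reads off the most probable target word for each source word. It is simplified (no NULL word) and omits the step that turns word alignments into phrase-level alignments between Wikipedia document pairs, which the abstract does not detail; the function names are illustrative.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate lexical probabilities t(f | e) for IBM Model 1 with EM.

    pairs: list of (source_tokens, target_tokens). Simplified: no NULL word.
    """
    source_vocab = {f for fs, _ in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(source_vocab))  # uniform start
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        # E-step: share each source word among all target words in its sentence pair
        for fs, es in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalise expected counts into probabilities over f for each e
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def best_alignment(fs, es, t):
    """For each source word, return the index of the target word maximising t(f | e)."""
    return [max(range(len(es)), key=lambda j: t[(f, es[j])]) for f in fs]


pairs = [
    (["la", "casa"], ["the", "house"]),
    (["la", "flor"], ["the", "flower"]),
]
t = train_ibm_model1(pairs)
print(best_alignment(["la", "casa"], ["the", "house"], t))  # → [0, 1]
```

Because "la" co-occurs with "the" in both pairs, EM shifts its probability mass toward "the", which in turn leaves "casa" to align with "house".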

RiuNet