
3,837 research outputs found

The TALP–UPC Spanish–English WMT biomedical task: bilingual embeddings and char-based neural language model rescoring in a phrase-based system

Authors: Escolano Peinado Carlos; España-i-Bonet Cristina; Madhyastha Pranava; Rodríguez Fonollosa José Adrián; Ruiz Costa-Jussà Marta
Publication date: 01/01/2016

This paper describes the TALP–UPC system in the Spanish–English WMT 2016 biomedical shared task. Our system is a standard phrase-based system enhanced with vocabulary expansion using bilingual word embeddings and a character-based neural language model with rescoring. The former focuses on resolving out-of-vocabulary words, while the latter enhances the fluency of the system. The two modules progressively improve the final translation as measured by a combination of several lexical metrics. Postprint (published version)

Crossref

UPCommons. Portal del coneixement obert de la UPC

Word-to-Word Models of Translational Equivalence

Author: Melamed I. Dan
Publication date: 01/01/1997

Parallel texts (bitexts) have properties that distinguish them from other kinds of parallel data. First, most words translate to only one other word. Second, bitext correspondence is noisy. This article presents methods for biasing statistical translation models to reflect these properties. Analysis of the expected behavior of these biases in the presence of sparse data predicts that they will result in more accurate models. The prediction is confirmed by evaluation with respect to a gold standard -- translation models that are biased in this fashion are significantly more accurate than a baseline knowledge-poor model. This article also shows how a statistical translation model can take advantage of various kinds of pre-existing knowledge that might be available about particular language pairs. Even the simplest kinds of language-specific knowledge, such as the distinction between content words and function words, are shown to reliably boost translation model performance on some tasks. Statistical models that are informed by pre-existing knowledge about the model domain combine the best of both the rationalist and empiricist traditions.

arXiv.org e-Print Archive

CiteSeerX

TExSIS: bilingual terminology extraction from parallel corpora using chunk-based alignment

Authors: Hoste Veronique; Lefever Els; Macken Lieve
Publication venue: John Benjamins Publishing Company
Publication date: 01/01/2013

Crossref

Ghent University Academic Bibliography

Archivsystem Ask23