
13 research outputs found

Findings of the 2015 Workshop on Statistical Machine Translation

Author: Bojar Ondrej
Chatterjee Rajen
Federmann Christian
Haddow Barry
Hokamp Chris
Huck Matthias
Koehn Philipp
Logacheva Varvara
Monz Christof
Negri Matteo
Post Matt
Scarton Carolina
Specia Lucia
Turchi Marco
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2015

This paper presents the results of the WMT15 shared tasks, which included a standard news translation task, a metrics task, a tuning task, a task for run-time estimation of machine translation quality, and an automatic post-editing task. This year, 68 machine translation systems from 24 institutions were submitted to the ten translation directions in the standard translation task. An additional 7 anonymized systems were included and then evaluated both automatically and manually. The quality estimation task had three subtasks, with a total of 10 teams submitting 34 entries. The pilot automatic post-editing task had a total of 4 teams submitting 7 entries.

Crossref

Archivio della ricerca - Fondazione Bruno Kessler

Edinburgh Research Explorer

Publikationsserver der RWTH Aachen University

Biblio at Institute of Formal and Applied Linguistics

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Findings of the 2014 Workshop on Statistical Machine Translation

Author: Bojar Ondrej
Buck Christian
Federmann Christian
Haddow Barry
Koehn Philipp
Leveling Johannes
Monz Christof
Pecina Pavel
Post Matt
Saint-Amand Herve
Soricut Radu
Specia Lucia
Tamchyna Ales
Publication date: 01/01/2014

This paper presents the results of the WMT14 shared tasks, which included a standard news translation task, a separate medical translation task, a task for run-time estimation of machine translation quality, and a metrics task. This year, 143 machine translation systems from 23 institutions were submitted to the ten translation directions in the standard translation task. An additional 6 anonymized systems were included and then evaluated both automatically and manually. The quality estimation task had four subtasks, with a total of 10 teams submitting 57 entries.

Crossref

Edinburgh Research Explorer

Biblio at Institute of Formal and Applied Linguistics

International Migration, Integration and Social Cohesion online publications

Compositional Morphology for Word Representations and Language Modelling

Author: Blunsom Phil
Botha Jan A.
Publication date: 01/01/2014

This paper presents a scalable method for integrating compositional morphological representations into a vector-based probabilistic language model. Our approach is evaluated in the context of log-bilinear language models, rendered suitably efficient for implementation inside a machine translation decoder by factoring the vocabulary. We perform both intrinsic and extrinsic evaluations, presenting results on a range of languages which demonstrate that our model learns morphological representations that both perform well on word similarity tasks and lead to substantial reductions in perplexity. When used for translation into morphologically rich languages with large vocabularies, our models obtain improvements of up to 1.2 BLEU points relative to a baseline system using back-off n-gram models. Comment: Proceedings of the 31st International Conference on Machine Learning (ICML)

arXiv.org e-Print Archive

CiteSeerX

Oxford University Research Archive

Findings of the 2016 Conference on Machine Translation.

Author: Bojar Ondřej
Chatterjee Rajen
Federmann Christian
Graham Yvette
Haddow Barry
Huck Matthias
Koehn Philipp
Logacheva Varvara
Monz Christof
Negri Matteo
Neveol Aurelie
Neves Mariana
Popel Martin
Post Matt
Rubino Raphael
Scarton Carolina
Specia Lucia
Turchi Marco
Verspoor Karin
Jimeno Yepes Antonio
Zampieri Marcos
Publication venue: The Association for Computational Linguistics

This paper presents the results of the WMT16 shared tasks, which included five machine translation (MT) tasks (standard news, IT-domain, biomedical, multimodal, pronoun), three evaluation tasks (metrics, tuning, run-time estimation of MT quality), an automatic post-editing task, and a bilingual document alignment task. This year, 102 MT systems from 24 institutions (plus 36 anonymized online systems) were submitted to the 12 translation directions in the news translation task. The IT-domain task received 31 submissions from 12 institutions in 7 directions, and the biomedical task received 15 submissions from 5 institutions. Evaluation was both automatic and manual (relative ranking and 100-point scale assessments). The quality estimation task had three subtasks, with a total of 14 teams submitting 39 entries. The automatic post-editing task had a total of 6 teams submitting 11 entries.

Archivio della ricerca - Fondazione Bruno Kessler

Findings of the 2016 Conference on Machine Translation (WMT16)

Author: Bojar Ondrej
Chatterjee Rajen
Federmann Christian
Graham Yvette
Haddow Barry
Huck Matthias
Jimeno Yepes Antonio
Koehn Philipp
Logacheva Varvara
Monz Christof
Negri Matteo
Neveol Aurelie
Neves Mariana
Popel Martin
Post Matt
Rubino Raphael
Scarton Carolina
Specia Lucia
Turchi Marco
Verspoor Karin
Zampieri Marcos
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2016

Archivio della ricerca - Fondazione Bruno Kessler

Edinburgh Research Explorer

Publikationsserver der RWTH Aachen University

Biblio at Institute of Formal and Applied Linguistics

UvA-DARE

International Migration, Integration and Social Cohesion online publications

On understanding, modeling and predicting user behavior in web search

Author: Borisov A.
Publication date: 01/01/2018

International Migration, Integration and Social Cohesion online publications

Recommended from our members

Refinements in hierarchical phrase-based translation systems

Author: Pino Juan Miguel
Publication venue: University of Cambridge
Publication date: 07/04/2015

The hierarchical phrase-based translation model, a relatively recent proposal for statistical machine translation (SMT), has achieved state-of-the-art performance in numerous translation evaluations. Hierarchical phrase-based systems comprise a pipeline of modules with complex interactions. In this thesis, we propose refinements to the hierarchical phrase-based model as well as improvements and analyses in various modules for hierarchical phrase-based systems. We take advantage of the increasing amounts of available training data for machine translation, as well as existing frameworks for distributed computing, to build better infrastructure for the extraction, estimation and retrieval of hierarchical phrase-based grammars. We design and implement grammar extraction as a series of Hadoop MapReduce jobs. We store the resulting grammar using the HFile format, which offers competitive trade-offs in terms of efficiency and simplicity. We demonstrate improvements over two alternative solutions used in machine translation. The modular nature of the SMT pipeline, while allowing individual improvements, has the disadvantage that errors committed by one module are propagated to the next. This thesis alleviates this issue between the word alignment module and the grammar extraction and estimation module by considering richer statistics from word alignment models in extraction. We use alignment link and alignment phrase pair posterior probabilities for grammar extraction and estimation and demonstrate translation improvements in Chinese to English translation. This thesis also proposes refinements in grammar and language modelling, both in the context of domain adaptation and in the context of the interaction between first-pass decoding and lattice rescoring. We analyse alternative strategies for grammar and language model cross-domain adaptation. We also study interactions between first-pass and second-pass language models in terms of size and n-gram order.
Finally, we analyse two smoothing methods for large 5-gram language model rescoring. The last two chapters are devoted to the application of phrase-based grammars to the string regeneration task, which we treat as a means of studying the fluency of machine translation output. We design and implement a monolingual phrase-based decoder for string regeneration and achieve state-of-the-art performance on this task. By applying our decoder to the output of a hierarchical phrase-based translation system, we are able to recover the same level of translation quality as the translation system.

Apollo (Cambridge)

Probabilistic Modelling of Morphologically Rich Languages

Author: Botha Jan A.
Publication date: 01/01/2014

This thesis investigates how the sub-structure of words can be accounted for in probabilistic models of language. Such models play an important role in natural language processing tasks such as translation or speech recognition, but often rely on the simplistic assumption that words are opaque symbols. This assumption does not fit morphologically complex languages well, where words can have rich internal structure and sub-word elements are shared across distinct word forms. Our approach is to encode basic notions of morphology into the assumptions of three different types of language models, with the intention that leveraging shared sub-word structure can improve model performance and help overcome the data sparsity that arises from morphological processes. In the context of n-gram language modelling, we formulate a new Bayesian model that relies on the decomposition of compound words to attain better smoothing, and we develop a new distributed language model that learns vector representations of morphemes and leverages them to link together morphologically related words. In both cases, we show that accounting for word sub-structure improves the models' intrinsic performance and provides benefits when applied to other tasks, including machine translation. We then shift the focus beyond the modelling of word sequences and consider models that automatically learn what the sub-word elements of a given language are, given an unannotated list of words. We formulate a novel model that can learn discontiguous morphemes in addition to the more conventional contiguous morphemes that most previous models are limited to. This approach is demonstrated on Semitic languages, and we find that modelling discontiguous sub-word structures leads to improvements in the task of segmenting words into their contiguous morphemes. Comment: DPhil thesis, University of Oxford, submitted and accepted 2014. http://ora.ox.ac.uk/objects/uuid:8df7324f-d3b8-47a1-8b0b-3a6feb5f45c

arXiv.org e-Print Archive

Oxford University Research Archive

Low-Resource Unsupervised NMT: Diagnosing the Problem and Providing a Linguistically Motivated Solution

Author: Edman Lukas
van Noord Gertjan
Toral Ruiz Antonio
Publication date: 01/01/2020

ARTS repository - University of Groningen