69 research outputs found

Using images to improve machine-translating E-commerce product listings

Author: Calixto Iacer
Castilho Sheila
Lohar Pintu
Matusov Evgeny
Stein Daniel
Way Andy
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2017

In this paper we study the impact of using images to machine-translate user-generated e-commerce product listings. We study how a multi-modal Neural Machine Translation (NMT) model compares to two text-only approaches: a conventional state-of-the-art attentional NMT model and a Statistical Machine Translation (SMT) model. User-generated product listings often do not constitute grammatical or well-formed sentences. More often than not, they consist of the juxtaposition of short phrases or keywords. We train our models end-to-end and also use text-only and multi-modal NMT models for re-ranking n-best lists generated by an SMT model. We qualitatively evaluate our user-generated training data and also analyse how adding synthetic data impacts the results. We evaluate our models quantitatively using BLEU and TER and find that (i) additional synthetic data has a generally positive impact on text-only and multi-modal NMT models, and that (ii) using a multi-modal NMT model for re-ranking n-best lists improves TER significantly across different n-best list sizes.
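The n-best re-ranking step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `rerank_nbest`, the linear interpolation with a single `weight`, and the toy scorer are all assumptions made for illustration. In the paper's setting the rescorer would be a text-only or multi-modal NMT model.

```python
from typing import Callable, List, Tuple


def rerank_nbest(
    nbest: List[Tuple[str, float]],
    rescorer: Callable[[str], float],
    weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-rank (hypothesis, first_pass_score) pairs, best first.

    Hypothetical helper: each hypothesis receives a linear interpolation
    of its first-pass score and the rescorer's score (higher is better).
    """
    rescored = [
        (hyp, (1.0 - weight) * score + weight * rescorer(hyp))
        for hyp, score in nbest
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


def toy_rescorer(hyp: str) -> float:
    """Stand-in for an NMT model score: prefers three-word outputs."""
    return -abs(len(hyp.split()) - 3)


nbest_list = [
    ("red dress cotton size M new", -1.0),  # best first-pass score
    ("red cotton dress", -1.2),
]
best_hypothesis = rerank_nbest(nbest_list, toy_rescorer)[0][0]
```

With these toy scores the second hypothesis overtakes the first after re-ranking, which is the effect re-ranking is meant to allow: a second model can overturn the first-pass ordering.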

Irish Universities

DCU Online Research Access Service

Generating E-Commerce Product Titles and Predicting Their Quality

Author: Ernie Chang
Evgeny Matusov
José G. Camargo de Souza
Marco Guerini
Marco Turchi
Matteo Negri
Michael Kozielski
Prashant Mathur
Publication date: 01/01/2018

Crossref

Archivio della ricerca - Fondazione Bruno Kessler

Open Access Repository

Study of the antimicrobial activity of di(imidazol-1-yl)alkanes and their derivatives

Author: Favre Benoit
Grishman Ralph
Hakkani-Tur Dilek
Harper Mary
Hillard Dustin
Hirschberg Julia
Ji Heng
Kahn Jeremy G.
Liu Yang
Maskey Sameer
Matusov Evgeny
Ney Hermann
Ostendorf Mari
Rosenberg Andrew
Shriberg Elizabeth
Wang Wen
Wooter Chuck
Publication venue: Tomsk Polytechnic University Publishing House (Изд-во ТПУ)
Publication date: 01/01/2008

Electronic archive of Tomsk Polytechnic University

Statistical machine translation of spontaneous speech with scarce resources

Author: Matusov Evgeny
Popovic Maja
Publication date: 01/01/2004

Publikationsserver der RWTH Aachen University

Combining natural language processing systems to improve machine translation of speech

Author: Matusov Evgeny
Publication venue: Publikationsserver der RWTH Aachen University
Publication date: 01/01/2009

Machine translation of spoken language is a challenging task that involves several natural language processing (NLP) software modules. Human speech in one natural language first has to be automatically transcribed by a speech recognition system. Next, the transcription of the spoken utterance can be translated into another natural language by a machine translation system. In addition, it may be necessary to automatically insert sentence boundaries and punctuation marks. In recent years, tremendous progress has been made in improving the quality of automatic speech translation. In particular, statistical approaches to both speech recognition and machine translation have proved to be effective on a large number of translation tasks with both small and large vocabularies. Nevertheless, many unsolved problems remain. In particular, the systems involved in speech translation are often developed and optimized independently of each other. The goal of this thesis is to improve speech translation quality by enhancing the interface between the various statistical NLP systems involved in the task of speech translation. The whole pipeline is considered: automatic speech recognition (ASR); automatic sentence segmentation and prediction of punctuation marks; machine translation (MT) using several systems which take either single best or multiple ASR hypotheses as input and employ different translation models; combination of the output of different MT systems. The coupling between the various components is achieved through the combination of model scores and/or hypotheses, the development of new algorithms and the modification of existing ones to handle ambiguous input or to meet the constraints of the downstream components, as well as through optimization of model parameters with the aim of improving the final translation quality. The main focus of the thesis is on a tighter coupling between speech recognition and machine translation.
To this end, two phrase-based MT systems based on two different statistical models are extended to process ambiguous ASR output in the form of word lattices. A novel algorithm for lattice-based translation is proposed that allows for exhaustive, but efficient phrase-level reordering in the search. Experimental results show that significant improvements in translation quality can be obtained by avoiding hard decisions in the ASR system and choosing the path in the lattice with the most likely translation according to the combination of recognition and translation model scores. The conditions under which these improvements are to be expected are identified in numerous experiments on several small and large vocabulary MT tasks. Another important part of this work is the combination of multiple MT systems. Different MT systems tend to make different errors. To take advantage of this fact, a method for computing a consensus translation from the outputs of several MT systems is proposed. In this approach, a consensus translation is computed on the word level and includes a novel statistical approach for aligning and reordering the translation hypotheses so that a confusion network for weighted majority voting can be created. A consensus translation is expected to contain words and phrases on which several systems agree and which therefore have a high probability of being correct. In the application to speech translation, the goal can be to combine MT systems which translate only the single best ASR output and those systems which can translate word lattices. The proposed system combination method resulted in highly significant improvements in translation quality over the best single system on a multitude of text and speech translation tasks. Many of these improvements were obtained in official and highly competitive evaluation campaigns, in which the quality of the translations was evaluated using both automatic error measures and human judgment.
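The weighted majority voting over a confusion network described in the abstract can be sketched as follows. This is only an illustration under strong assumptions: the hypotheses are taken to be already aligned slot by slot (the thesis's statistical alignment and reordering step is not reproduced here), and the names `consensus_translation` and `EPS` as well as the system weights are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

EPS = ""  # assumed marker for an empty arc (a system emits no word in a slot)


def consensus_translation(
    aligned: Sequence[Sequence[str]],
    weights: Sequence[float],
) -> List[str]:
    """Pick the highest-weighted word in each slot of a confusion network.

    `aligned` holds one pre-aligned hypothesis per system, all of equal
    length; `weights` holds one voting weight per system.
    """
    if len(aligned) != len(weights):
        raise ValueError("need exactly one weight per hypothesis")
    if len({len(hyp) for hyp in aligned}) > 1:
        raise ValueError("hypotheses must be aligned to the same length")

    output: List[str] = []
    for slot in zip(*aligned):
        votes: Dict[str, float] = defaultdict(float)
        for word, weight in zip(slot, weights):
            votes[word] += weight
        winner = max(votes, key=lambda w: votes[w])
        if winner != EPS:  # empty arcs produce no output word
            output.append(winner)
    return output


hypotheses = [
    ["the", "cat", "sat", EPS],
    ["the", "cat", "sits", "down"],
    ["a", "cat", "sat", "down"],
]
consensus = consensus_translation(hypotheses, weights=[0.4, 0.3, 0.3])
```

In this toy input no single system outputs "the cat sat down", yet the vote recovers it, which reflects the abstract's point that words on which several systems agree are more likely to be correct.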

Publikationsserver der RWTH Aachen University