69 research outputs found

    Using images to improve machine-translating E-commerce product listings

    In this paper we study the impact of using images to machine-translate user-generated e-commerce product listings. We study how a multi-modal Neural Machine Translation (NMT) model compares to two text-only approaches: a conventional state-of-the-art attentional NMT model and a Statistical Machine Translation (SMT) model. User-generated product listings often do not constitute grammatical or well-formed sentences. More often than not, they consist of the juxtaposition of short phrases or keywords. We train our models end-to-end and also use text-only and multi-modal NMT models for re-ranking n-best lists generated by an SMT model. We qualitatively evaluate our user-generated training data and also analyse how adding synthetic data impacts the results. We evaluate our models quantitatively using BLEU and TER and find that (i) additional synthetic data has a generally positive impact on text-only and multi-modal NMT models, and that (ii) using a multi-modal NMT model for re-ranking n-best lists improves TER significantly across different n-best list sizes.
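
The n-best re-ranking step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `rerank_nbest`, the linear interpolation of scores, the `weight` parameter, and the toy rescorer are all assumptions made for illustration. In the paper's setting the rescorer would be a text-only or multi-modal NMT model scoring each SMT hypothesis.

```python
from typing import Callable, List, Tuple


def rerank_nbest(
    nbest: List[Tuple[str, float]],
    rescorer: Callable[[str], float],
    weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-rank an n-best list by interpolating two scores.

    nbest    -- (hypothesis, first_pass_score) pairs, higher is better
    rescorer -- hypothetical second model returning a score per hypothesis
    weight   -- assumed interpolation weight given to the rescorer
    """
    combined = [
        (hyp, (1.0 - weight) * first_pass + weight * rescorer(hyp))
        for hyp, first_pass in nbest
    ]
    # Best combined score first.
    return sorted(combined, key=lambda pair: pair[1], reverse=True)


def toy_rescorer(hypothesis: str) -> float:
    """Stand-in for an NMT score: simply prefers shorter hypotheses."""
    return -float(len(hypothesis.split()))


if __name__ == "__main__":
    candidates = [("red leather bag new", -1.0), ("red leather bag", -1.2)]
    for hyp, score in rerank_nbest(candidates, toy_rescorer):
        print(f"{score:.2f}\t{hyp}")
```

Interpolating the two scores keeps the first-pass SMT ranking as a prior, so the second model only overturns it when it disagrees strongly enough. A real system would tune the weight on held-out data.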

    Combining natural language processing systems to improve machine translation of speech

    Machine translation of spoken language is a challenging task that involves several natural language processing (NLP) software modules. Human speech in one natural language first has to be automatically transcribed by a speech recognition system. Next, the transcription of the spoken utterance can be translated into another natural language by a machine translation system. In addition, it may be necessary to automatically insert sentence boundaries and punctuation marks. In recent years, tremendous progress has been made in improving the quality of automatic speech translation. In particular, statistical approaches to both speech recognition and machine translation have proved to be effective on a large number of translation tasks with both small and large vocabularies. Nevertheless, many unsolved problems remain. In particular, the systems involved in speech translation are often developed and optimized independently of each other. The goal of this thesis is to improve speech translation quality by enhancing the interface between the various statistical NLP systems involved in the task of speech translation. The whole pipeline is considered: automatic speech recognition (ASR); automatic sentence segmentation and prediction of punctuation marks; machine translation (MT) using several systems which take either single-best or multiple ASR hypotheses as input and employ different translation models; and combination of the output of different MT systems. The coupling between the various components is achieved through the combination of model scores and/or hypotheses, the development of new algorithms and the modification of existing ones to handle ambiguous input or to meet the constraints of the downstream components, as well as through optimization of model parameters with the aim of improving the final translation quality. The main focus of the thesis is on a tighter coupling between speech recognition and machine translation.
To this end, two phrase-based MT systems based on two different statistical models are extended to process ambiguous ASR output in the form of word lattices. A novel algorithm for lattice-based translation is proposed that allows for exhaustive but efficient phrase-level reordering in the search. Experimental results show that significant improvements in translation quality can be obtained by avoiding hard decisions in the ASR system and choosing the path in the lattice with the most likely translation according to the combination of recognition and translation model scores. The conditions under which these improvements are to be expected are identified in numerous experiments on several small and large vocabulary MT tasks. Another important part of this work is the combination of multiple MT systems. Different MT systems tend to make different errors. To take advantage of this fact, a method for computing a consensus translation from the outputs of several MT systems is proposed. In this approach, a consensus translation is computed on the word level and includes a novel statistical approach for aligning and reordering the translation hypotheses so that a confusion network for weighted majority voting can be created. A consensus translation is expected to contain words and phrases on which several systems agree and which therefore have a high probability of being correct. In the application to speech translation, the goal can be to combine MT systems which translate only the single-best ASR output with those systems which can translate word lattices. The proposed system combination method resulted in highly significant improvements in translation quality over the best single system on a multitude of text and speech translation tasks. Many of these improvements were obtained in official and highly competitive evaluation campaigns, in which the quality of the translations was evaluated using both automatic error measures and human judgment.
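
The weighted majority voting over a confusion network described in this abstract can be sketched roughly as follows. This is an illustration under strong assumptions, not the thesis's method: the hypotheses are taken to be already aligned into slots of equal length (the statistical alignment and reordering step, which the abstract identifies as the novel part, is not shown), the empty string stands for an empty arc, and the names `consensus`, `EPS`, and the example weights are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

EPS = ""  # assumed marker for an empty (epsilon) arc in a slot


def consensus(aligned: Sequence[Sequence[str]], weights: Sequence[float]) -> List[str]:
    """Pick the highest-weighted word in each slot of a confusion network.

    aligned -- one pre-aligned token sequence per MT system, all the same
               length, with EPS where a system has no word in that slot
    weights -- one voting weight per system (assumed to be given)
    """
    if len(aligned) != len(weights):
        raise ValueError("need exactly one weight per system")
    if not aligned:
        return []
    n_slots = len(aligned[0])
    if any(len(hyp) != n_slots for hyp in aligned):
        raise ValueError("all aligned hypotheses must have the same length")

    output: List[str] = []
    for slot in range(n_slots):
        votes: Dict[str, float] = defaultdict(float)
        for hyp, weight in zip(aligned, weights):
            votes[hyp[slot]] += weight
        # Ties go to the word proposed by the earliest-listed system.
        best = max(votes, key=lambda word: votes[word])
        if best != EPS:
            output.append(best)
    return output


if __name__ == "__main__":
    systems = [
        ["the", "red", "bag", EPS],
        ["a", "red", "bag", "new"],
        ["the", "red", "purse", "new"],
    ]
    print(" ".join(consensus(systems, [0.4, 0.3, 0.3])))
```

Voting per slot is what lets the result contain words that no single system produced together: in the example, the first system's "the" and "bag" are combined with "new" from the other two.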
