41 research outputs found

    End-to-End Speech Translation of Arabic to English Broadcast News

    Full text link
    Speech translation (ST) is the task of directly translating acoustic speech signals in a source language into text in a foreign language. ST task has been addressed, for a long time, using a pipeline approach with two modules : first an Automatic Speech Recognition (ASR) in the source language followed by a text-to-text Machine translation (MT). In the past few years, we have seen a paradigm shift towards the end-to-end approaches using sequence-to-sequence deep neural network models. This paper presents our efforts towards the development of the first Broadcast News end-to-end Arabic to English speech translation system. Starting from independent ASR and MT LDC releases, we were able to identify about 92 hours of Arabic audio recordings for which the manual transcription was also translated into English at the segment level. These data was used to train and compare pipeline and end-to-end speech translation systems under multiple scenarios including transfer learning and data augmentation techniques.Comment: Arabic Natural Language Processing Workshop 202

    Neural Machine Translation by Generating Multiple Linguistic Factors

    Full text link
    Factored neural machine translation (FNMT) is founded on the idea of using the morphological and grammatical decomposition of the words (factors) at the output side of the neural network. This architecture addresses two well-known problems occurring in MT, namely the size of target language vocabulary and the number of unknown tokens produced in the translation. FNMT system is designed to manage larger vocabulary and reduce the training time (for systems with equivalent target language vocabulary size). Moreover, we can produce grammatically correct words that are not part of the vocabulary. FNMT model is evaluated on IWSLT'15 English to French task and compared to the baseline word-based and BPE-based NMT systems. Promising qualitative and quantitative results (in terms of BLEU and METEOR) are reported.Comment: 11 pages, 3 figues, SLSP conferenc

    Attelage de systèmes de transcription automatique de la parole

    Get PDF
    Nous abordons, dans cette thèse, les méthodes de combinaison de systèmesde transcription de la parole à Large Vocabulaire. Notre étude se concentre surl attelage de systèmes de transcription hétérogènes dans l objectif d améliorerla qualité de la transcription à latence contrainte. Les systèmes statistiquessont affectés par les nombreuses variabilités qui caractérisent le signal dela parole. Un seul système n est généralement pas capable de modéliserl ensemble de ces variabilités. La combinaison de différents systèmes detranscription repose sur l idée d exploiter les points forts de chacun pourobtenir une transcription finale améliorée. Les méthodes de combinaisonproposées dans la littérature sont majoritairement appliquées a posteriori,dans une architecture de transcription multi-passes. Cela nécessite un tempsde latence considérable induit par le temps d attente requis avant l applicationde la combinaison.Récemment, une méthode de combinaison intégrée a été proposée. Cetteméthode est basée sur le paradigme de décodage guidé (DDA :Driven DecodingAlgorithm) qui permet de combiner différents systèmes durant le décodage. Laméthode consiste à intégrer des informations en provenance de plusieurs systèmes dits auxiliaires dans le processus de décodage d un système dit primaire.Notre contribution dans le cadre de cette thèse porte sur un double aspect : d une part, nous proposons une étude sur la robustesse de la combinaison par décodage guidé. Nous proposons ensuite, une amélioration efficacement généralisable basée sur le décodage guidé par sac de n-grammes,appelé BONG. D autre part, nous proposons un cadre permettant l attelagede plusieurs systèmes mono-passe pour la construction collaborative, à latenceréduite, de la sortie de l hypothèse de reconnaissance finale. Nous présentonsdifférents modèles théoriques de l architecture d attelage et nous exposons unexemple d implémentation en utilisant une architecture client/serveur distribuée. Après la définition de l architecture de collaboration, nous nous focalisons sur les méthodes de combinaison adaptées à la transcription automatiqueà latence réduite. Nous proposons une adaptation de la combinaison BONGpermettant la collaboration, à latence réduite, de plusieurs systèmes mono-passe fonctionnant en parallèle. Nous présentons également, une adaptationde la combinaison ROVER applicable durant le processus de décodage via unprocessus d alignement local suivi par un processus de vote basé sur la fréquence d apparition des mots. Les deux méthodes de combinaison proposéespermettent la réduction de la latence de la combinaison de plusieurs systèmesmono-passe avec un gain significatif du WER.This thesis presents work in the area of Large Vocabulary ContinuousSpeech Recognition (LVCSR) system combination. The thesis focuses onmethods for harnessing heterogeneous systems in order to increase theefficiency of speech recognizer with reduced latency.Automatic Speech Recognition (ASR) is affected by many variabilitiespresent in the speech signal, therefore single ASR systems are usually unableto deal with all these variabilities. Considering these limitations, combinationmethods are proposed as alternative strategies to improve recognitionaccuracy using multiple recognizers developed at different research siteswith different recognition strategies. System combination techniques areusually used within multi-passes ASR architecture. Outputs of two or moreASR systems are combined to estimate the most likely hypothesis amongconflicting word pairs or differing hypotheses for the same part of utterance.The contribution of this thesis is twofold. First, we study and analyze theintegrated driven decoding combination method which consists in guidingthe search algorithm of a primary ASR system by the one-best hypothesesof auxiliary systems. Thus we propose some improvements in order to makethe driven decoding more efficient and generalizable. The proposed methodis called BONG and consists in using Bag Of N-Gram auxiliary hypothesisfor the driven decoding.Second, we propose a new framework for low latency paralyzed single-passspeech recognizer harnessing. We study various theoretical harnessingmodels and we present an example of harnessing implementation basedon client/server distributed architecture. Afterwards, we suggest differentcombination methods adapted to the presented harnessing architecture:first we extend the BONG combination method for low latency paralyzedsingle-pass speech recognizer systems collaboration. Then we propose, anadaptation of the ROVER combination method to be performed during thedecoding process using a local vote procedure followed by voting based onword frequencies.LE MANS-BU Sciences (721812109) / SudocSudocFranceF

    LIUM Machine Translation Systems for WMT17 News Translation Task

    Full text link
    This paper describes LIUM submissions to WMT17 News Translation Task for English-German, English-Turkish, English-Czech and English-Latvian language pairs. We train BPE-based attentive Neural Machine Translation systems with and without factored outputs using the open source nmtpy framework. Competitive scores were obtained by ensembling various systems and exploiting the availability of target monolingual corpora for back-translation. The impact of back-translation quantity and quality is also analyzed for English-Turkish where our post-deadline submission surpassed the best entry by +1.6 BLEU.Comment: News Translation Task System Description paper for WMT1

    NMTPY: A Flexible Toolkit for Advanced Neural Machine Translation Systems

    Full text link
    In this paper, we present nmtpy, a flexible Python toolkit based on Theano for training Neural Machine Translation and other neural sequence-to-sequence architectures. nmtpy decouples the specification of a network from the training and inference utilities to simplify the addition of a new architecture and reduce the amount of boilerplate code to be written. nmtpy has been used for LIUM's top-ranked submissions to WMT Multimodal Machine Translation and News Translation tasks in 2016 and 2017.Comment: 10 pages, 3 figure

    Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

    Full text link
    In this paper, we propose a novel neural network model called RNN Encoder-Decoder that consists of two recurrent neural networks (RNN). One RNN encodes a sequence of symbols into a fixed-length vector representation, and the other decodes the representation into another sequence of symbols. The encoder and decoder of the proposed model are jointly trained to maximize the conditional probability of a target sequence given a source sequence. The performance of a statistical machine translation system is empirically found to improve by using the conditional probabilities of phrase pairs computed by the RNN Encoder-Decoder as an additional feature in the existing log-linear model. Qualitatively, we show that the proposed model learns a semantically and syntactically meaningful representation of linguistic phrases.Comment: EMNLP 201

    Does Multimodality Help Human and Machine for Translation and Image Captioning?

    Full text link
    This paper presents the systems developed by LIUM and CVC for the WMT16 Multimodal Machine Translation challenge. We explored various comparative methods, namely phrase-based systems and attentional recurrent neural networks models trained using monomodal or multimodal data. We also performed a human evaluation in order to estimate the usefulness of multimodal data for human machine translation and image description generation. Our systems obtained the best results for both tasks according to the automatic evaluation metrics BLEU and METEOR.Comment: 7 pages, 2 figures, v4: Small clarification in section 4 title and conten

    Continuous adaptation to user feedback for statistical machine translation

    Get PDF
    © 2015 The Authors. Published by Association for Computational Linguistics . This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: https://www.aclweb.org/anthology/N15-1103This paper gives a detailed experiment feedback of different approaches to adapt a statistical machine translation system towards a targeted translation project, using only small amounts of parallel in-domain data. The experiments were performed by professional translators under realistic conditions of work using a computer assisted translation tool. We analyze the influence of these adaptations on the translator productivity and on the overall post-editing effort. We show that significant improvements can be obtained by using the presented adaptation techniques

    LIUM-CVC Submissions for WMT17 Multimodal Translation Task

    Get PDF
    This paper describes the monomodal and multimodal Neural Machine Translation systems developed by LIUM and CVC for WMT17 Shared Task on Multimodal Translation. We mainly explored two multimodal architectures where either global visual features or convolutional feature maps are integrated in order to benefit from visual context. Our final systems ranked first for both En-De and En-Fr language pairs according to the automatic evaluation metrics METEOR and BLEU.Comment: MMT System Description Paper for WMT1
    corecore