14 research outputs found

    ELITR Non-Native Speech Translation at IWSLT 2020

    This paper is an ELITR system submission for the non-native speech translation task at IWSLT 2020. We describe systems for offline ASR, real-time ASR, and our cascaded approach to offline SLT and real-time SLT. We select our primary candidates from a pool of pre-existing systems, and develop both a new end-to-end general ASR system and a hybrid ASR trained on non-native speech. The provided small validation set prevents us from carrying out a complex validation, but we submit all the unselected candidates for contrastive evaluation on the test set.

    Punctuation Prediction for Norwegian: Using Established Approaches for Under-Resourced Languages

    Master's thesis in information science (INFO390, MASV-INF)

    Improving Zero-shot Translation with Language-Independent Constraints


    Towards Robust End-to-End Speech Translation

    Noisy inputs cause performance to drop in speech recognition, and the same happens in the more complex case of speech translation. We want to explore speech enhancement techniques in a multi-task setting for end-to-end speech translation.

    Interest in speech-to-text translation systems has grown remarkably in recent years. The main motivation is the need to adapt digital content to the users who consume it, for example on social networks or video streaming platforms. In addition, high-quality automatic speech recognition and text translation systems are now available, which makes this the right time to investigate speech translation systems. Traditionally, cascade systems (ASR + MT) have worked best, but great advances have recently been made in End-to-End systems, which show their potential. This work studies the robustness of both approaches, with the aim of establishing which is more resistant to noise. In a series of experiments, both cascade and End-to-End systems were trained with different noise levels using data from MuST-C En-Es, which contains 504 hours of speech, to compare their performance. End-to-End systems systematically achieved higher performance. Despite that, Cascade systems behave very similarly in the presence of noise, although they do not reach the same performance. Moreover, training with noisy data gives both systems considerable stability and robustness.

    Findings of the IWSLT 2022 Evaluation Campaign.

    The evaluation campaign of the 19th International Conference on Spoken Language Translation featured eight shared tasks: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Speech to speech translation, (iv) Low-resource speech translation, (v) Multilingual speech translation, (vi) Dialect speech translation, (vii) Formality control for speech translation, (viii) Isometric speech translation. A total of 27 teams participated in at least one of the shared tasks. This paper details, for each shared task, the purpose of the task, the data that were released, the evaluation metrics that were applied, the submissions that were received and the results that were achieved.

    Improving Zero-shot Translation with Language-Independent Constraints

    An important concern in training multilingual neural machine translation (NMT) is to translate between language pairs unseen during training, i.e., zero-shot translation. Improving this ability kills two birds with one stone by providing an alternative to pivot translation which also allows us to better understand how the model captures information between languages. In this work, we carried out an investigation of this capability of multilingual NMT models. First, we intentionally create an encoder architecture which is independent with respect to the source language. Such experiments shed light on the ability of NMT encoders to learn multilingual representations in general. Based on this proof of concept, we were able to design regularization methods for the standard Transformer model, so that the whole architecture becomes more robust in zero-shot conditions. We investigated the behaviour of such models on the standard IWSLT 2017 multilingual dataset. We achieved an average improvement of 2.23 BLEU points across 12 language pairs compared to the zero-shot performance of a state-of-the-art multilingual system. Additionally, we carry out further experiments in which the effect is confirmed even for language pairs with multiple intermediate pivots. Comment: 10 pages version accepted in WMT 201