321 research outputs found
ELITR Non-Native Speech Translation at IWSLT 2020
This paper is an ELITR system submission for the non-native speech translation task at IWSLT 2020. We describe systems for offline ASR, real-time ASR, and our cascaded approach to offline SLT and real-time SLT. We select our primary candidates from a pool of pre-existing systems, develop a new end-toend general ASR system, and a hybrid ASR trained on non-native speech. The provided small validation set prevents us from carrying out a complex validation, but we submit all the unselected candidates for contrastive evaluation on the test set
Multilingual Speech Translation KIT @ IWSLT2021
This paper contains the description for the submission of Karlsruhe Institute of Technology (KIT) for the multilingual TEDx translation task in the IWSLT 2021 evaluation campaign. Our main approach is to develop both cascade and end-to-end systems and eventually combine them together to achieve the best possible results for this extremely low-resource setting. The report also confirms certain consistent architectural improvement added to the Transformer architecture, for all tasks: translation, transcription and speech translation
Reward Gaming in Conditional Text Generation
To align conditional text generation model outputs with desired behaviors,
there has been an increasing focus on training the model using reinforcement
learning (RL) with reward functions learned from human annotations. Under this
framework, we identify three common cases where high rewards are incorrectly
assigned to undesirable patterns: noise-induced spurious correlation, naturally
occurring spurious correlation, and covariate shift. We show that even though
learned metrics achieve high performance on the distribution of the data used
to train the reward function, the undesirable patterns may be amplified during
RL training of the text generation model. While there has been discussion about
reward gaming in the RL or safety community, in this discussion piece, we would
like to highlight reward gaming in the natural language generation (NLG)
community using concrete conditional text generation examples and discuss
potential fixes and areas for future work
実応用を志向した機械翻訳システムの設計と評価
Tohoku University博士(情報科学)thesi
An Open Dataset and Model for Language Identification
Language identification (LID) is a fundamental step in many natural language
processing pipelines. However, current LID systems are far from perfect,
particularly on lower-resource languages. We present a LID model which achieves
a macro-average F1 score of 0.93 and a false positive rate of 0.033 across 201
languages, outperforming previous work. We achieve this by training on a
curated dataset of monolingual data, the reliability of which we ensure by
auditing a sample from each source and each language manually. We make both the
model and the dataset available to the research community. Finally, we carry
out detailed analysis into our model's performance, both in comparison to
existing open models and by language class.Comment: To be published in ACL 202
LENS: A Learnable Evaluation Metric for Text Simplification
Training learnable metrics using modern language models has recently emerged
as a promising method for the automatic evaluation of machine translation.
However, existing human evaluation datasets for text simplification have
limited annotations that are based on unitary or outdated models, making them
unsuitable for this approach. To address these issues, we introduce the
SimpEval corpus that contains: SimpEval_past, comprising 12K human ratings on
2.4K simplifications of 24 past systems, and SimpEval_2022, a challenging
simplification benchmark consisting of over 1K human ratings of 360
simplifications including GPT-3.5 generated text. Training on SimpEval, we
present LENS, a Learnable Evaluation Metric for Text Simplification. Extensive
empirical results show that LENS correlates much better with human judgment
than existing metrics, paving the way for future progress in the evaluation of
text simplification. We also introduce Rank and Rate, a human evaluation
framework that rates simplifications from several models in a list-wise manner
using an interactive interface, which ensures both consistency and accuracy in
the evaluation process and is used to create the SimpEval datasets.Comment: Accepted at ACL 202
Mismatching-Aware Unsupervised Translation Quality Estimation For Low-Resource Languages
Translation Quality Estimation (QE) is the task of predicting the quality of
machine translation (MT) output without any reference. This task has gained
increasing attention as an important component in the practical applications of
MT. In this paper, we first propose XLMRScore, which is a cross-lingual
counterpart of BERTScore computed via the XLM-RoBERTa (XLMR) model. This metric
can be used as a simple unsupervised QE method, while employing it results in
two issues: firstly, the untranslated tokens leading to unexpectedly high
translation scores, and secondly, the issue of mismatching errors between
source and hypothesis tokens when applying the greedy matching in XLMRScore. To
mitigate these issues, we suggest replacing untranslated words with the unknown
token and the cross-lingual alignment of the pre-trained model to represent
aligned words closer to each other, respectively. We evaluate the proposed
method on four low-resource language pairs of WMT21 QE shared task, as well as
a new English-Farsi test dataset introduced in this paper. Experiments show
that our method could get comparable results with the supervised baseline for
two zero-shot scenarios, i.e., with less than 0.01 difference in Pearson
correlation, while outperforming unsupervised rivals in all the low-resource
language pairs for above 8%, on average.Comment: Submitted to Language Resources and Evaluatio
Exploring Enhanced Code-Switched Noising for Pretraining in Neural Machine Translation
Multilingual pretraining approaches in Neural Machine Translation (NMT) have shown that training models to denoise synthetic code-switched data can yield impressive performance gains --- owing to better multilingual semantic representations and transfer learning. However, they generated the synthetic code-switched data using non-contextual, one-to-one word translations obtained from lexicons - which can lead to significant noise in a variety of cases, including the poor handling of polysemes and multi-word expressions, violation of linguistic agreement and inability to scale to agglutinative languages. To overcome these limitations, we propose an approach called Contextual Code-Switching (CCS), where contextual, many-to-many word translations are generated using a `base' NMT model. We conduct experiments on 3 different language families - Romance, Uralic, and Indo-Aryan - and show significant improvements (by up to 5.5 spBLEU points) over the previous lexicon-based SOTA approaches. We also observe that small CCS models can perform comparably or better than massive models like mBART50 and mRASP2, depending on the size of data provided. We empirically analyse several key factors responsible for these - including context, many-to-many substitutions, code-switching language count etc. - and prove that they all contribute to enhanced pretraining of multilingual NMT models
Transformer Models for Machine Translation and Streaming Automatic Speech Recognition
[ES] El procesamiento del lenguaje natural (NLP) es un conjunto de problemas
computacionales con aplicaciones de máxima relevancia, que junto con otras
tecnologías informáticas se ha beneficiado de la revolución que ha significado
el aprendizaje profundo. Esta tesis se centra en dos problemas fundamentales
para el NLP: la traducción automática (MT) y el reconocimiento automático
del habla o transcripción automática (ASR); así como en una arquitectura
neuronal profunda, el Transformer, que pondremos en práctica para mejorar
las soluciones de MT y ASR en algunas de sus aplicaciones.
El ASR y MT pueden servir para obtener textos multilingües de alta calidad a
un coste razonable para una diversidad de contenidos audiovisuales. Concre-
tamente, esta tesis aborda problemas como el de traducción de noticias o el de
subtitulación automática de televisión. El ASR y MT también se pueden com-
binar entre sí, generando automáticamente subtítulos traducidos, o con otras
soluciones de NLP: resumen de textos para producir resúmenes de discursos, o
síntesis del habla para crear doblajes automáticos. Estas aplicaciones quedan
fuera del alcance de esta tesis pero pueden aprovechar las contribuciones que
contiene, en la meduda que ayudan a mejorar el rendimiento de los sistemas
automáticos de los que dependen.
Esta tesis contiene una aplicación de la arquitectura Transformer al MT tal y
como fue concebida, mediante la que obtenemos resultados de primer nivel en
traducción de lenguas semejantes. En capítulos subsecuentes, esta tesis aborda
la adaptación del Transformer como modelo de lenguaje para sistemas híbri-
dos de ASR en vivo. Posteriormente, describe la aplicación de este tipus de
sistemas al caso de uso de subtitulación de televisión, participando en una com-
petición pública de RTVE donde obtenemos la primera posición con un marge
importante. También demostramos que la mejora se debe principalmenta a la
tecnología desarrollada y no tanto a la parte de los datos.[CA] El processament del llenguage natural (NLP) és un conjunt de problemes com-
putacionals amb aplicacions de màxima rellevància, que juntament amb al-
tres tecnologies informàtiques s'ha beneficiat de la revolució que ha significat
l'impacte de l'aprenentatge profund. Aquesta tesi se centra en dos problemes
fonamentals per al NLP: la traducció automàtica (MT) i el reconeixement
automàtic de la parla o transcripció automàtica (ASR); així com en una ar-
quitectura neuronal profunda, el Transformer, que posarem en pràctica per a
millorar les solucions de MT i ASR en algunes de les seues aplicacions.
l'ASR i MT poden servir per obtindre textos multilingües d'alta qualitat a un
cost raonable per a un gran ventall de continguts audiovisuals. Concretament,
aquesta tesi aborda problemes com el de traducció de notícies o el de subtitu-
lació automàtica de televisió. l'ASR i MT també es poden combinar entre ells,
generant automàticament subtítols traduïts, o amb altres solucions de NLP:
amb resum de textos per produir resums de discursos, o amb síntesi de la parla
per crear doblatges automàtics. Aquestes altres aplicacions es troben fora de
l'abast d'aquesta tesi però poden aprofitar les contribucions que conté, en la
mesura que ajuden a millorar els resultats dels sistemes automàtics dels quals
depenen.
Aquesta tesi conté una aplicació de l'arquitectura Transformer al MT tal com
va ser concebuda, mitjançant la qual obtenim resultats de primer nivell en
traducció de llengües semblants. En capítols subseqüents, aquesta tesi aborda
l'adaptació del Transformer com a model de llenguatge per a sistemes híbrids
d'ASR en viu. Posteriorment, descriu l'aplicació d'aquest tipus de sistemes al
cas d'ús de subtitulació de continguts televisius, participant en una competició
pública de RTVE on obtenim la primera posició amb un marge significant.
També demostrem que la millora es deu principalment a la tecnologia desen-
volupada i no tant a la part de les dades[EN] Natural language processing (NLP) is a set of fundamental computing prob-
lems with immense applicability, as language is the natural communication
vehicle for people. NLP, along with many other computer technologies, has
been revolutionized in recent years by the impact of deep learning. This thesis
is centered around two keystone problems for NLP: machine translation (MT)
and automatic speech recognition (ASR); and a common deep neural architec-
ture, the Transformer, that is leveraged to improve the technical solutions for
some MT and ASR applications.
ASR and MT can be utilized to produce cost-effective, high-quality multilin-
gual texts for a wide array of media. Particular applications pursued in this
thesis are that of news translation or that of automatic live captioning of tele-
vision broadcasts. ASR and MT can also be combined with each other, for
instance generating automatic translated subtitles from audio, or augmented
with other NLP solutions: text summarization to produce a summary of a
speech, or speech synthesis to create an automatic translated dubbing, for in-
stance. These other applications fall out of the scope of this thesis, but can
profit from the contributions that it contains, as they help to improve the
performance of the automatic systems on which they depend.
This thesis contains an application of the Transformer architecture to MT as it
was originally conceived, achieving state-of-the-art results in similar language
translation. In successive chapters, this thesis covers the adaptation of the
Transformer as a language model for streaming hybrid ASR systems. After-
wards, it describes how we applied the developed technology for a specific use
case in television captioning by participating in a competitive challenge and
achieving the first position by a large margin. We also show that the gains
came mostly from the improvement in technology capabilities over two years
including that of the Transformer language model adapted for streaming, and
the data component was minor.Baquero Arnal, P. (2023). Transformer Models for Machine Translation and Streaming Automatic Speech Recognition [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/19368
- …