1,145 research outputs found
From Bilingual to Multilingual Neural Machine Translation by Incremental Training
Multilingual Neural Machine Translation approaches are based on the use of
task-specific models and the addition of one more language can only be done by
retraining the whole system. In this work, we propose a new training schedule
that allows the system to scale to more languages without modification of the
previous components based on joint training and language-independent
encoder/decoder modules allowing for zero-shot translation. This work in
progress shows close results to the state-of-the-art in the WMT task.Comment: Accepted paper at ACL 2019 Student Research Workshop. arXiv admin
note: substantial text overlap with arXiv:1905.0683
From bilingual to multilingual neural machine translation by incremental training
Multilingual Neural Machine Translation approaches are based on the use of task specific models and the addition of one more language can only be done by retraining the whole system. In this work, we propose a new training schedule that allows the system to scale to more languages without modification of the previous components based on joint training and language-independent encoder/decoder modules allowing for zero-shot translation. This work in progress shows close results to state-of-the-art in the WMT task.This work is supported in part by a Google Faculty Research Award. This work is also supported in part by the Spanish Ministerio de Economa y Competitividad, the European Regional Development Fund and the Agencia Estatal de Investigacin, through the postdoctoral senior grant Ramon y Cajal, contract TEC2015-69266-P (MINECO/FEDER,EU) and contract PCIN-2017- 079 (AEI/MINECO).Peer ReviewedPostprint (published version
Learning multilingual and multimodal representations with language-specific encoders and decoders for machine translation
This thesis aims to study different language-specific approaches for Multilingual Machine Translation without parameter sharing and their properties compared to the current state-of-the-art based on parameter-sharing. We define Multilingual Machine Translation as the task that focuses on methods to translate between several pairs of languages in a single system. It has been widely studied in recent years due to its ability to easily scale to more languages, even between pairs never seen together during training (zero-shot translation). Several architectures have been proposed to tackle this problem with varying amounts of shared parameters between languages. Current state-of-the-art systems focus on a single sequence-to-sequence architecture where all languages share the complete set of parameters, including the token representation. While this has proven convenient for transfer learning, it makes it challenging to incorporate new languages into the trained model as all languages depend on the same parameters.
What all proposed architectures have in common is enforcing a shared presentation space between languages. Specifically, during this work, we will employ as representation the final output of the encoders that the decoders will use to perform cross-attention. Having a shared space reduces noise as similar sentences at semantic level produce similar vectorial representations, helping the decoders process representations from several languages. This semantic representation is particularly important for zero-shot translation as the representation similarity to the languages pairs seen during training is key to reducing ambiguity between languages and obtaining good translation performance.
This thesis is structured in three main blocks, focused on different scenarios of this task. Firstly, we propose a training method that enforces a common representation for bilingual training and a procedure to extend it to new languages efficiently. Secondly, we propose another training method that allows this representation to be learned directly on multilingual data and can be equally extended to new languages. Thirdly, we show that the proposed multilingual architecture is not limited only to textual languages. We extend our method to new data modalities by adding speech encoders, performing Spoken Language Translation, including Zero-Shot, to all the supported languages.
Our main results show that the common intermediate representation is achievable in this scenario, matching the performance of previously shared systems while allowing the addition of new languages or data modalities efficiently without negative transfer learning to the previous languages or retraining the system.El objetivo de esta tesis es estudiar diferentes arquitecturas de Traducción Automática Multilingüe con parámetros especÃficos para cada idioma que no son compartidos, en contraposición al estado del arte actual basado en compartir parámetros. Podemos definir la Traducción Automática Multilingüe como la tarea que estudia métodos para traducir entre varios pares de idiomas en un único sistema. Ésta ha sido ampliamente estudiada en los últimos años debido a que nos permite escalar nuestros sistemas con facilidad a un gran número de idiomas, incluso entre pares de idiomas que no han sido nunca entrenados juntos (traducción zero-shot). Diversas arquitecturas han sido propuestas con diferentes niveles de parámetros compartidos entre idiomas, El estado del arte actual se enfoca hacÃa un solo modelo secuencia a secuencia donde todos los parámetros son compartidos por todos los idiomas, incluyendo la representación a nivel de unidad lingüÃstica. Siendo esto beneficioso para la transferencia de conocimiento entre idiomas, también puede resultar una limitación a la hora de añadir nuevos, ya que modificarÃamos los parámetros para todos los idiomas soportados. El elemento común de todas las arquitecturas propuestas es promover un espacio común donde representar a todos los idiomas en el sistema. Concretamente, durante este trabajo, nos referiremos a la representación final de los codificadores del sistema como este espacio, puesto que es la representación utilizada durante la atención cruzada por los decodificadores al generar traducciones. El objetivo de esta representación común es reducir ruido, ya que frases similares producirán representaciones similares, lo cual resulta de ayuda al usar un mismo decodificador para procesar la representación vectorial de varios idiomas. Esto es especialmente importante en el caso de la traducción zero-shot, ya que el par de idiomas no ha sido nunca entrenado conjuntamente, para reducir posibles ambigüedades y obtener una buena calidad de traducción. La tesis está organizada en tres bloques principales, enfocados en diferentes escenarios de esta tarea. Primero, proponemos un método para entrenar una representación común en sistemas bilingües, y un procedimiento para extenderla a nuevos idiomas de manera eficiente. Segundo, proponemos otro método de entrenamiento para aprender esta representación directamente desde datos multilingües y como puede ser igualmente extendida a nuevos idiomas. Tercero, mostramos que esta representación no está limitada únicamente a datos textuales. Para ello, extendemos nuestro método a otra modalidad de datos, en este caso discurso hablado, demostrando que podemos realizar traducción de audio a texto para todos los idiomas soportados, incluyendo traducción zero-shot. Nuestros resultados muestras que una representación común puede ser aprendida sin compartir parámetros entre idiomas, con una calidad de traducción similar a la del actual estado del arte, con la ventaja de permitirnos añadir nuevos idiomas o modalidades de datos de manera eficiente, sin transferencia negativa de conocimiento a los idiomas ya soportados y sin necesidad de reentrenarlos.Postprint (published version
Transfer Learning for Speech and Language Processing
Transfer learning is a vital technique that generalizes models trained for
one setting or task to other settings or tasks. For example in speech
recognition, an acoustic model trained for one language can be used to
recognize speech in another language, with little or no re-training data.
Transfer learning is closely related to multi-task learning (cross-lingual vs.
multilingual), and is traditionally studied in the name of `model adaptation'.
Recent advance in deep learning shows that transfer learning becomes much
easier and more effective with high-level abstract features learned by deep
models, and the `transfer' can be conducted not only between data distributions
and data types, but also between model structures (e.g., shallow nets and deep
nets) or even model types (e.g., Bayesian models and neural models). This
review paper summarizes some recent prominent research towards this direction,
particularly for speech and language processing. We also report some results
from our group and highlight the potential of this very interesting research
field.Comment: 13 pages, APSIPA 201
Multilingual machine translation: Deep analysis of language-specific encoder-decoders
State-of-the-art multilingual machine translation relies on a shared encoder-decoder. In this paper, we propose an alternative approach based on language-specific encoder-decoders, which can be easily extended to new languages by learning their corresponding modules. To establish a common interlingua representation, we simultaneously train N initial languages. Our experiments show that the proposed approach improves over the shared encoder-decoder for the initial languages and when adding new languages, without the need to retrain the remaining modules. All in all, our work closes the gap between shared and language-specific encoder-decoders, advancing toward modular multilingual machine translation systems that can be flexibly extended in lifelong learning settings.This work is supported by the European Research Council (ERC) under the European Union’sHorizon 2020 research and innovation programme (grant agreement No. 947657).Peer ReviewedPostprint (published version
Exploiting large pre-trained models for low-resource neural machine translation
Pre-trained models have revolutionized the natural language processing field by leveraging large-scale language representations for various tasks. Some pre-trained models offer general-purpose representations, while others are specialized in particular tasks, like neural machine translation (NMT). Multilingual NMT-targeted systems are often fine-tuned for specific language pairs, but there is a lack of evidence-based best-practice recommendations to guide this process. Additionally, deploying these large pre-trained models in computationally restricted environments, typically found in developing regions where low-resource languages are spoken, has become challenging. We propose a pipeline to tune the mBART50 pre-trained model to 8 diverse low-resource language pairs, and then distill the resulting system to obtain lightweight and more sustainable NMT models. Our pipeline conveniently exploits back-translation, synthetic corpus filtering, and knowledge distillation to deliver efficient bilingual translation models that are 13 times smaller, while maintaining a close BLEU performance.This paper is part of the R+D+i project PID2021-127999NB-I00 funded by the Spanish Ministry of Science and Innovation (MCIN), the Spanish Research Agency (AEI/10.13039/501100011033) and the European Regional Development Fund A way to make Europe. The computational resources used were funded by the European Regional Development Fund through project IDIFEDER/2020/00
- …