151 research outputs found

    A Correlational Encoder Decoder Architecture for Pivot Based Sequence Generation

    Interlingua-based Machine Translation (MT) aims to encode multiple languages into a common linguistic representation and then decode sentences in multiple target languages from this representation. In this work we explore this idea in the context of neural encoder-decoder architectures, albeit on a smaller scale and without MT as the end goal. Specifically, we consider the case of three languages or modalities X, Z and Y, wherein we are interested in generating sequences in Y starting from information available in X. However, no parallel training data is available between X and Y, but training data is available between X and Z and between Z and Y (as is often the case in many real-world applications). Z thus acts as a pivot/bridge. An obvious solution, which is perhaps less elegant but works very well in practice, is to train a two-stage model which first converts from X to Z and then from Z to Y. Instead, we explore an interlingua-inspired solution which jointly learns to (i) encode X and Z to a common representation and (ii) decode Y from this common representation. We evaluate our model on two tasks: (i) bridge transliteration and (ii) bridge captioning. We report promising results in both these applications and believe that this is a step in the right direction towards truly interlingua-inspired encoder-decoder architectures. Comment: 10 pages

    Lessons learned in multilingual grounded language learning

    Recent work has shown how to learn better visual-semantic embeddings by leveraging image descriptions in more than one language. Here, we investigate in detail which conditions affect the performance of this type of grounded language learning model. We show that multilingual training improves over bilingual training, and that low-resource languages benefit from training with higher-resource languages. We demonstrate that a multilingual model can be trained equally well on either translations or comparable sentence pairs, and that annotating the same set of images in multiple languages enables further improvements via an additional caption-caption ranking objective. Comment: CoNLL 201

    Bridging languages through images with deep partial canonical correlation analysis

    We present a deep neural network that leverages images to improve bilingual text embeddings. Relying on bilingual image tags and descriptions, our approach conditions text embedding induction on the shared visual information for both languages, producing highly correlated bilingual embeddings. In particular, we propose a novel model based on Partial Canonical Correlation Analysis (PCCA). While the original PCCA finds linear projections of two views in order to maximize their canonical correlation conditioned on a shared third variable, we introduce a non-linear Deep PCCA (DPCCA) model, and develop a new stochastic iterative algorithm for its optimization. We evaluate PCCA and DPCCA on multilingual word similarity and cross-lingual image description retrieval. Our models outperform a large variety of previous methods, despite not having access to any visual signal during test time inference. Our code and data are available at: https://github.com/rotmanguy/DPCCA
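
    The linear case described in this abstract (two views projected to maximize their canonical correlation conditioned on a shared third variable) can be illustrated with a minimal NumPy sketch. This is not the authors' code or their DPCCA algorithm; it uses the textbook formulation of linear partial CCA (regress the third variable out of both views, then run ordinary CCA on the residuals). The function names `residualize`, `inv_sqrt` and `partial_cca`, and the `reg` parameter, are hypothetical names chosen for illustration.

```python
import numpy as np


def residualize(view, cond):
    """Remove the part of `view` that is linearly predictable from `cond`."""
    coef, *_ = np.linalg.lstsq(cond, view, rcond=None)
    return view - cond @ coef


def inv_sqrt(mat, eps=1e-8):
    """Inverse square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(1.0 / np.sqrt(np.maximum(vals, eps))) @ vecs.T


def partial_cca(x, y, z, reg=1e-6):
    """Linear partial CCA of views x and y given the shared variable z.

    Rows are samples. Returns projection matrices for x and y and the
    canonical correlations of the residuals, sorted in descending order.
    `reg` is a small ridge term (an assumption of this sketch) that keeps
    the covariance matrices invertible.
    """
    # Center all three variables.
    x, y, z = (v - v.mean(axis=0) for v in (x, y, z))
    # Condition on z by regressing it out of both views.
    rx, ry = residualize(x, z), residualize(y, z)
    n = x.shape[0]
    sxx = rx.T @ rx / n + reg * np.eye(rx.shape[1])
    syy = ry.T @ ry / n + reg * np.eye(ry.shape[1])
    sxy = rx.T @ ry / n
    # Whiten each view; the SVD of the whitened cross-covariance gives CCA.
    wx, wy = inv_sqrt(sxx), inv_sqrt(syy)
    u, corrs, vt = np.linalg.svd(wx @ sxy @ wy, full_matrices=False)
    return wx @ u, wy @ vt.T, corrs
```

    In the setting of the abstract, `x` and `y` would be text embeddings in two languages and `z` the shared image features; here they are simply generic arrays with one row per sample.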

    Multimodal Grounding for Language Processing

    This survey discusses how recent developments in multimodal processing facilitate conceptual grounding of language. We categorize the information flow in multimodal processing with respect to cognitive models of human information processing and analyze different methods for combining multimodal representations. Based on this methodological inventory, we discuss the benefit of multimodal grounding for a variety of language processing tasks and the challenges that arise. We particularly focus on multimodal grounding of verbs, which play a crucial role in the compositional power of language. Comment: The paper has been published in the Proceedings of the 27th Conference of Computational Linguistics. Please refer to this version for citations: https://www.aclweb.org/anthology/papers/C/C18/C18-1197

    Learning multilingual and multimodal representations with language-specific encoders and decoders for machine translation

    This thesis studies language-specific approaches to Multilingual Machine Translation without parameter sharing and compares their properties with the current state of the art, which is based on parameter sharing. We define Multilingual Machine Translation as the task of translating between several pairs of languages in a single system. It has been widely studied in recent years due to its ability to scale easily to more languages, even between pairs never seen together during training (zero-shot translation). Several architectures have been proposed to tackle this problem with varying amounts of shared parameters between languages. Current state-of-the-art systems focus on a single sequence-to-sequence architecture where all languages share the complete set of parameters, including the token representation. While this has proven convenient for transfer learning, it makes it challenging to incorporate new languages into the trained model, as all languages depend on the same parameters. What all proposed architectures have in common is enforcing a shared representation space between languages. Specifically, in this work, we use as this representation the final output of the encoders, which the decoders use to perform cross-attention. Having a shared space reduces noise, as semantically similar sentences produce similar vector representations, helping the decoders process representations from several languages. This semantic representation is particularly important for zero-shot translation, as the similarity of the representation to the language pairs seen during training is key to reducing ambiguity between languages and obtaining good translation performance. This thesis is structured in three main blocks, focused on different scenarios of this task. Firstly, we propose a training method that enforces a common representation for bilingual training and a procedure to extend it to new languages efficiently. Secondly, we propose another training method that allows this representation to be learned directly on multilingual data and can be equally extended to new languages. Thirdly, we show that the proposed multilingual architecture is not limited to textual languages. We extend our method to new data modalities by adding speech encoders, performing Spoken Language Translation, including zero-shot, to all the supported languages. Our main results show that the common intermediate representation is achievable in this scenario, matching the performance of previously shared systems while allowing new languages or data modalities to be added efficiently, without negative transfer to the previous languages and without retraining the system. Postprint (published version)