Bridge Correlational Neural Networks for Multilingual Multimodal Representation Learning

Recently there has been a lot of interest in learning common representations for multiple views of data. Typically, such common representations are learned using a parallel corpus between the two views (say, 1M images and their English captions). In this work, we address a real-world scenario where no direct parallel data is available between two views of interest (say, $V_1$ and $V_2$) but parallel data is available between each of these views and a pivot view ($V_3$). We propose a model for learning a common representation for $V_1$, $V_2$ and $V_3$ using only the parallel data available between $V_1V_3$ and $V_2V_3$. The proposed model is generic and even works when there are $n$ views of interest and only one pivot view which acts as a bridge between them. We focus on two specific downstream applications: (i) transfer learning between languages $L_1, L_2, \ldots, L_n$ using a pivot language $L$, and (ii) cross-modal access between images and a language $L_1$ using a pivot language $L_2$. Our model achieves state-of-the-art performance in multilingual document classification on the publicly available multilingual TED corpus and promising results in multilingual multimodal retrieval on a new dataset created and released as part of this work.

Comment: Published at NAACL-HLT 2016
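To illustrate the pivot idea concretely, here is a minimal sketch in PyTorch: one encoder per view maps into a shared space, and each non-pivot view is aligned with the pivot using only the $V_1V_3$ and $V_2V_3$ pairs, so no direct $V_1V_2$ parallel data is ever needed. This is not the authors' exact objective (the published model builds on correlational neural networks and also includes reconstruction terms); the correlation-only loss, layer sizes, and feature dimensions below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BridgeEncoder(nn.Module):
    """One encoder per view, mapping view-specific features to a shared space."""
    def __init__(self, in_dim: int, shared_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, shared_dim), nn.Tanh())

    def forward(self, x):
        return self.net(x)

def correlation_loss(a, b, eps=1e-8):
    # Negative sum of per-dimension Pearson correlations between two batches
    # of embeddings: minimizing this pushes paired embeddings to correlate.
    a = a - a.mean(dim=0, keepdim=True)
    b = b - b.mean(dim=0, keepdim=True)
    corr = (a * b).sum(dim=0) / (a.norm(dim=0) * b.norm(dim=0) + eps)
    return -corr.sum()

# Hypothetical dimensions: view 1 = image features, view 2 = language L1,
# pivot view 3 = language L2.
enc1, enc2, enc3 = (BridgeEncoder(d, 128) for d in (2048, 300, 300))
params = list(enc1.parameters()) + list(enc2.parameters()) + list(enc3.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

# Toy stand-ins for the two parallel corpora: (V1, V3) pairs and (V2, V3) pairs.
x1, x3a = torch.randn(32, 2048), torch.randn(32, 300)
x2, x3b = torch.randn(32, 300), torch.randn(32, 300)

for step in range(100):
    opt.zero_grad()
    # Align each non-pivot view with the pivot; the pivot acts as the bridge.
    loss = correlation_loss(enc1(x1), enc3(x3a)) + correlation_loss(enc2(x2), enc3(x3b))
    loss.backward()
    opt.step()
```

After training, embeddings of $V_1$ and $V_2$ can be compared directly in the shared space (e.g. by cosine similarity) for cross-modal retrieval or cross-lingual transfer, even though the model never saw a $V_1V_2$ pair.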