Image Pivoting for Learning Multilingual Multimodal Representations
In this paper we propose a model to learn multimodal multilingual
representations for matching images and sentences in different languages, with
the aim of advancing multilingual versions of image search and image
understanding. Our model learns a common representation for images and their
descriptions in two different languages (which need not be parallel) by
considering the image as a pivot between two languages. We introduce a new
pairwise ranking loss function which can handle both symmetric and asymmetric
similarity between the two modalities. We evaluate our models on
image-description ranking for German and English, and on semantic textual
similarity of image descriptions in English. In both cases we achieve
state-of-the-art performance.
Comment: 7 pages, EMNLP 201
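The pairwise ranking loss mentioned above can be sketched as a standard max-margin objective over a batch of matched image/caption embeddings. This is a minimal illustration of the symmetric/asymmetric distinction, not the paper's exact loss; the function name, the margin value, and the use of cosine similarity are all assumptions.

```python
import numpy as np

def pairwise_ranking_loss(img, cap, margin=0.2, asymmetric=False):
    """Max-margin ranking loss over a batch of image/caption embeddings.

    `img` and `cap` are L2-normalised (n, d) arrays whose i-th rows form a
    matching pair; every other row in the batch serves as a contrastive
    example. With `asymmetric=True` only caption-to-image violations are
    penalised, illustrating the symmetric/asymmetric choice.
    """
    scores = img @ cap.T                  # cosine similarities (rows normalised)
    pos = np.diag(scores)                 # similarity of each matching pair
    # hinge on every negative: fix the image and vary the caption, and vice versa
    cost_i2c = np.maximum(0, margin + scores - pos[:, None])
    cost_c2i = np.maximum(0, margin + scores - pos[None, :])
    n = len(pos)
    mask = 1 - np.eye(n)                  # exclude the positive pairs themselves
    loss = (cost_c2i * mask).sum()
    if not asymmetric:
        loss += (cost_i2c * mask).sum()
    return loss / n
```

With perfectly aligned embeddings the hinge terms vanish and the loss is zero; permuting the captions against the images produces margin violations and a positive loss.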
Lessons learned in multilingual grounded language learning
Recent work has shown how to learn better visual-semantic embeddings by
leveraging image descriptions in more than one language. Here, we investigate
in detail which conditions affect the performance of this type of grounded
language learning model. We show that multilingual training improves over
bilingual training, and that low-resource languages benefit from training with
higher-resource languages. We demonstrate that a multilingual model can be
trained equally well on either translations or comparable sentence pairs, and
that annotating the same set of images in multiple languages enables further
improvements via an additional caption-caption ranking objective.
Comment: CoNLL 201
Visual Pivoting for (Unsupervised) Entity Alignment
This work studies the use of visual semantic representations to align
entities in heterogeneous knowledge graphs (KGs). Images are natural components
of many existing KGs. By combining visual knowledge with other auxiliary
information, we show that the proposed new approach, EVA, creates a holistic
entity representation that provides strong signals for cross-graph entity
alignment. Moreover, previous entity alignment methods require human-labelled
seed alignments, which restricts their applicability. EVA provides a completely
unsupervised solution by leveraging the visual similarity of entities to create
an initial seed dictionary (visual pivots). Experiments on benchmark data sets
DBP15k and DWY15k show that EVA offers state-of-the-art performance on both
monolingual and cross-lingual entity alignment tasks. Furthermore, we discover
that images are particularly useful to align long-tail KG entities, which
inherently lack the structural contexts necessary for capturing the
correspondences.
Comment: To appear at AAAI-202
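The visual-pivot idea above can be sketched as extracting mutual nearest neighbours in visual-embedding space to form an unsupervised seed dictionary. This is an illustrative assumption about the procedure, not EVA's exact algorithm; the function name and the similarity threshold are invented for the sketch.

```python
import numpy as np

def visual_seed_dictionary(emb_a, emb_b, threshold=0.8):
    """Build an unsupervised seed alignment from entity image features.

    `emb_a`, `emb_b`: L2-normalised visual embeddings of entities in two
    knowledge graphs. Entity pairs that are mutual nearest neighbours with
    cosine similarity above `threshold` become seed pairs (visual pivots).
    """
    sim = emb_a @ emb_b.T
    nn_a = sim.argmax(axis=1)   # best match in B for each entity in A
    nn_b = sim.argmax(axis=0)   # best match in A for each entity in B
    return [(i, j) for i, j in enumerate(nn_a)
            if nn_b[j] == i and sim[i, j] >= threshold]
```

Mutual-nearest-neighbour filtering is a common way to keep only high-precision pairs, which matters here because the seed dictionary bootstraps the rest of the alignment.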
A Correlational Encoder Decoder Architecture for Pivot Based Sequence Generation
Interlingua based Machine Translation (MT) aims to encode multiple languages
into a common linguistic representation and then decode sentences in multiple
target languages from this representation. In this work we explore this idea in
the context of neural encoder decoder architectures, albeit on a smaller scale
and without MT as the end goal. Specifically, we consider the case of three
languages or modalities X, Z and Y wherein we are interested in generating
sequences in Y starting from information available in X. However, there is no
parallel training data available between X and Y but, training data is
available between X & Z and Z & Y (as is often the case in many real world
applications). Z thus acts as a pivot/bridge. An obvious solution, which is
perhaps less elegant but works very well in practice, is to train a two-stage
model which first converts from X to Z and then from Z to Y. Instead we explore
an interlingua inspired solution which jointly learns to do the following (i)
encode X and Z to a common representation and (ii) decode Y from this common
representation. We evaluate our model on two tasks: (i) bridge transliteration
and (ii) bridge captioning. We report promising results in both these
applications and believe that this is a step in the right direction towards
truly interlingua inspired encoder decoder architectures.
Comment: 10 page
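The joint scheme described above, two encoders mapping X and Z into a common representation and a single decoder producing Y from it, can be sketched with toy linear layers. All dimensions and weights here are random placeholders standing in for learned parameters; this shows only the architecture's data flow, not the paper's model or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative assumptions, not the paper's sizes).
d_x, d_z, d_h, d_y = 5, 4, 3, 6
W_enc_x = rng.standard_normal((d_h, d_x))   # encoder for source X
W_enc_z = rng.standard_normal((d_h, d_z))   # encoder for pivot Z
W_dec_y = rng.standard_normal((d_y, d_h))   # single decoder for target Y

def encode_x(x):
    return np.tanh(W_enc_x @ x)             # X -> common representation

def encode_z(z):
    return np.tanh(W_enc_z @ z)             # Z -> common representation

def decode_y(h):
    return W_dec_y @ h                      # common representation -> Y

# Training would pull encode_x(x) and encode_z(z) together on (X, Z) pairs
# (a correlation/alignment term) while fitting decode_y on (Z, Y) pairs.
# At test time Y is generated from X alone, despite no (X, Y) parallel data:
y_from_x = decode_y(encode_x(rng.standard_normal(d_x)))
```

The key design point is that the decoder only ever sees the common representation, so it cannot tell whether the input originated in X or Z; that is what lets the X-to-Y path work without direct supervision.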