Multilingual Models for Compositional Distributed Semantics

Author: Blunsom Phil
Hermann Karl Moritz
Publication date: 01/01/2014

We present a novel technique for learning semantic representations, which extends the distributional hypothesis to multilingual data and joint-space embeddings. Our models leverage parallel data and learn to strongly align the embeddings of semantically equivalent sentences, while maintaining sufficient distance between those of dissimilar sentences. The models do not rely on word alignments or any syntactic information and are successfully applied to a number of diverse languages. We extend our approach to learn semantic representations at the document level, too. We evaluate these models on two cross-lingual document classification tasks, outperforming the prior state of the art. Through qualitative analysis and the study of pivoting effects we demonstrate that our representations are semantically plausible and can capture semantic relationships across languages without parallel data.

Comment: Proceedings of ACL 2014 (Long papers
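The abstract's idea of pulling equivalent sentences together while keeping dissimilar ones apart can be illustrated with a margin-based hinge loss. The following is a minimal sketch only: the additive composition, the squared Euclidean distance, the margin value, and the toy vectors are all assumptions made for illustration, since the abstract does not specify them.

```python
def compose(word_vectors):
    """Sum word vectors into one sentence vector (additive composition, assumed)."""
    dim = len(word_vectors[0])
    return [sum(vec[i] for vec in word_vectors) for i in range(dim)]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def hinge_loss(src, tgt, noise, margin=1.0):
    """Zero once the true translation is closer than the noise sentence by `margin`."""
    return max(0.0, margin + sq_dist(src, tgt) - sq_dist(src, noise))


# Hypothetical toy embeddings; real ones would be learned from parallel data.
en = {"the": [0.1, 0.0], "cat": [0.9, 0.2]}
de = {"die": [0.1, 0.1], "katze": [0.8, 0.2], "hund": [-0.7, 0.5]}

src = compose([en["the"], en["cat"]])     # English sentence
tgt = compose([de["die"], de["katze"]])   # its German translation
noise = compose([de["die"], de["hund"]])  # an unrelated German sentence

loss_aligned = hinge_loss(src, tgt, noise)  # translation already close enough
loss_swapped = hinge_loss(src, noise, tgt)  # wrong pairing is penalised
print(loss_aligned, loss_swapped)
```

With these toy vectors the correctly paired sentences incur no loss, while swapping the translation and the noise sentence yields a positive penalty, which is the gradient signal such a model would train on.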

arXiv.org e-Print Archive

Crossref

The Zero Resource Speech Challenge 2017

Author: Anguera Xavier
Benjumea Juan
Bernard Mathieu
Besacier Laurent
Cao Xuan Nga
Dunbar Ewan
Dupoux Emmanuel
Karadayi Julien
Publication date: 12/12/2017

We describe a new challenge aimed at discovering subword and word units from raw speech. This challenge is the follow-up to the Zero Resource Speech Challenge 2015. It aims at constructing systems that generalize across languages and adapt to new speakers. The design features and evaluation metrics of the challenge are presented and the results of seventeen models are discussed.

Comment: IEEE ASRU (Automatic Speech Recognition and Understanding) 2017. Okinawa, Japa

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Bridge Correlational Neural Networks for Multilingual Multimodal Representation Learning

Author: Chandar Sarath
Khapra Mitesh M.
Rajendran Janarthanan
Ravindran Balaraman
Publication date: 01/01/2016

Recently there has been a lot of interest in learning common representations for multiple views of data. Typically, such common representations are learned using a parallel corpus between the two views (say, 1M images and their English captions). In this work, we address a real-world scenario where no direct parallel data is available between two views of interest (say, V_1 and V_2) but parallel data is available between each of these views and a pivot view (V_3). We propose a model for learning a common representation for V_1, V_2 and V_3 using only the parallel data available between V_1V_3 and V_2V_3. The proposed model is generic and even works when there are n views of interest and only one pivot view which acts as a bridge between them. There are two specific downstream applications that we focus on: (i) transfer learning between languages L_1, L_2, ..., L_n using a pivot language L and (ii) cross modal access between images and a language L_1 using a pivot language L_2. Our model achieves state-of-the-art performance in multilingual document classification on the publicly available multilingual TED corpus and promising results in multilingual multimodal retrieval on a new dataset created and released as a part of this work.

Comment: Published at NAACL-HLT 201
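To make the pivot idea concrete, here is a rough sketch of a correlation term that a bridge-style objective could maximise: each view is projected into a common space, and correlation is measured only on the V_1/V_3 and V_2/V_3 pairs, never on V_1/V_2 directly. The linear encoders, the function names, and the toy data are hypothetical, and the sketch leaves out any other terms the actual model may use, since the abstract does not spell out the objective.

```python
import math


def encode(x, weights):
    """Linear encoder (assumed): project a view-specific vector into the common space."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def correlation_term(batch_a, batch_b, w_a, w_b):
    """Sum, over common-space dimensions, of the correlation between two encoded views."""
    h_a = [encode(x, w_a) for x in batch_a]
    h_b = [encode(x, w_b) for x in batch_b]
    dims = len(h_a[0])
    return sum(
        pearson([h[d] for h in h_a], [h[d] for h in h_b]) for d in range(dims)
    )


def bridge_objective(pairs_13, pairs_23, w1, w2, w3):
    """Only pivot pairs contribute: (V_1, V_3) and (V_2, V_3)."""
    v1, v3_a = zip(*pairs_13)
    v2, v3_b = zip(*pairs_23)
    return correlation_term(v1, v3_a, w1, w3) + correlation_term(v2, v3_b, w2, w3)


# Toy data in which every view already matches the pivot exactly.
identity = [[1.0, 0.0], [0.0, 1.0]]
samples = [[1.0, 0.0], [0.0, 1.0], [2.0, 3.0]]
pairs_13 = [(s, s) for s in samples]
pairs_23 = [(s, s) for s in samples]

score = bridge_objective(pairs_13, pairs_23, identity, identity, identity)
print(score)
```

Because both non-pivot views are tied to the same pivot space, vectors from V_1 and V_2 become comparable in that space even though no V_1/V_2 pairs were ever seen.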

arXiv.org e-Print Archive

PolyPublie

Abstract

Author: Alex Boulanger
Bag-of-words Autoencoder
Hugo Larochelle
Stanislas Lauly
Université De Sherbrooke

Recent work on learning multilingual word representations usually relies on the use of word-level alignments (e.g. inferred with the help of GIZA++) between translated sentences, in order to align the word embeddings in different languages. In this workshop paper, we investigate an autoencoder model for learning multilingual word representations that does without such word-level alignments. The autoencoder is trained to reconstruct the bag-of-words representation of a given sentence from an encoded representation extracted from its translation. We evaluate our approach on a multilingual document classification task, where labeled data is available only for one language (e.g. English) while classification must be performed in a different language (e.g. French). In our experiments, we observe that our method compares favorably with a previously proposed method that exploits word-level alignments to learn word representations.
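The training signal described in this abstract, reconstructing a sentence's bag of words from an encoding of its translation, might look roughly like the forward pass below. This is a sketch under stated assumptions: the summed-embedding encoder with a tanh nonlinearity, the sigmoid output layer, the binary cross-entropy loss, and all weights and word ids are hypothetical choices, not details taken from the abstract.

```python
import math


def encode(word_ids, embeddings):
    """Encode a source sentence as tanh of its summed word embeddings (assumed form)."""
    dim = len(embeddings[0])
    summed = [sum(embeddings[w][i] for w in word_ids) for i in range(dim)]
    return [math.tanh(s) for s in summed]


def decode(hidden, out_weights, out_bias):
    """Predict, for each target-vocabulary word, the probability that it occurs."""
    probs = []
    for row, bias in zip(out_weights, out_bias):
        logit = bias + sum(w * h for w, h in zip(row, hidden))
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    return probs


def bow_cross_entropy(probs, target_ids):
    """Binary cross-entropy against the target sentence's bag of words."""
    present = set(target_ids)
    loss = 0.0
    for j, p in enumerate(probs):
        loss -= math.log(p) if j in present else math.log(1.0 - p)
    return loss


# Hypothetical toy parameters: 3 English words, 3 French words, 2 hidden units.
en_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fr_out_weights = [[2.0, -2.0], [-2.0, 2.0], [0.0, 0.0]]
fr_out_bias = [0.0, 0.0, 0.0]

hidden = encode([0], en_embeddings)              # English sentence: word 0
probs = decode(hidden, fr_out_weights, fr_out_bias)
loss_match = bow_cross_entropy(probs, [0])       # French translation: word 0
loss_mismatch = bow_cross_entropy(probs, [1])    # unrelated French sentence
print(loss_match, loss_mismatch)
```

No word-level alignment appears anywhere in the sketch: the only supervision is which target words occur in the translated sentence, so minimising this loss over sentence pairs is what would push the two languages' representations together.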

CiteSeerX