
    Bilingual Learning of Multi-sense Embeddings with Discrete Autoencoders

    We present an approach to learning multi-sense word embeddings relying on both monolingual and bilingual information. Our model consists of an encoder, which uses monolingual and bilingual context (i.e., a parallel sentence) to choose a sense for a given word, and a decoder, which predicts context words based on the chosen sense. The two components are estimated jointly. We observe that the word representations induced from bilingual data outperform their monolingual counterparts across a range of evaluation tasks, even though cross-lingual information is not available at test time.
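
    The encoder/decoder split described in the abstract lends itself to a compact sketch. The following is a minimal, illustrative PyTorch example of a discrete sense autoencoder, not the authors' code: the class name, the fixed number of senses per word, the mean-pooled context, and the soft sense choice are all assumptions made for the example.

```python
# Minimal sketch (not the authors' implementation) of a discrete autoencoder
# over word senses: the encoder scores K candidate senses of a pivot word from
# its monolingual context (and, optionally, a parallel sentence); the decoder
# predicts the context words from the chosen sense embedding.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiscreteSenseAutoencoder(nn.Module):
    def __init__(self, vocab_size, num_senses=3, dim=100):
        super().__init__()
        self.num_senses = num_senses
        # one embedding per (word, sense) pair, flattened to vocab_size * num_senses
        self.sense_emb = nn.Embedding(vocab_size * num_senses, dim)
        self.context_emb = nn.Embedding(vocab_size, dim)  # shared for mono/bilingual context
        self.out = nn.Linear(dim, vocab_size)             # decoder: sense -> context words

    def encode(self, word, context, bilingual_context=None):
        # pool the context vectors; bilingual context is simply added when present
        ctx = self.context_emb(context).mean(dim=1)
        if bilingual_context is not None:
            ctx = ctx + self.context_emb(bilingual_context).mean(dim=1)
        # score each sense of the pivot word against the pooled context
        sense_ids = word.unsqueeze(1) * self.num_senses + torch.arange(self.num_senses)
        senses = self.sense_emb(sense_ids)                # (batch, K, dim)
        scores = torch.einsum('bkd,bd->bk', senses, ctx)  # (batch, K)
        return senses, scores

    def forward(self, word, context, bilingual_context=None):
        senses, scores = self.encode(word, context, bilingual_context)
        # soft sense choice during training; a hard argmax could be used at test time
        probs = F.softmax(scores, dim=-1)
        chosen = (probs.unsqueeze(-1) * senses).sum(dim=1)  # expected sense vector
        logits = self.out(chosen)                           # predict the context words
        loss = F.cross_entropy(
            logits.unsqueeze(1).expand(-1, context.size(1), -1).reshape(-1, logits.size(-1)),
            context.reshape(-1))
        return loss, probs

# Toy usage with an illustrative vocabulary of 1000 words and two pivot words.
model = DiscreteSenseAutoencoder(vocab_size=1000)
word = torch.randint(0, 1000, (2,))
context = torch.randint(0, 1000, (2, 4))
loss, sense_probs = model(word, context)   # bilingual_context omitted, as at test time
```

    In this sketch the bilingual signal only enters through the optional `bilingual_context` argument, so dropping it at test time mirrors the setting described in the abstract, where cross-lingual information is unavailable.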

    Learning to Embed Words in Context for Syntactic Tasks

    We present models for embedding words in the context of surrounding words. Such models, which we refer to as token embeddings, represent the characteristics of a word that are specific to a given context, such as word sense, syntactic category, and semantic role. We explore simple, efficient token embedding models based on standard neural network architectures. We learn token embeddings on a large amount of unannotated text and evaluate them as features for part-of-speech taggers and dependency parsers trained on much smaller amounts of annotated data. We find that predictors endowed with token embeddings consistently outperform baseline predictors across a range of context window and training set sizes.
    Comment: Accepted by the ACL 2017 Repl4NLP workshop.
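
    As an illustration of the idea of token embeddings, here is a minimal sketch, not one of the paper's specific models: a bidirectional LSTM turns type-level word vectors into one vector per word occurrence, and those vectors could then be concatenated to a tagger's or parser's input features. All names and dimensions are assumptions.

```python
# Minimal sketch of a token embedder: a BiLSTM over type-level word embeddings
# produces a context-sensitive vector for every token in the sentence.
import torch
import torch.nn as nn

class TokenEmbedder(nn.Module):
    def __init__(self, vocab_size, word_dim=100, token_dim=50):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        # token_dim per direction -> 2 * token_dim per token after concatenation
        self.bilstm = nn.LSTM(word_dim, token_dim, batch_first=True, bidirectional=True)

    def forward(self, sentence_ids):               # (batch, seq_len)
        vectors = self.word_emb(sentence_ids)      # (batch, seq_len, word_dim)
        token_vectors, _ = self.bilstm(vectors)    # (batch, seq_len, 2 * token_dim)
        return token_vectors

# Illustrative usage: embed a toy 5-word sentence over a toy vocabulary of 1000.
embedder = TokenEmbedder(vocab_size=1000)
sentence = torch.randint(0, 1000, (1, 5))
features = embedder(sentence)   # features[0, i] describes token i in its context
```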

    Language classification from bilingual word embedding graphs

    We study the role of the second language in bilingual word embeddings in monolingual semantic evaluation tasks. We find strongly and weakly positive correlations between downstream task performance and the second language's similarity to the target language. Additionally, we show how bilingual word embeddings can be employed for the task of semantic language classification and that joint semantic spaces vary in meaningful ways across second languages. Our results support the hypothesis that semantic language similarity is influenced by both structural similarity and geography/contact.
    Comment: To be published at Coling 201

    Empirical studies on word representations

    One of the most fundamental tasks in natural language processing is representing words with mathematical objects (such as vectors). These word representations, which are most often estimated from data, capture the meaning of words. They make it possible to compare words according to their semantic similarity, and have been shown to work extremely well when included in complex real-world applications. A large part of our work deals with ways of estimating word representations directly from large quantities of text. Our methods exploit the idea that words which occur in similar contexts have similar meanings. How we define the context is an important focus of our thesis. The context can consist of a number of words to the left and to the right of the word in question, but, as we show, obtaining context words via syntactic links (such as the link between a verb and its subject) often works better. We furthermore investigate word representations that accurately capture multiple meanings of a single word. We show that the translation of a word in context contains information that can be used to disambiguate the meaning of that word.
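
    The contrast between window contexts and syntactic contexts mentioned above can be made concrete with a small example. The toy sentence, its dependency arcs, and the helper names below are assumptions for illustration; in practice the arcs would come from a parser.

```python
# Illustrative sketch of the two context definitions: linear window contexts
# vs. contexts obtained via syntactic (dependency) links.
sentence = ["the", "scientist", "discovered", "a", "new", "particle"]
# (head_index, dependent_index, relation) triples for the toy parse
arcs = [(1, 0, "det"), (2, 1, "nsubj"), (2, 5, "obj"), (5, 3, "det"), (5, 4, "amod")]

def window_contexts(tokens, i, size=2):
    """Words within `size` positions to the left and right of position i."""
    lo, hi = max(0, i - size), min(len(tokens), i + size + 1)
    return [tokens[j] for j in range(lo, hi) if j != i]

def syntactic_contexts(tokens, i, arcs):
    """Words linked to position i by a dependency arc, labelled with the relation."""
    ctx = []
    for head, dep, rel in arcs:
        if head == i:
            ctx.append(f"{tokens[dep]}/{rel}")
        elif dep == i:
            ctx.append(f"{tokens[head]}/{rel}-inv")
    return ctx

# For "discovered" (index 2): the window context mixes in the determiner "a",
# while the syntactic contexts pick out the subject and object directly.
print(window_contexts(sentence, 2))            # ['the', 'scientist', 'a', 'new']
print(syntactic_contexts(sentence, 2, arcs))   # ['scientist/nsubj', 'particle/obj']
```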