Search CORE

512 research outputs found

Cross-Lingual Induction and Transfer of Verb Classes Based on Word Vector Space Specialisation

Author: Korhonen Anna
Mrkšić Nikola
Vulić Ivan
Publication venue
Publication date: 01/01/2017
Field of study

Existing approaches to automatic VerbNet-style verb classification are heavily dependent on feature engineering and therefore limited to languages with mature NLP pipelines. In this work, we propose a novel cross-lingual transfer method for inducing VerbNets for multiple languages. To the best of our knowledge, this is the first study which demonstrates how the architectures for learning word embeddings can be applied to this challenging syntactic-semantic task. Our method uses cross-lingual translation pairs to tie each of the six target languages into a bilingual vector space with English, jointly specialising the representations to encode the relational information from English VerbNet. A standard clustering algorithm is then run on top of the VerbNet-specialised representations, using vector dimensions as features for learning verb classes. Our results show that the proposed cross-lingual transfer approach sets new state-of-the-art verb classification performance across all six target languages explored in this work.Comment: EMNLP 2017 (long paper

arXiv.org e-Print Archive

Crossref

Synonyms and Antonyms: Embedded Conflict

Author: Samenko Igor
Tikhonov Alexey
Yamshchikov Ivan P.
Publication venue
Publication date: 27/04/2020
Field of study

Since modern word embeddings are motivated by a distributional hypothesis and are, therefore, based on local co-occurrences of words, it is only to be expected that synonyms and antonyms can have very similar embeddings. Contrary to this widespread assumption, this paper shows that modern embeddings contain information that distinguishes synonyms and antonyms despite small cosine similarities between corresponding vectors. This information is encoded in the geometry of the embeddings and could be extracted with a manifold learning procedure or {\em contrasting map}. Such a map is trained on a small labeled subset of the data and can produce new empeddings that explicitly highlight specific semantic attributes of the word. The new embeddings produced by the map are shown to improve the performance on downstream tasks

arXiv.org e-Print Archive

Neo: A Learned Query Optimizer

Author: Alizadeh Mohammad
Kraska Tim
Mao Hongzi
Marcus Ryan
Negi Parimarjan
Papaemmanouil Olga
Tatbul Nesime
Zhang Chi
Publication venue: 'VLDB Endowment'
Publication date: 07/04/2019
Field of study

Query optimization is one of the most challenging problems in database systems. Despite the progress made over the past decades, query optimizers remain extremely complex components that require a great deal of hand-tuning for specific workloads and datasets. Motivated by this shortcoming and inspired by recent advances in applying machine learning to data management challenges, we introduce Neo (Neural Optimizer), a novel learning-based query optimizer that relies on deep neural networks to generate query executions plans. Neo bootstraps its query optimization model from existing optimizers and continues to learn from incoming queries, building upon its successes and learning from its failures. Furthermore, Neo naturally adapts to underlying data patterns and is robust to estimation errors. Experimental results demonstrate that Neo, even when bootstrapped from a simple optimizer like PostgreSQL, can learn a model that offers similar performance to state-of-the-art commercial optimizers, and in some cases even surpass them

arXiv.org e-Print Archive

DSpace@MIT

Explicit retrofitting of distributional word vectors

Author: Glavaš G
Vulić I
Publication venue: ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
Publication date: 01/01/2018
Field of study

Semantic specialization of distributional word vectors, referred to as retrofitting, is a process of fine-tuning word vectors using external lexical knowledge in order to better embed some semantic relation. Existing retrofitting models integrate linguistic constraints directly into learning objectives and, consequently, specialize only the vectors of words from the constraints. In this work, in contrast, we transform external lexico-semantic relations into training examples which we use to learn an explicit retrofitting model (ER). The ER model allows us to learn a global specialization function and specialize the vectors of words unobserved in the training data as well. We report large gains over original distributional vector spaces in (1) intrinsic word similarity evaluation and on (2) two downstream tasks -- lexical simplification and dialog state tracking. Finally, we also successfully specialize vector spaces of new languages (i.e., unseen in the training data) by coupling ER with shared multilingual distributional vector spaces

Crossref

MAnnheim DOCument Server

Apollo (Cambridge)

Recommended from our members

Cross-lingual semantic specialization via lexical relation induction

Author: Glavaš G
Korhonen Anna-Leena
Ponti Edoardo
Reichart R
Vulić I
Publication venue: 'Organisation for Economic Co-Operation and Development (OECD)'
Publication date: 01/01/2020
Field of study

Semantic specialization integrates structured linguistic knowledge from external resources (such as lexical relations in WordNet) into pretrained distributional vectors in the form of constraints. However, this technique cannot be leveraged in many languages, because their structured external resources are typically incomplete or non-existent. To bridge this gap, we propose a novel method that transfers specialization from a resource-rich source language (English) to virtually any target language. Our specialization transfer comprises two crucial steps: 1) Inducing noisy constraints in the target language through automatic word translation; and 2) Filtering the noisy constraints via a state-of-the-art relation prediction model trained on the source language constraints. This allows us to specialize any set of distributional vectors in the target language with the refined constraints. We prove the effectiveness of our method through intrinsic word similarity evaluation in 8 languages, and with 3 downstream tasks in 5 languages: lexical simplification, dialog state tracking, and semantic textual similarity. The gains over the previous state-of-art specialization methods are substantial and consistent across languages. Our results also suggest that the transfer method is effective even for lexically distant source-target language pairs. Finally, as a by-product, our method produces lists of WordNet-style lexical relations in resource-poor languages

Apollo (Cambridge)

Cross-lingual semantic specialization via lexical relation induction

Author: Glavaš G
Korhonen A
Ponti EM
Reichart R
Vulić I
Publication venue: EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference
Publication date: 01/01/2019
Field of study

Crossref

MAnnheim DOCument Server

Edinburgh Research Explorer

Apollo (Cambridge)