A critique of word similarity as a method for evaluating distributional semantic models
This paper aims to re-think the role of the word similarity task in distributional semantics research. We argue that while it is a valuable tool, it should be used with care because it provides only an approximate measure of the quality of a distributional model. Word similarity evaluations assume that there exists a single notion of similarity that is independent of any particular application. Further, the small size and low inter-annotator agreement of existing data sets make it challenging to find significant differences between models.
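The evaluation being critiqued can be sketched in a few lines: a model's cosine similarities for word pairs are correlated with human ratings via Spearman's rho. The vectors and ratings below are toy values assumed for illustration, not data from the paper.

```python
import numpy as np
from scipy.stats import spearmanr

# Toy embeddings (in practice these come from a trained distributional model).
vectors = {
    "cat":   np.array([0.9, 0.1, 0.2]),
    "dog":   np.array([0.8, 0.2, 0.1]),
    "car":   np.array([0.1, 0.9, 0.7]),
    "truck": np.array([0.2, 0.8, 0.8]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Word pairs with hypothetical human similarity ratings (higher = more similar).
pairs = [("cat", "dog", 9.0), ("car", "truck", 8.5),
         ("cat", "car", 1.5), ("dog", "truck", 2.0)]

model_scores = [cosine(vectors[a], vectors[b]) for a, b, _ in pairs]
human_scores = [rating for _, _, rating in pairs]

# The benchmark score is the rank correlation between model and human judgments.
rho, _ = spearmanr(model_scores, human_scores)
print(rho)
```

The paper's point is that this single correlation hides application-specific notions of similarity, and that small, noisily annotated data sets make the resulting rho an unreliable basis for comparing models.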
Using Sparse Semantic Embeddings Learned from Multimodal Text and Image Data to Model Human Conceptual Knowledge
Distributional models provide a convenient way to model semantics using dense
embedding spaces derived from unsupervised learning algorithms. However, the
dimensions of dense embedding spaces are not designed to resemble human
semantic knowledge. Moreover, embeddings are often built from a single source
of information (typically text data), even though neurocognitive research
suggests that semantics is deeply linked to both language and perception. In
this paper, we combine multimodal information from both text and image-based
representations derived from state-of-the-art distributional models to produce
sparse, interpretable vectors using Joint Non-Negative Sparse Embedding.
Through in-depth analyses comparing these sparse models to human-derived
behavioural and neuroimaging data, we demonstrate their ability to predict
interpretable linguistic descriptions of human ground-truth semantic knowledge.
Comment: Proceedings of the 22nd Conference on Computational Natural Language Learning (CoNLL 2018), pages 260-270. Brussels, Belgium, October 31 - November 1, 2018. Association for Computational Linguistics.
The Interplay of Semantics and Morphology in Word Embeddings
We explore the ability of word embeddings to capture both semantic and
morphological similarity, as affected by the different types of linguistic
properties (surface form, lemma, morphological tag) used to compose the
representation of each word. We train several models, where each uses a
different subset of these properties to compose its representations. By
evaluating the models on semantic and morphological measures, we reveal some
useful insights into the relationship between semantics and morphology.
Using the Outlier Detection Task to Evaluate Distributional Semantic Models
In this article, we define the outlier detection task and use it to compare neural-based word embeddings with transparent count-based distributional representations. Using the English Wikipedia as a text source to train the models, we observed that embeddings outperform count-based representations when their contexts are made up of bag-of-words. However, there are no sharp differences between the two models if the word contexts are defined as syntactic dependencies. In general, syntax-based models tend to perform better than those based on bag-of-words for this specific task. Similar experiments were carried out for Portuguese, with similar results. The test datasets we have created for the outlier detection task in English and Portuguese are freely available.
This work was supported by a 2016 BBVA Foundation Grant for Researchers and Cultural Creators and by Project TELEPARES, Ministry of Economy and Competitiveness (FFI2014-51978-C2-1-R). It has received financial support from the Consellería de Cultura, Educación e Ordenación Universitaria (accreditation 2016–2019, ED431G/08) and the European Regional Development Fund (ERDF).
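The task described above can be sketched as follows: given a set of words from one semantic category plus an intruder, score each word by its mean similarity to the others; the word with the lowest score is predicted as the outlier. The word set and vectors below are toy assumptions for illustration, not the authors' datasets.

```python
import numpy as np

# Toy embeddings: three fruits plus one intruder from another category.
vectors = {
    "apple":  np.array([0.9, 0.1, 0.1]),
    "pear":   np.array([0.8, 0.2, 0.1]),
    "banana": np.array([0.85, 0.15, 0.2]),
    "hammer": np.array([0.1, 0.9, 0.6]),  # the intruder
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def compactness(word, cluster):
    """Mean similarity of `word` to every other word in the cluster."""
    others = [w for w in cluster if w != word]
    return sum(cosine(vectors[word], vectors[w]) for w in others) / len(others)

cluster = list(vectors)
# The predicted outlier is the word that fits the cluster least well.
outlier = min(cluster, key=lambda w: compactness(w, cluster))
print(outlier)  # → hammer
```

A model is then scored by how often (and how highly) it ranks the true intruder last across many such word sets, which is the comparison the abstract reports for bag-of-words versus syntax-based contexts.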