Characterizing the impact of geometric properties of word embeddings on task performance

Author: Ferhatosmanoglu Hakan
Fosler-Lussier Eric
Haldar Aparajita
Newman-Griffis Denis
Whitaker Brendan
Publication date: 01/01/2019

Analysis of word embedding properties to inform their use in downstream NLP tasks has largely been studied by assessing nearest neighbors. However, geometric properties of the continuous feature space contribute directly to the use of embedding features in downstream models, and are largely unexplored. We consider four properties of word embedding geometry, namely: position relative to the origin, distribution of features in the vector space, global pairwise distances, and local pairwise distances. We define a sequence of transformations to generate new embeddings that expose subsets of these properties to downstream models and evaluate change in task performance to understand the contribution of each property to NLP models. We transform publicly available pretrained embeddings from three popular toolkits (word2vec, GloVe, and FastText) and evaluate on a variety of intrinsic tasks, which model linguistic information in the vector space, and extrinsic tasks, which use vectors as input to machine learning models. We find that intrinsic evaluations are highly sensitive to absolute position, while extrinsic tasks rely primarily on local similarity. Our findings suggest that future embedding models and post-processing techniques should focus primarily on similarity to nearby points in vector space.

Comment: Appearing in the Third Workshop on Evaluating Vector Space Representations for NLP (RepEval 2019). 7 pages + reference

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

Warwick Research Archives Portal Repository

White Rose Research Online

Characterizing the impact of geometric properties of word embeddings on task performance

Author: Ferhatosmanoglu H.
Fosler-Lussier E.
Haldar A.
Newman-Griffis D.
Whitaker B.
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/06/2019

White Rose Research Online

A visual embedding for the unsupervised extraction of abstract semantics

Author: Ayguadé Parra Eduard
Béjar Alonso Javier
Chen R
Cortés García Claudio Ulises
García Gasulla Dario
Labarta Mancho Jesús José
Suzumura Toyotaro
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017

Vector-space word representations obtained from neural network models have been shown to enable semantic operations based on vector arithmetic. In this paper, we explore the existence of similar information in vector representations of images. For that purpose, we define a methodology to obtain large, sparse vector representations of image classes, and generate vectors through the state-of-the-art deep learning architecture GoogLeNet for 20K images obtained from ImageNet. We first evaluate the resultant vector-space semantics through its correlation with WordNet distances, and find vector distances to be strongly correlated with linguistic semantics. We then explore the location of images within the vector space, finding elements close in WordNet to be clustered together, regardless of significant visual variances (e.g., 118 dog types). More surprisingly, we find that the space separates complex classes (e.g., living things) in an unsupervised manner, without prior knowledge. Afterwards, we consider vector arithmetic. Although we are unable to obtain meaningful results in this regard, we discuss the various problems we encountered and how we plan to solve them. Finally, we discuss the impact of our research for cognitive systems, focusing on the role of the architecture being used.

This work is partially supported by the Joint Study Agreement no. W156463 under the IBM/BSC Deep Learning Center agreement, by the Spanish Government through Programa Severo Ochoa (SEV-2015-0493), by the Spanish Ministry of Science and Technology through TIN2015-65316-P project and by the Generalitat de Catalunya (contracts 2014-SGR-1051), and by the Core Research for Evolutional Science and Technology (CREST) program of Japan Science and Technology Agency (JST).

Peer Reviewed. Postprint (published version)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC