1,148 research outputs found
Using Sparse Semantic Embeddings Learned from Multimodal Text and Image Data to Model Human Conceptual Knowledge
Distributional models provide a convenient way to model semantics using dense
embedding spaces derived from unsupervised learning algorithms. However, the
dimensions of dense embedding spaces are not designed to resemble human
semantic knowledge. Moreover, embeddings are often built from a single source
of information (typically text data), even though neurocognitive research
suggests that semantics is deeply linked to both language and perception. In
this paper, we combine multimodal information from both text and image-based
representations derived from state-of-the-art distributional models to produce
sparse, interpretable vectors using Joint Non-Negative Sparse Embedding.
Through in-depth analyses comparing these sparse models to human-derived
behavioural and neuroimaging data, we demonstrate their ability to predict
interpretable linguistic descriptions of human ground-truth semantic knowledge.
Comment: Proceedings of the 22nd Conference on Computational Natural Language Learning (CoNLL 2018), pages 260-270. Brussels, Belgium, October 31 - November 1, 2018. Association for Computational Linguistics.
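The abstract above centres on Joint Non-Negative Sparse Embedding over text and image representations. As a rough, hedged illustration of that family of methods (not the paper's implementation), the sketch below approximates a joint factorisation by concatenating row-normalised text and image feature matrices and learning one shared non-negative sparse code per word with scikit-learn; all matrices, sizes, and hyperparameters are invented stand-ins.

import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.preprocessing import normalize

rng = np.random.default_rng(0)
n_words, d_text, d_image, k = 300, 100, 64, 50    # toy sizes, not from the paper

X_text = rng.normal(size=(n_words, d_text))       # stand-in text embeddings (rows = words)
X_image = rng.normal(size=(n_words, d_image))     # stand-in image embeddings (same rows)

# Row-normalise each modality so neither dominates the joint reconstruction loss,
# then concatenate so that one shared code per word explains both modalities.
X_joint = np.hstack([normalize(X_text), normalize(X_image)])

# Non-negative sparse codes stand in for the "sparse, interpretable" dimensions.
model = MiniBatchDictionaryLearning(
    n_components=k, alpha=1.0,
    fit_algorithm="cd", transform_algorithm="lasso_cd",
    positive_code=True, random_state=0,
)
A = model.fit_transform(X_joint)                  # (n_words, k) sparse non-negative codes
print(A.shape, float((A > 0).mean()))             # embedding size and code density

In this sketch, interpretability comes from the non-negativity and sparsity of the shared codes; the actual paper's weighting of modalities and optimisation details are not reproduced here.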
Towards Unifying Grounded and Distributional Semantics Using the Words-as-Classifiers Model of Lexical Semantics
Automated systems that make use of language, such as personal assistants, need some means of representing words such that 1) the representation is computable and 2) it captures form and meaning. Recent advancements in the field of natural language processing have resulted in useful approaches to representing computable word meanings. In this thesis, I consider two such approaches: distributional embeddings and grounded models. Distributional embeddings are represented as high-dimensional vectors; words with similar meanings tend to cluster together in embedding space. Embeddings are easily learned using large amounts of text data. However, embeddings suffer from a lack of real-world knowledge; for example, the ability to identify colors or objects as they appear. In contrast to embeddings, grounded models learn a mapping between language and the physical world, such as visual information in pictures. Grounded models, however, tend to focus only on the mapping between language and the physical world and lack the knowledge that could be gained from considering abstract information found in text.
In this thesis, I evaluate wac2vec, a model that brings together grounded and distributional semantics to leverage the relative strengths of both, and use empirical analysis to explore whether wac2vec adds semantic information to traditional embeddings. Starting with the words-as-classifiers (WAC) model of grounded semantics, I use a large repository of images and the keywords that were used to retrieve those images. From the grounded model, I extract classifier coefficients as word-level vector embeddings (hence, wac2vec), then combine those with embeddings from distributional word representations. I show that combining grounded embeddings with traditional embeddings results in improved performance on a visual task, demonstrating the viability of using the wac2vec model to enrich traditional embeddings, and showing that wac2vec provides important semantic information that these embeddings do not have on their own.
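As a hedged sketch of the wac2vec recipe the abstract describes (per-word classifiers over image features whose coefficients become grounded embeddings, concatenated with distributional embeddings), the toy code below uses logistic regression on random stand-in data; the variable names, feature sizes, and vocabulary are illustrative and not taken from the thesis.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
d_img, d_dist = 64, 50                  # toy feature sizes, not from the thesis
vocab = ["dog", "red", "ball"]          # toy vocabulary

# Stand-ins for real data: image feature vectors, per-word positive/negative image
# labels (e.g. from retrieval keywords), and pretrained distributional embeddings.
images = rng.normal(size=(400, d_img))
labels = {w: rng.integers(0, 2, size=400) for w in vocab}
dist_emb = {w: rng.normal(size=d_dist) for w in vocab}

wac2vec = {}
for w in vocab:
    # Words-as-classifiers: one binary classifier per word over image features.
    clf = LogisticRegression(max_iter=1000).fit(images, labels[w])
    grounded = clf.coef_.ravel()        # classifier coefficients as the grounded vector
    # wac2vec-style enrichment: concatenate grounded and distributional vectors.
    wac2vec[w] = np.concatenate([grounded, dist_emb[w]])

print(wac2vec["dog"].shape)             # (d_img + d_dist,)

The concatenated vectors can then be dropped into whatever downstream evaluation uses the distributional embeddings alone, which is how the thesis measures whether the grounded half adds information.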
Attributes2Classname: A discriminative model for attribute-based unsupervised zero-shot learning
We propose a novel approach for unsupervised zero-shot learning (ZSL) of
classes based on their names. Most existing unsupervised ZSL methods aim to
learn a model for directly comparing image features and class names. However,
this proves to be a difficult task due to the dominance of non-visual semantics in
the underlying vector-space embeddings of class names. To address this issue, we
discriminatively learn a word representation such that the similarities between
class names and combinations of attribute names align with visual
similarity. Contrary to traditional zero-shot learning approaches that are
built upon attribute presence, our approach bypasses the laborious
attribute-class relation annotations for unseen classes. In addition, our
proposed approach renders text-only training possible, hence, the training can
be augmented without the need to collect additional image data. The
experimental results show that our method yields state-of-the-art results for
unsupervised ZSL on three benchmark datasets.
Comment: To appear at IEEE Int. Conference on Computer Vision (ICCV) 2017.
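A minimal, hedged sketch of the idea this abstract describes (not the authors' code): learn a transformation of word vectors so that the similarity between a class name and the combination of attribute names predicted for an image ranks the correct class highest. The data, dimensions, and the choice of a mean-pooled attribute combination with a single linear map are illustrative assumptions.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, n_classes, n_attrs, n_images = 300, 10, 40, 256   # toy sizes, not from the paper

class_vecs = torch.randn(n_classes, d)    # stand-in word vectors for class names
attr_vecs = torch.randn(n_attrs, d)       # stand-in word vectors for attribute names
# Stand-ins for per-image attribute predictions and class labels (seen classes only).
img_attrs = (torch.rand(n_images, n_attrs) > 0.8).float()
img_class = torch.randint(0, n_classes, (n_images,))

W = torch.nn.Parameter(torch.eye(d))      # learned linear map over the word space
opt = torch.optim.Adam([W], lr=1e-2)

for step in range(200):
    # "Combination of attribute names": mean of the predicted attributes' vectors.
    combo = (img_attrs @ attr_vecs) / img_attrs.sum(1, keepdim=True).clamp(min=1.0)
    # Cosine similarity between mapped attribute combinations and mapped class names.
    scores = F.normalize(combo @ W, dim=1) @ F.normalize(class_vecs @ W, dim=1).T
    # Discriminative objective: each image's attribute combination matches its class.
    loss = F.cross_entropy(scores * 10.0, img_class)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(float(loss))                        # toy training loss after the short run

In this sketch, nothing class-specific beyond the class name vector is needed at test time, so unseen class names can be scored against an image's predicted attribute names, which mirrors the text-only training claim in the abstract.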