16,399 research outputs found
From Distributional to Semantic Similarity
Institute for Communicating and Collaborative Systems
Lexical-semantic resources, including thesauri and WORDNET, have been successfully incorporated
into a wide range of applications in Natural Language Processing. However, they are
very difficult and expensive to create and maintain, and their usefulness has been severely
hampered by their limited coverage, bias and inconsistency. Automated and semi-automated
methods for developing such resources are therefore crucial for further resource development
and improved application performance.
Systems that extract thesauri often identify similar words using the distributional hypothesis
that similar words appear in similar contexts. This approach involves using corpora to examine
the contexts each word appears in and then calculating the similarity between context distributions.
Different definitions of context can be used, and I begin by examining how different
types of extracted context influence similarity.
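
The idea of comparing context distributions can be illustrated with a minimal sketch. It assumes a simple word-window definition of context and cosine similarity over raw counts; the function names, the toy corpus and the window size are all illustrative assumptions, not the extraction method or similarity measure used in the dissertation.

```python
from collections import Counter
from math import sqrt


def context_counts(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    counts = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            ctx = counts.setdefault(word, Counter())
            for j in range(lo, hi):
                if j != i:
                    ctx[tokens[j]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Toy corpus, invented for illustration only.
corpus = [
    "she will wear a red coat",
    "he will wear a red jacket",
    "they will buy a green apple",
]
vectors = context_counts(corpus)

# "coat" and "jacket" share all their contexts; "coat" and "apple" share only some.
print(cosine(vectors["coat"], vectors["jacket"]))
print(cosine(vectors["coat"], vectors["apple"]))
```

Changing `window`, or replacing the window with grammatical relations, changes which contexts are counted and therefore which words come out as similar.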
To be of most benefit these systems must be capable of finding synonyms for rare words.
Reliable context counts for rare events can only be extracted from vast collections of text. In
this dissertation I describe how to extract contexts from a corpus of over 2 billion words. I
describe techniques for processing text on this scale and examine the trade-off between context
accuracy, information content and quantity of text analysed.
Distributional similarity is at best an approximation to semantic similarity. I develop improved
approximations motivated by the intuition that some events in the context distribution are more
indicative of meaning than others. For instance, the object-of-verb context wear is far more
indicative of a clothing noun than get. However, existing distributional techniques do not
effectively utilise this information. The new context-weighted similarity metric I propose in
this dissertation significantly outperforms every distributional similarity metric described in
the literature.
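
The intuition that wear says more about a clothing noun than get can be sketched with a weighting function. The abstract does not state the weight function the dissertation proposes, so the sketch below substitutes positive pointwise mutual information (PPMI), a commonly used stand-in; the counts and the name `ppmi_weights` are invented for illustration.

```python
from math import log

# Toy (noun, verb-it-is-object-of) co-occurrence counts; invented for illustration.
counts = {
    ("coat", "wear"): 8, ("coat", "get"): 4,
    ("jacket", "wear"): 6, ("jacket", "get"): 3,
    ("apple", "eat"): 7, ("apple", "get"): 5,
    ("letter", "write"): 6, ("letter", "get"): 6,
}


def ppmi_weights(counts):
    """Weight each (word, context) pair by positive pointwise mutual information."""
    total = sum(counts.values())
    word_totals, ctx_totals = {}, {}
    for (word, ctx), n in counts.items():
        word_totals[word] = word_totals.get(word, 0) + n
        ctx_totals[ctx] = ctx_totals.get(ctx, 0) + n
    weights = {}
    for (word, ctx), n in counts.items():
        # PMI = log( P(word, ctx) / (P(word) * P(ctx)) )
        pmi = log(n * total / (word_totals[word] * ctx_totals[ctx]))
        weights[(word, ctx)] = max(0.0, pmi)  # clip negative associations to zero
    return weights


weights = ppmi_weights(counts)
print(weights[("coat", "wear")])  # informative context: positive weight
print(weights[("coat", "get")])   # uninformative context: clipped to zero
```

Because get occurs with every noun in the toy data, it carries little information about any one of them and its weight for coat is clipped to zero, whereas wear keeps a positive weight.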
Nearest-neighbour similarity algorithms scale poorly with vocabulary and context vector size.
To overcome this problem I introduce a new context-weighted approximation algorithm with
bounded complexity in context vector size that significantly reduces the system runtime with
only a minor performance penalty. I also describe a parallelized version of the system that runs
on a Beowulf cluster for the 2 billion word experiments.
To evaluate the context-weighted similarity measure I compare ranked similarity lists against
gold-standard resources using precision and recall-based measures from Information Retrieval,
since the alternative, application-based evaluation, can often be influenced by distributional
as well as semantic similarity. I also perform a detailed analysis of the final results using
WORDNET.
Finally, I apply my similarity metric to the task of assigning words to WORDNET semantic
categories. I demonstrate that this new approach outperforms existing methods and overcomes
some of their weaknesses.
Evaluation of taxonomic and neural embedding methods for calculating semantic similarity
Modelling semantic similarity plays a fundamental role in lexical semantic
applications. A natural way of calculating semantic similarity is to access
handcrafted semantic networks, but similarity prediction can also be
anticipated in a distributional vector space. Similarity calculation continues
to be a challenging task, even with the latest breakthroughs in deep neural
language models. We first examined popular methodologies in measuring taxonomic
similarity, including edge-counting that solely employs semantic relations in a
taxonomy, as well as the complex methods that estimate concept specificity. We
further extrapolated three weighting factors in modelling taxonomic similarity.
To study the distinct mechanisms between taxonomic and distributional
similarity measures, we ran head-to-head comparisons of each measure with human
similarity judgements from the perspectives of word frequency, polysemy degree
and similarity intensity. Our findings suggest that without fine-tuning the
uniform distance, taxonomic similarity measures can depend on the shortest path
length as a prime factor to predict semantic similarity; in contrast to
distributional semantics, edge-counting is free from sense distribution bias in
use and can measure word similarity both literally and metaphorically; the
synergy of retrofitting neural embeddings with concept relations in similarity
prediction may indicate a new trend to leverage knowledge bases on transfer
learning. It appears that a large gap still exists in computing semantic
similarity across different ranges of word frequency, polysemy degree and
similarity intensity.
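
Edge-counting, as described in the abstract above, relies on the shortest path between two concepts in a taxonomy. The following is a minimal sketch under stated assumptions: the taxonomy is a hypothetical toy is-a tree, and the conversion from path length to similarity, 1 / (1 + path length), is one simple choice rather than the measure evaluated in the paper.

```python
# Hypothetical toy is-a taxonomy (child -> parent); not taken from WordNet.
PARENT = {
    "dog": "canine", "wolf": "canine", "canine": "mammal",
    "cat": "feline", "feline": "mammal", "mammal": "animal",
    "sparrow": "bird", "bird": "animal",
}


def ancestors(node):
    """Return the node and all its ancestors, mapped to their distance from the node."""
    dist, depth = {}, 0
    while node is not None:
        dist[node] = depth
        node = PARENT.get(node)
        depth += 1
    return dist


def path_length(a, b):
    """Number of edges on the shortest path between a and b, or None if unconnected."""
    da, db = ancestors(a), ancestors(b)
    common = da.keys() & db.keys()
    if not common:
        return None
    # In a tree the shortest path runs through the lowest common ancestor.
    return min(da[n] + db[n] for n in common)


def edge_similarity(a, b):
    """Map path length to a similarity in (0, 1]; unconnected concepts score 0."""
    d = path_length(a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)


for pair in [("dog", "wolf"), ("dog", "cat"), ("dog", "sparrow")]:
    print(pair, path_length(*pair), edge_similarity(*pair))
```

Every edge counts as one unit of distance here, which corresponds to the uniform-distance setting the abstract mentions; the weighting factors it discusses would replace that constant.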
Zero-Shot Event Detection by Multimodal Distributional Semantic Embedding of Videos
We propose a new zero-shot Event Detection method by Multi-modal
Distributional Semantic embedding of videos. Our model embeds object and action
concepts as well as other available modalities from videos into a
distributional semantic space. To our knowledge, this is the first Zero-Shot
event detection model that is built on top of distributional semantics and
extends it in the following directions: (a) semantic embedding of multimodal
information in videos (with focus on the visual modalities), (b) automatically
determining relevance of concepts/attributes to a free text query, which could
be useful for other applications, and (c) retrieving videos by free text event
query (e.g., "changing a vehicle tire") based on their content. We embed videos
into a distributional semantic space and then measure the similarity between
videos and the event query in a free text form. We validated our method on the
large TRECVID MED (Multimedia Event Detection) challenge. Using only the event
title as a query, our method outperformed the state of the art, which uses long
descriptions, raising MAP from 12.6% to 13.5% and ROC-AUC from 0.73 to 0.83.
It is also an order of magnitude faster. Comment: To appear in AAAI 201
Computational explorations of semantic cognition
Motivated by the widespread use of distributional models of semantics within the cognitive science community, we follow a computational modelling approach in order to better understand and expand the applicability of such models, as well as to test potential ways in which they can be improved and extended. We review evidence in favour of the assumption that distributional models capture important aspects of semantic cognition. We look at the models' ability to account for behavioural data and fMRI patterns of brain activity, and investigate the structure of model-based, semantic networks. We test whether introducing affective information, obtained from a neural network model designed to predict emojis from co-occurring text, can improve the performance of linguistic and linguistic-visual models of semantics, in accounting for similarity/relatedness ratings. We find that adding visual and affective representations improves performance, especially for concrete and abstract words, respectively. We describe a processing model based on distributional semantics, in which activation spreads throughout a semantic network, as dictated by the patterns of semantic similarity between words. We show that the activation profile of the network, measured at various time points, can account for response time and accuracies in lexical and semantic decision tasks, as well as for concreteness/imageability and similarity/relatedness ratings. We evaluate the differences between concrete and abstract words, in terms of the structure of the semantic networks derived from distributional models of semantics. We examine how the structure is related to a number of factors that have been argued to differ between concrete and abstract words, namely imageability, age of acquisition, hedonic valence, contextual diversity, and semantic diversity.
We use distributional models to explore factors that might be responsible for the poor linguistic performance of children suffering from Developmental Language Disorder. Based on the assumption that certain model parameters can be given a psychological interpretation, we start from "healthy" models, and generate "lesioned" models, by manipulating the parameters. This allows us to determine the importance of each factor, and their effects with respect to learning concrete vs abstract words.
Vector spaces for historical linguistics : using distributional semantics to study syntactic productivity in diachrony
This paper describes an application of distributional semantics to the study of syntactic productivity in diachrony, i.e., the property of grammatical constructions to attract new lexical items over time. By providing an empirical measure of semantic similarity between words derived from lexical co-occurrences, distributional semantics not only reliably captures how the verbs in the distribution of a construction are related, but also enables the use of visualization techniques and statistical modeling to analyze the semantic development of a construction over time and identify the semantic determinants of syntactic productivity in naturally occurring data.
SemR-11: A Multi-Lingual Gold-Standard for Semantic Similarity and Relatedness for Eleven Languages
This work describes SemR-11, a multi-lingual dataset for evaluating semantic similarity and relatedness for 11 languages (German,
French, Russian, Italian, Dutch, Chinese, Portuguese, Swedish, Spanish, Arabic and Persian). Semantic similarity and relatedness gold
standards have been initially used to support the evaluation of semantic distance measures in the context of linguistic and knowledge
resources and distributional semantic models. SemR-11 builds upon the English gold-standards of Miller & Charles (MC), Rubenstein &
Goodenough (RG), WordSimilarity 353 (WS-353), and Simlex-999, providing a canonical translation for them. The final dataset consists
of 15,917 word pairs and can be used to support the construction and evaluation of semantic similarity/relatedness and distributional
semantic models. As a case study, the SemR-11 test collections were used to investigate how different distributional semantic models
built from corpora in different languages and with different sizes perform in semantic similarity and relatedness
tasks.
Distributional Formal Semantics
Natural language semantics has recently sought to combine the complementary
strengths of formal and distributional approaches to meaning. More
specifically, proposals have been put forward to augment formal semantic
machinery with distributional meaning representations, thereby introducing the
notion of semantic similarity into formal semantics, or to define
distributional systems that aim to incorporate formal notions such as
entailment and compositionality. However, given the fundamentally different
'representational currency' underlying formal and distributional approaches -
models of the world versus linguistic co-occurrence - their unification has
proven extremely difficult. Here, we define a Distributional Formal Semantics
that integrates distributionality into a formal semantic system on the level of
formal models. This approach offers probabilistic, distributed meaning
representations that are also inherently compositional, and that naturally
capture fundamental semantic notions such as quantification and entailment.
Furthermore, we show how the probabilistic nature of these representations
allows for probabilistic inference, and how the information-theoretic notion of
"information" (measured in terms of Entropy and Surprisal) naturally follows
from it. Finally, we illustrate how meaning representations can be derived
incrementally from linguistic input using a recurrent neural network model, and
how the resultant incremental semantic construction procedure intuitively
captures key semantic phenomena, including negation, presupposition, and
anaphoricity. Comment: To appear in: Information and Computation (WoLLIC 2019 Special Issue
- …