Linguistic Distributional Information and Sensorimotor Similarity Both Contribute to Semantic Category Production
We investigated the contribution of sensorimotor and linguistic distributional information in a semantic category production task, hypothesizing that the task would rely on both but particularly on linguistic distributional information, which may provide a shortcut for conceptual processing. In a pre-registered study, we asked participants to name members of semantic categories and tested whether responses were predicted by a novel measure of sensorimotor proximity (based on an 11-dimension representation of sensorimotor experience) and linguistic proximity (based on word co-occurrence derived from a large subtitle corpus). Both proximity measures predicted the order and frequency of responses and, critically, linguistic proximity had an effect above and beyond sensorimotor proximity. Our findings support linguistic-sensorimotor accounts of the conceptual system and suggest that category production is based on both the similarity of sensorimotor experience between the category and member concepts, and on the linguistic distributional relationship between the category and member labels.
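As a rough sketch of the kind of proximity measure the abstract describes, the snippet below computes cosine similarity over 11-dimension sensorimotor vectors. The dimension labels and all rating values are invented for illustration; the study's actual norms and model are not reproduced here.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical 11-dimension sensorimotor strength ratings (e.g. vision,
# touch, hearing, ...) for a category label and two candidate members.
sensorimotor = {
    "fruit": [4.5, 3.8, 0.5, 3.9, 4.2, 0.3, 2.1, 1.0, 0.4, 0.2, 0.1],
    "apple": [4.7, 4.0, 0.4, 4.1, 4.4, 0.2, 2.3, 1.1, 0.5, 0.2, 0.1],
    "fig":   [3.9, 3.2, 0.3, 3.5, 3.6, 0.2, 1.8, 0.8, 0.3, 0.1, 0.1],
}

# Proximity of each candidate member to the category label; higher
# proximity would predict earlier and more frequent production.
for member in ("apple", "fig"):
    print(member, round(cosine(sensorimotor["fruit"], sensorimotor[member]), 3))
```

A linguistic proximity measure would apply the same similarity function to co-occurrence-based vectors instead of sensorimotor ratings.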
Hierarchies over Vector Space: Orienting Word and Graph Embeddings
Word and graph embeddings are widely used in deep learning applications. We
present a data structure that captures inherent hierarchical properties from an
unordered flat embedding space, particularly a sense of direction between pairs
of entities. Inspired by the notion of "distributional generality", our
algorithm constructs an arborescence (a directed rooted tree) by inserting
nodes in descending order of entity power (e.g., word frequency), pointing each
entity to the closest more powerful node as its parent.
We evaluate the performance of the resulting tree structures on three tasks:
hypernym relation discovery, least-common-ancestor (LCA) discovery among words,
and Wikipedia page link recovery. We achieve average scores of 8.98% and
2.70% on hypernym and LCA discovery across five languages, and 62.76%
accuracy on directed Wiki-page link recovery, all substantially above
baselines. Finally, we investigate the effect of insertion order, the
power/similarity trade-off, and various power sources to optimize parent
selection.
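The insertion procedure the abstract describes can be sketched directly: visit entities in descending power order and attach each one to the most similar already-inserted entity. The embeddings and frequencies below are toy values for illustration, not the paper's data.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def build_arborescence(embeddings, power):
    """Insert entities in descending power order; each new entity's parent
    is the nearest (highest-cosine) entity inserted before it."""
    order = sorted(embeddings, key=lambda w: power[w], reverse=True)
    parent = {order[0]: None}  # the most powerful entity becomes the root
    for w in order[1:]:
        parent[w] = max(parent, key=lambda p: cosine(embeddings[w], embeddings[p]))
    return parent

# Toy 2-d embeddings and word frequencies as the power source.
emb = {"animal": [1.0, 0.0], "dog": [0.9, 0.3], "cat": [0.9, -0.3], "poodle": [0.8, 0.35]}
freq = {"animal": 1000, "dog": 400, "cat": 390, "poodle": 20}

tree = build_arborescence(emb, freq)
print(tree)  # each entity points to its more powerful, most similar parent
```

On this toy data "poodle" attaches under "dog" rather than directly under the root, illustrating how frequency ordering plus nearest-neighbor parenting recovers a hierarchy from a flat space.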
Using distributional similarity to organise biomedical terminology
We investigate an application of distributional similarity techniques to the problem of structural organisation of biomedical terminology. Our application domain is the relatively small GENIA corpus. Using terms that have been accurately marked-up by hand within the corpus, we consider the problem of automatically determining semantic proximity. Terminological units are defined for our purposes as normalised classes of individual terms. Syntactic analysis of the corpus data is carried out using the Pro3Gres parser and provides the data required to calculate distributional similarity using a variety of different measures. Evaluation is performed against a hand-crafted gold standard for this domain in the form of the GENIA ontology. We show that distributional similarity can be used to predict semantic type with a good degree of accuracy.
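One simple instance of the kind of measure involved is set overlap between the syntactic contexts in which two terms occur. The context tuples below are invented for illustration and do not come from GENIA or the Pro3Gres parser.

```python
def jaccard(a, b):
    # Jaccard similarity between two sets of syntactic contexts.
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical (relation, governing word) dependency contexts for three
# normalised term classes.
contexts = {
    "interleukin-2": {("obj", "activate"), ("subj", "bind"), ("mod", "receptor")},
    "interferon":    {("obj", "activate"), ("subj", "bind"), ("mod", "gene")},
    "promoter":      {("mod", "gene"), ("obj", "contain")},
}

# Terms of the same semantic type share more contexts.
print(jaccard(contexts["interleukin-2"], contexts["interferon"]))
print(jaccard(contexts["interleukin-2"], contexts["promoter"]))
```

Weighted measures (e.g. cosine over context counts, or Lin's measure) follow the same pattern but score shared contexts by informativeness rather than mere presence.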
A distributional model of semantic context effects in lexical processing
One of the most robust findings of experimental psycholinguistics is that the context in which a word is presented influences the effort involved in processing that word. We present a novel model of contextual facilitation based on word co-occurrence probability distributions, and empirically validate the model through simulation of three representative types of context manipulation: single word priming, multiple-priming and contextual constraint. In our simulations the effects of semantic context are modeled using general-purpose techniques and representations from multivariate statistics, augmented with simple assumptions reflecting the inherently incremental nature of speech understanding. The contribution of our study is to show that special-purpose mechanisms are not necessary in order to capture the general pattern of the experimental results, and that a range of semantic context effects can be subsumed under the same principled account.
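A minimal sketch of the underlying idea: represent each word as a probability distribution over context words and take distributional overlap between prime and target as a proxy for facilitation. The vocabulary and probabilities below are assumed toy values, not the paper's corpus estimates.

```python
# Toy co-occurrence probability distributions P(context word | cue) over
# a small shared vocabulary ["ward", "nurse", "bread", "oven", "patient"].
p = {
    "doctor": [0.30, 0.30, 0.05, 0.05, 0.30],
    "nurse":  [0.25, 0.20, 0.05, 0.05, 0.45],
    "baker":  [0.05, 0.05, 0.45, 0.40, 0.05],
}

def overlap(u, v):
    # Distributional overlap: summed pointwise minimum of two
    # distributions; higher overlap models stronger facilitation.
    return sum(min(a, b) for a, b in zip(u, v))

print(overlap(p["doctor"], p["nurse"]))  # related prime-target pair
print(overlap(p["doctor"], p["baker"]))  # unrelated pair
```

A related prime shares much of the target's co-occurrence mass, so the model predicts faster processing for "doctor" after "nurse" than after "baker", matching the single-word priming pattern.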
From Frequency to Meaning: Vector Space Models of Semantics
Computers understand very little of the meaning of human language. This
profoundly limits our ability to give instructions to computers, the ability of
computers to explain their actions to us, and the ability of computers to
analyse and process text. Vector space models (VSMs) of semantics are beginning
to address these limits. This paper surveys the use of VSMs for semantic
processing of text. We organize the literature on VSMs according to the
structure of the matrix in a VSM. There are currently three broad classes of
VSMs, based on term-document, word-context, and pair-pattern matrices, yielding
three classes of applications. We survey a broad range of applications in these
three categories and we take a detailed look at a specific open source project
in each category. Our goal in this survey is to show the breadth of
applications of VSMs for semantics, to provide a new perspective on VSMs for
those who are already familiar with the area, and to provide pointers into the
literature for those who are less familiar with the field.
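The first of the three matrix classes, the term-document matrix, can be sketched in a few lines: each document becomes a column of term frequencies, and document similarity is cosine between columns. The documents below are toy examples.

```python
import math
from collections import Counter

docs = {
    "d1": "the cat sat on the mat".split(),
    "d2": "the dog sat on the log".split(),
    "d3": "stocks fell as markets closed".split(),
}

# Term-document matrix: one column (a Counter of term frequencies) per document.
matrix = {name: Counter(words) for name, words in docs.items()}

def cosine(c1, c2):
    # Cosine similarity between two sparse columns; missing terms count as 0.
    dot = sum(c1[t] * c2[t] for t in c1)
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2)

print(round(cosine(matrix["d1"], matrix["d2"]), 3))  # topically related
print(round(cosine(matrix["d1"], matrix["d3"]), 3))  # unrelated
```

A word-context matrix transposes the roles (rows stay terms, columns become context windows), and a pair-pattern matrix replaces terms with word pairs and contexts with the patterns linking them; the similarity computation is the same.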
Distributional semantics beyond words: Supervised learning of analogy and paraphrase
There have been several efforts to extend distributional semantics beyond
individual words, to measure the similarity of word pairs, phrases, and
sentences (briefly, tuples; ordered sets of words, contiguous or
noncontiguous). One way to extend beyond words is to compare two tuples using a
function that combines pairwise similarities between the component words in the
tuples. A strength of this approach is that it works with both relational
similarity (analogy) and compositional similarity (paraphrase). However, past
work required hand-coding the combination function for different tasks. The
main contribution of this paper is that combination functions are generated by
supervised learning. We achieve state-of-the-art results in measuring
relational similarity between word pairs (SAT analogies and SemEval~2012 Task
2) and measuring compositional similarity between noun-modifier phrases and
unigrams (multiple-choice paraphrase questions).
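The core representational move, turning two tuples into a feature vector of pairwise component similarities that a supervised learner can weight, can be sketched as below. The embeddings are invented toy values; the paper's actual features and learner are not reproduced.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Toy 2-d word embeddings (hypothetical values, for illustration only).
emb = {
    "mason": [0.9, 0.1], "stone": [0.8, 0.2],
    "carpenter": [0.85, 0.15], "wood": [0.75, 0.25],
    "fish": [0.1, 0.9],
}

def pair_features(pair_a, pair_b):
    """Feature vector of pairwise similarities between the component words
    of two word pairs; a supervised learner weights these features instead
    of relying on a hand-coded combination function."""
    (a1, a2), (b1, b2) = pair_a, pair_b
    return [
        cosine(emb[a1], emb[b1]),  # first-to-first similarity
        cosine(emb[a2], emb[b2]),  # second-to-second similarity
        cosine(emb[a1], emb[b2]),  # cross similarities
        cosine(emb[a2], emb[b1]),
    ]

feats = pair_features(("mason", "stone"), ("carpenter", "wood"))
print([round(f, 3) for f in feats])
```

For an analogy question, the candidate pair whose learned combination of these features scores highest would be chosen; the same machinery applies to phrase-unigram paraphrase by treating the phrase as a tuple.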