Representing lexical ambiguity in prototype models of lexical semantics
We show, contrary to some recent claims in the literature, that prototype distributional semantic models (DSMs) are capable of representing multiple senses of ambiguous words, including infrequent meanings. We propose that word2vec contains a natural, model-internal way of operationalizing the disambiguation process by leveraging the two sets of representations word2vec learns, instead of just one as most work on this model does. We evaluate our approach on artificial language simulations where other prototype DSMs have been shown to fail. We furthermore assess whether these results scale to the disambiguation of naturalistic corpus examples. We do so by replacing all instances of sampled pairs of words in a corpus with pseudo-homonym tokens, and testing whether models, after being trained on one half of the corpus, were able to disambiguate pseudo-homonyms on the basis of their linguistic contexts in the second half of the corpus. We observe that word2vec well surpasses the baseline of always guessing the most frequent meaning to be the right one. Moreover, it degrades gracefully: as words are more unbalanced, the baseline is higher and harder to surpass; nonetheless, word2vec succeeds at surpassing the baseline, even for pseudo-homonyms whose most frequent meaning is much more frequent than the other.
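The abstract's key idea, using both of word2vec's learned embedding matrices (input/target and output/context vectors) rather than only the input vectors, can be illustrated with a minimal sketch. This is not the authors' implementation; the toy vocabulary, the pseudo-homonym labels `bank_1`/`bank_2`, and the random vectors are all illustrative assumptions. The scoring mirrors word2vec's training objective, which dots a context's input vectors against a candidate target's output vector:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50
# Toy vocabulary: two pseudo-homonym senses plus some context words.
vocab = ["bank_1", "bank_2", "river", "water", "money", "loan"]
# word2vec learns TWO matrices: input ("target") and output ("context")
# embeddings. Here both are random stand-ins for trained vectors.
W_in = {w: rng.normal(size=d) for w in vocab}
W_out = {w: rng.normal(size=d) for w in vocab}

def sense_scores(context, candidate_senses):
    """Score each candidate sense by the dot product between the summed
    input vectors of the context words and the sense's output vector,
    mirroring the inner product used in word2vec's training objective."""
    ctx = np.sum([W_in[w] for w in context], axis=0)
    return {s: float(ctx @ W_out[s]) for s in candidate_senses}

# Disambiguate: which sense best predicts this linguistic context?
scores = sense_scores(["river", "water"], ["bank_1", "bank_2"])
best = max(scores, key=scores.get)
```

With trained (rather than random) vectors, the sense whose output embedding best predicts the observed context words would win, which is the model-internal disambiguation mechanism the abstract describes.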
Having Your Cake and Eating It Too: Autonomy and Interaction in a Model of Sentence Processing
Is the human language understander a collection of modular processes
operating with relative autonomy, or is it a single integrated process? This
ongoing debate has polarized the language processing community, with two
fundamentally different types of model posited, and with each camp concluding
that the other is wrong. One camp puts forth a model with separate processors
and distinct knowledge sources to explain one body of data, and the other
proposes a model with a single processor and a homogeneous, monolithic
knowledge source to explain the other body of data. In this paper we argue that
a hybrid approach which combines a unified processor with separate knowledge
sources provides an explanation of both bodies of data, and we demonstrate the
feasibility of this approach with the computational model called COMPERE. We
believe that this approach brings the language processing community
significantly closer to offering human-like language processing systems.
Comment: 7 pages, uses aaai.sty macros
From Word to Sense Embeddings: A Survey on Vector Representations of Meaning
Over the past years, distributed semantic representations have proved to be
effective and flexible keepers of prior knowledge to be integrated into
downstream applications. This survey focuses on the representation of meaning.
We start from the theoretical background behind word vector space models and
highlight one of their major limitations: the meaning conflation deficiency,
which arises from representing a word with all its possible meanings as a
single vector. Then, we explain how this deficiency can be addressed through a
transition from the word level to the more fine-grained level of word senses
(in its broader acceptation) as a method for modelling unambiguous lexical
meaning. We present a comprehensive overview of the wide range of techniques in
the two main branches of sense representation, i.e., unsupervised and
knowledge-based. Finally, this survey covers the main evaluation procedures and
applications for this type of representation, and provides an analysis of four
of its important aspects: interpretability, sense granularity, adaptability to
different domains and compositionality.
Comment: 46 pages, 8 figures. Published in Journal of Artificial Intelligence Research
Robust Grammatical Analysis for Spoken Dialogue Systems
We argue that grammatical analysis is a viable alternative to concept
spotting for processing spoken input in a practical spoken dialogue system. We
discuss the structure of the grammar, and a model for robust parsing which
combines linguistic and statistical sources of information. We present test
results suggesting that grammatical processing allows fast and accurate
processing of spoken input.
Comment: Accepted for JNL
Syntax-Aware Multi-Sense Word Embeddings for Deep Compositional Models of Meaning
Deep compositional models of meaning, which act on distributional
representations of words to produce vectors for larger text constituents, are
evolving into a popular area of NLP research. We detail a compositional distributional
framework based on a rich form of word embeddings that aims at facilitating the
interactions between words in the context of a sentence. Embeddings and
composition layers are jointly learned against a generic objective that
enhances the vectors with syntactic information from the surrounding context.
Furthermore, each word is associated with a number of senses, the most
plausible of which is selected dynamically during the composition process. We
evaluate the produced vectors qualitatively and quantitatively with positive
results. At the sentence level, the effectiveness of the framework is
demonstrated on the MSRPar task, for which we report results within the
state-of-the-art range.
Comment: Accepted for presentation at EMNLP 201
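The dynamic sense selection this abstract describes can be sketched in a few lines. This is a simplified stand-in, not the paper's jointly trained model: the multi-sense lexicon, the random vectors, and the greedy first-sense context approximation are all assumptions, and composition here is plain averaging rather than a learned layer.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 50
# Hypothetical multi-sense lexicon: each word maps to one or more sense vectors.
senses = {
    "bank":  [rng.normal(size=d) for _ in range(2)],  # ambiguous word
    "river": [rng.normal(size=d)],
    "flow":  [rng.normal(size=d)],
}

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def compose(sentence):
    """For each word, select the sense vector most similar to the sum of
    the other words' (first-sense) vectors, then compose the sentence by
    averaging the selected sense vectors."""
    chosen = []
    for i, w in enumerate(sentence):
        ctx = np.sum([senses[u][0] for j, u in enumerate(sentence) if j != i],
                     axis=0)
        chosen.append(max(senses[w], key=lambda v: cos(v, ctx)))
    return np.mean(chosen, axis=0)

vec = compose(["river", "bank", "flow"])
```

In the paper's setting the sense vectors and the composition layer are trained jointly against a syntax-aware objective; this sketch only shows the select-then-compose control flow.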