Similarity-Based Models of Word Cooccurrence Probabilities
In many applications of natural language processing (NLP) it is necessary to
determine the likelihood of a given word combination. For example, a speech
recognizer may need to determine which of the two word combinations "eat a
peach" and "eat a beach" is more likely. Statistical NLP methods determine
the likelihood of a word combination from its frequency in a training corpus.
However, the nature of language is such that many word combinations are
infrequent and do not occur in any given corpus. In this work we propose a
method for estimating the probability of such previously unseen word
combinations using available information on "most similar" words.
We describe probabilistic word association models based on distributional
word similarity, and apply them to two tasks: language modeling and pseudo-word
disambiguation. In the language modeling task, a similarity-based model is used
to improve probability estimates for unseen bigrams in a back-off language
model. The similarity-based method yields a 20% perplexity improvement in the
prediction of unseen bigrams and statistically significant reductions in
speech-recognition error.
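A minimal Python sketch of similarity-based back-off for unseen pairs, under stated assumptions: the similarity weight exp(-beta * L1 distance), the number of neighbours k, and the absolute discount d are illustrative choices, not the exact similarity functions or discounting scheme evaluated in the paper. Seen pairs keep a discounted maximum-likelihood estimate. The freed probability mass goes to unseen pairs in proportion to a weighted average of the conditional distributions of the most similar conditioning words. The toy pairs and all function names are hypothetical.

```python
from collections import Counter, defaultdict
import math

def pair_counts(pairs):
    """Count (w1, w2) pairs, grouped by the conditioning word w1."""
    counts = defaultdict(Counter)
    for w1, w2 in pairs:
        counts[w1][w2] += 1
    return counts

def ml_dist(counts, w1):
    """Maximum-likelihood estimate of P(. | w1) as a dict."""
    total = sum(counts[w1].values())
    return {w2: c / total for w2, c in counts[w1].items()}

def weight(counts, w1, v, beta=1.0):
    """Assumed similarity weight: exp(-beta * L1 distance) between the
    conditional distributions of w1 and v."""
    p, q = ml_dist(counts, w1), ml_dist(counts, v)
    l1 = sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in set(p) | set(q))
    return math.exp(-beta * l1)

def p_sim(counts, w1, w2, k=2):
    """Similarity-based estimate: weighted average of P(w2 | v) over the
    k conditioning words v most similar to w1."""
    others = [v for v in counts if v != w1]
    nearest = sorted(others, key=lambda v: weight(counts, w1, v), reverse=True)[:k]
    norm = sum(weight(counts, w1, v) for v in nearest)
    if norm == 0:
        return 0.0
    return sum(weight(counts, w1, v) * ml_dist(counts, v).get(w2, 0.0)
               for v in nearest) / norm

def p_backoff(counts, vocab, w1, w2, d=0.5, k=2):
    """Back-off estimate with an assumed absolute discount d. w1 must have been
    seen as a conditioning word. Seen pairs keep a discounted ML estimate; the
    freed mass is shared among unseen w2 in proportion to p_sim."""
    c1 = sum(counts[w1].values())
    if counts[w1][w2] > 0:
        return (counts[w1][w2] - d) / c1
    freed = d * len(counts[w1]) / c1
    unseen = [w for w in vocab if counts[w1][w] == 0]
    z = sum(p_sim(counts, w1, w, k) for w in unseen)
    return freed * p_sim(counts, w1, w2, k) / z if z > 0 else 0.0

# Hypothetical toy data: "eat" has never been seen with "plum" or "beach".
pairs = [("eat", "peach"), ("eat", "pear"),
         ("devour", "peach"), ("devour", "pear"), ("devour", "plum"),
         ("visit", "beach"), ("visit", "museum")]
counts = pair_counts(pairs)
vocab = sorted({w2 for _, w2 in pairs})
print(p_backoff(counts, vocab, "eat", "plum"), p_backoff(counts, vocab, "eat", "beach"))
```

Both "eat plum" and "eat beach" are unseen, but "eat plum" receives more probability because "devour", whose distribution resembles that of "eat", has been seen with "plum". Plain unigram back-off could not make this distinction, since "plum" and "beach" each occur once.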
We also compare four similarity-based estimation methods against back-off and
maximum-likelihood estimation methods on a pseudo-word sense disambiguation
task in which we controlled for both unigram and bigram frequency to avoid
giving too much weight to easy-to-disambiguate high-frequency configurations.
The similarity-based methods perform up to 40% better on this particular task.
Comment: 26 pages, 5 figures
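The pseudo-word evaluation can be sketched as an error-rate computation. In this hedged Python sketch, each test item pairs an observed word with a confusor that stands in for the other half of a conflated pseudo-word. Choosing confusors with matched unigram and bigram frequency, as the abstract describes, is not reproduced. Counting ties as errors is an assumption. `pseudo_word_error_rate` and the toy probabilities are hypothetical.

```python
from typing import Callable, Iterable, Tuple

def pseudo_word_error_rate(
    tests: Iterable[Tuple[str, str, str]],
    prob: Callable[[str, str], float],
) -> float:
    """Each test item is (w1, original_w2, confusor_w2). The two candidates
    stand for one conflated pseudo-word; the model is right only if it gives
    the original strictly higher probability. Ties count as errors here."""
    items = list(tests)
    errors = sum(1 for w1, orig, conf in items if prob(w1, orig) <= prob(w1, conf))
    return errors / len(items)

# Hypothetical probabilities standing in for any of the compared estimators.
toy = {("eat", "peach"): 0.30, ("eat", "beach"): 0.01,
       ("swim", "beach"): 0.20, ("swim", "peach"): 0.40}
rate = pseudo_word_error_rate(
    [("eat", "peach", "beach"), ("swim", "beach", "peach")],
    lambda w1, w2: toy.get((w1, w2), 0.0),
)
print(rate)
```

Because `prob` is passed in as a function, the same harness can score a maximum-likelihood, back-off, or similarity-based estimator on identical test items.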
Word Sense Determination from Wikipedia Data Using Neural Networks
Many words have multiple meanings. For example, “plant” can mean a type of living organism or a factory. Being able to determine the sense of such words is very useful in natural language processing tasks such as speech synthesis, question answering, and machine translation. For the project described in this report, we used a modular model to classify the sense of words to be disambiguated. This model consisted of two parts: the first part was a neural-network-based language model that computes continuous vector representations of words from data sets created from Wikipedia pages; the second part classified the meaning of the given word without explicitly knowing what the meaning is. In this unsupervised word sense determination task, we did not need human-tagged training data or a dictionary of senses for each word. We tested the model with some naturally ambiguous words and compared our experimental results with those of the related work by Schütze (1998). Our model achieved accuracy similar to that of Schütze’s work for some words.
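The general idea of unsupervised sense determination from word vectors can be sketched in the style of Schütze's context-group discrimination. Each occurrence of the ambiguous word is represented by the centroid of its context-word vectors, and those centroids are then clustered. This is an assumed stand-in, not the report's model. The 2-dimensional embeddings are hand-made toy values rather than vectors from a neural language model trained on Wikipedia, and k-means is the assumed clustering step.

```python
import math
import random

def context_vector(context, embeddings):
    """Centroid of the embeddings of the context words that have vectors."""
    dim = len(next(iter(embeddings.values())))
    vecs = [embeddings[w] for w in context if w in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels

# Hypothetical toy embeddings: dimension 0 ~ "botany", dimension 1 ~ "industry".
emb = {"leaf": [1.0, 0.0], "grow": [0.9, 0.1], "seed": [0.95, 0.05],
       "factory": [0.0, 1.0], "worker": [0.1, 0.9], "steel": [0.05, 0.95]}
contexts_of_plant = [["grow", "seed"], ["leaf", "grow"],
                     ["factory", "worker"], ["steel", "worker"]]
labels = kmeans([context_vector(c, emb) for c in contexts_of_plant], k=2)
print(labels)
```

The cluster labels are arbitrary integers. As the abstract notes, the method separates occurrences without knowing what each sense means. Naming or evaluating a cluster requires an extra mapping step.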
The Measure of a Model
This paper describes measures for evaluating the three determinants of how
well a probabilistic classifier performs on a given test set. These
determinants are the appropriateness, for the test set, of the results of (1)
feature selection, (2) formulation of the parametric form of the model, and (3)
parameter estimation. These are part of any model formulation procedure, even
if not broken out as separate steps, so the tradeoffs explored in this paper
are relevant to a wide variety of methods. The measures are demonstrated in a
large experiment, in which they are used to analyze the results of roughly 300
classifiers that perform word-sense disambiguation.
Comment: 12 pages, uuencoded compressed postscript file
AutoSense Model for Word Sense Induction
Word sense induction (WSI), or the task of automatically discovering multiple
senses or meanings of a word, has three main challenges: domain adaptability,
novel sense detection, and sense granularity flexibility. While current latent
variable models are known to solve the first two challenges, they do not adapt
to different word sense granularities, which vary widely across words, from
"aardvark", with one sense, to "play", with over 50 senses. Current models
either require hyperparameter tuning or nonparametric induction of the number
of senses, both of which we find to be ineffective. Thus, we aim to
eliminate these requirements and solve the sense granularity problem by
proposing AutoSense, a latent variable model based on two observations: (1)
senses are represented as a distribution over topics, and (2) senses generate
pairings between the target word and its neighboring word. These observations
alleviate the problem by (a) throwing out garbage senses and (b) additionally
inducing fine-grained word senses. Results show large improvements over the
state-of-the-art models on popular WSI datasets. We also show that AutoSense is
able to learn the appropriate sense granularity of a word. Finally, we apply
AutoSense to the unsupervised author name disambiguation task where the sense
granularity problem is more evident and show that AutoSense is clearly better
than competing models. We share our data and code here:
https://github.com/rktamplayo/AutoSense.
Comment: AAAI 201
MUSE: Modularizing Unsupervised Sense Embeddings
This paper proposes to address the word sense ambiguity issue in an
unsupervised manner, where word sense representations are learned jointly with
a mechanism that selects a word's sense given its context. Prior work focused on designing a
single model to deliver both mechanisms, and thus suffered from either
coarse-grained representation learning or inefficient sense selection. The
proposed modular approach, MUSE, implements flexible modules to optimize
distinct mechanisms, achieving the first purely sense-level representation
learning system with linear-time sense selection. We leverage reinforcement
learning to enable joint training on the proposed modules, and introduce
various exploration techniques on sense selection for better robustness. The
experiments on benchmark data show that the proposed approach achieves
state-of-the-art performance on synonym selection as well as on contextual word
similarities in terms of MaxSimC.
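This is a minimal sketch of two pieces the abstract mentions, under several assumptions. The first is a sense-selection step whose cost is linear in the number of senses, with epsilon-greedy exploration as one simple example of an exploration technique. The second is the MaxSimC score, which compares the sense vectors chosen for each word in its own context. The per-sense "selector" vectors and the dictionary layout of `model` are hypothetical. MUSE's actual modules and reinforcement-learning training are not reproduced.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def select_sense(selectors, context_vec, epsilon=0.0, rng=random):
    """Pick a sense index for one word occurrence.
    selectors: one scoring vector per sense (hypothetical parameterisation).
    With probability epsilon, explore with a uniformly random sense
    (epsilon-greedy); otherwise take the highest-scoring sense.
    The cost is linear in the number of senses."""
    if rng.random() < epsilon:
        return rng.randrange(len(selectors))
    scores = [dot(s, context_vec) for s in selectors]
    return max(range(len(scores)), key=scores.__getitem__)

def max_sim_c(word_a, ctx_a, word_b, ctx_b, model):
    """MaxSimC: cosine similarity between the sense embeddings selected
    greedily (epsilon = 0) for each word in its own context."""
    sa = select_sense(model[word_a]["selectors"], ctx_a)
    sb = select_sense(model[word_b]["selectors"], ctx_b)
    return cosine(model[word_a]["senses"][sa], model[word_b]["senses"][sb])

# Hypothetical toy model: "bank" has two senses, "river" has one.
model = {
    "bank": {"selectors": [[1.0, 0.0], [0.0, 1.0]],
             "senses": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]},
    "river": {"selectors": [[1.0, 0.0]], "senses": [[0.0, 1.0, 0.1]]},
}
near = max_sim_c("bank", [0.1, 0.9], "river", [0.5, 0.5], model)
far = max_sim_c("bank", [0.9, 0.1], "river", [0.5, 0.5], model)
print(near, far)
```

The same pair of words scores high or low depending on which sense of "bank" its context selects. A context-insensitive, single-vector similarity cannot capture that behaviour.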
Distinguishing Word Senses in Untagged Text
This paper describes an experimental comparison of three unsupervised
learning algorithms that distinguish the sense of an ambiguous word in untagged
text. The methods described in this paper, McQuitty's similarity analysis,
Ward's minimum-variance method, and the EM algorithm, assign each instance of
an ambiguous word to a known sense definition based solely on the values of
automatically identifiable features in text. These methods and feature sets are
found to be more successful at disambiguating nouns than adjectives or
verbs. Overall, the most accurate of these procedures is McQuitty's similarity
analysis in combination with a high-dimensional feature set.
Comment: 11 pages, latex, uses aclap.st
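McQuitty's similarity analysis is an agglomerative method. Each instance starts as its own cluster, and the two most similar clusters are merged repeatedly. After a merge, the similarity between the new cluster and any other cluster is the average of the two old similarities. The sketch below makes three assumptions: instances are tuples of discrete feature values, similarity is the matching coefficient, and the target number of clusters k is known. The paper's feature sets and the mapping from clusters to sense definitions are not shown. The toy instances are hypothetical.

```python
from itertools import combinations

def matching_coefficient(a, b):
    """Fraction of feature positions on which two instances agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def mcquitty(instances, k):
    """Agglomerative clustering with McQuitty's update rule: after merging
    clusters i and j, sim(new, m) = (sim(i, m) + sim(j, m)) / 2.
    Returns clusters as sorted lists of instance indices."""
    clusters = {i: [i] for i in range(len(instances))}
    sim = {frozenset((i, j)): matching_coefficient(instances[i], instances[j])
           for i, j in combinations(clusters, 2)}
    next_id = len(instances)
    while len(clusters) > k:
        # Merge the most similar pair of current clusters.
        i, j = max(combinations(clusters, 2), key=lambda p: sim[frozenset(p)])
        merged = clusters.pop(i) + clusters.pop(j)
        for m in clusters:
            sim[frozenset((next_id, m))] = (sim[frozenset((i, m))] + sim[frozenset((j, m))]) / 2
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(c) for c in clusters.values())

# Hypothetical instances of an ambiguous word: (POS of next word, two context words).
instances = [("NN", "bank", "river"), ("NN", "bank", "water"),
             ("VB", "money", "loan"), ("VB", "money", "deposit")]
print(mcquitty(instances, 2))
```

Averaging the two old similarities, rather than recomputing from all members, means the cost of each merge is proportional to the number of remaining clusters.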