9,326 research outputs found
Word sense disambiguation and information retrieval
It has often been thought that word sense ambiguity is a cause of poor performance in Information Retrieval
(IR) systems. The belief is that if ambiguous words can be correctly disambiguated, IR performance will
increase. However, recent research into the application of a word sense disambiguator to an IR system failed
to show any performance increase. From these results it has become clear that more basic research is needed
to investigate the relationship between sense ambiguity, disambiguation, and IR.
Using a technique that introduces additional sense ambiguity into a collection, this paper presents research
that goes beyond previous work in this field to reveal the influence that ambiguity and disambiguation have
on a probabilistic IR system. We conclude that word sense ambiguity is only problematic to an IR system
when it is retrieving from very short queries. In addition we argue that if a word sense disambiguator is to
be of any use to an IR system, the disambiguator must be able to resolve word senses to a high degree of
accuracy
Boosting Applied to Word Sense Disambiguation
In this paper Schapire and Singer's AdaBoost.MH boosting algorithm is applied
to the Word Sense Disambiguation (WSD) problem. Initial experiments on a set of
15 selected polysemous words show that the boosting approach surpasses Naive
Bayes and Exemplar-based approaches, which represent state-of-the-art accuracy
on supervised WSD. In order to make boosting practical for a real learning
domain of thousands of words, several ways of accelerating the algorithm by
reducing the feature space are studied. The best variant, which we call
LazyBoosting, is tested on the largest sense-tagged corpus available containing
192,800 examples of the 191 most frequent and ambiguous English words. Again,
boosting compares favourably to the other benchmark algorithms.Comment: 12 page
Word sense disambiguation and information retrieval
It has often been thought that word sense ambiguity is a cause of poor performance in Information Retrieval
(IR) systems. The belief is that if ambiguous words can be correctly disambiguated, IR performance will
increase. However, recent research into the application of a word sense disambiguator to an IR system failed
to show any performance increase. From these results it has become clear that more basic research is needed
to investigate the relationship between sense ambiguity, disambiguation, and IR.
Using a technique that introduces additional sense ambiguity into a collection, this paper presents research
that goes beyond previous work in this field to reveal the influence that ambiguity and disambiguation have
on a probabilistic IR system. We conclude that word sense ambiguity is only problematic to an IR system
when it is retrieving from very short queries. In addition we argue that if a word sense disambiguator is to
be of any use to an IR system, the disambiguator must be able to resolve word senses to a high degree of
accuracy
Word Sense Disambiguation using a Bidirectional LSTM
In this paper we present a clean, yet effective, model for word sense
disambiguation. Our approach leverage a bidirectional long short-term memory
network which is shared between all words. This enables the model to share
statistical strength and to scale well with vocabulary size. The model is
trained end-to-end, directly from the raw text to sense labels, and makes
effective use of word order. We evaluate our approach on two standard datasets,
using identical hyperparameter settings, which are in turn tuned on a third set
of held out data. We employ no external resources (e.g. knowledge graphs,
part-of-speech tagging, etc), language specific features, or hand crafted
rules, but still achieve statistically equivalent results to the best
state-of-the-art systems, that employ no such limitations
- …