7,187 research outputs found
Word sense disambiguation and information retrieval
It has often been thought that word sense ambiguity is a cause of poor performance in Information Retrieval
(IR) systems. The belief is that if ambiguous words can be correctly disambiguated, IR performance will
increase. However, recent research into the application of a word sense disambiguator to an IR system failed
to show any performance increase. From these results it has become clear that more basic research is needed
to investigate the relationship between sense ambiguity, disambiguation, and IR.
Using a technique that introduces additional sense ambiguity into a collection, this paper presents research
that goes beyond previous work in this field to reveal the influence that ambiguity and disambiguation have
on a probabilistic IR system. We conclude that word sense ambiguity is only problematic to an IR system
when it is retrieving from very short queries. In addition we argue that if a word sense disambiguator is to
be of any use to an IR system, the disambiguator must be able to resolve word senses to a high degree of
accuracy
Word sense disambiguation and information retrieval
It has often been thought that word sense ambiguity is a cause of poor performance in Information Retrieval
(IR) systems. The belief is that if ambiguous words can be correctly disambiguated, IR performance will
increase. However, recent research into the application of a word sense disambiguator to an IR system failed
to show any performance increase. From these results it has become clear that more basic research is needed
to investigate the relationship between sense ambiguity, disambiguation, and IR.
Using a technique that introduces additional sense ambiguity into a collection, this paper presents research
that goes beyond previous work in this field to reveal the influence that ambiguity and disambiguation have
on a probabilistic IR system. We conclude that word sense ambiguity is only problematic to an IR system
when it is retrieving from very short queries. In addition we argue that if a word sense disambiguator is to
be of any use to an IR system, the disambiguator must be able to resolve word senses to a high degree of
accuracy
A Similarity Based Concordance Approach to Word Sense Disambiguation
This study attempts to solve the problem of Word Sense Disambiguation using a combination of statistical, probabilistic and word matching algorithms. These algorithms consider that words and sentences have some hidden similarities and that the polysemous words in any context should be assigned to a sense after each execution of the algorithm. The algorithm was tested with sufficient sample data and the efficiency of the disambiguation performance has proven to increase significantly after the inclusion of the concordance methodology
A Similarity Based Concordance Approach to Word Sense Disambiguation
This study attempts to solve the problem of Word Sense Disambiguation using a combination of statistical, probabilistic and word matching algorithms. These algorithms consider that words and sentences have some hidden similarities and that the polysemous words in any context should be assigned to a sense after each execution of the algorithm. The algorithm was tested with sufficient sample data and the efficiency of the disambiguation performance has proven to increase significantly after the inclusion of the concordance methodology
Similarity-Based Models of Word Cooccurrence Probabilities
In many applications of natural language processing (NLP) it is necessary to
determine the likelihood of a given word combination. For example, a speech
recognizer may need to determine which of the two word combinations ``eat a
peach'' and ``eat a beach'' is more likely. Statistical NLP methods determine
the likelihood of a word combination from its frequency in a training corpus.
However, the nature of language is such that many word combinations are
infrequent and do not occur in any given corpus. In this work we propose a
method for estimating the probability of such previously unseen word
combinations using available information on ``most similar'' words.
We describe probabilistic word association models based on distributional
word similarity, and apply them to two tasks, language modeling and pseudo-word
disambiguation. In the language modeling task, a similarity-based model is used
to improve probability estimates for unseen bigrams in a back-off language
model. The similarity-based method yields a 20% perplexity improvement in the
prediction of unseen bigrams and statistically significant reductions in
speech-recognition error.
We also compare four similarity-based estimation methods against back-off and
maximum-likelihood estimation methods on a pseudo-word sense disambiguation
task in which we controlled for both unigram and bigram frequency to avoid
giving too much weight to easy-to-disambiguate high-frequency configurations.
The similarity-based methods perform up to 40% better on this particular task.Comment: 26 pages, 5 figure
Disambiguation strategies for cross-language information retrieval
This paper gives an overview of tools and methods for Cross-Language Information Retrieval (CLIR) that are developed within the Twenty-One project. The tools and methods are evaluated with the TREC CLIR task document collection using Dutch queries on the English document base. The main issue addressed here is an evaluation of two approaches to disambiguation. The underlying question is whether a lot of effort should be put in finding the correct translation for each query term before searching, or whether searching with more than one possible translation leads to better results? The experimental study suggests that the quality of search methods is more important than the quality of disambiguation methods. Good retrieval methods are able to disambiguate translated queries implicitly during searching
- …