7,187 research outputs found

Word sense disambiguation and information retrieval

Author: Sanderson M.
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/1994

It has often been thought that word sense ambiguity is a cause of poor performance in Information Retrieval (IR) systems. The belief is that if ambiguous words can be correctly disambiguated, IR performance will increase. However, recent research into the application of a word sense disambiguator to an IR system failed to show any performance increase. From these results it has become clear that more basic research is needed to investigate the relationship between sense ambiguity, disambiguation, and IR. Using a technique that introduces additional sense ambiguity into a collection, this paper presents research that goes beyond previous work in this field to reveal the influence that ambiguity and disambiguation have on a probabilistic IR system. We conclude that word sense ambiguity is only problematic to an IR system when it is retrieving from very short queries. In addition, we argue that if a word sense disambiguator is to be of any use to an IR system, the disambiguator must be able to resolve word senses to a high degree of accuracy.
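The abstract does not say how the additional ambiguity is introduced. One common way to do this, assumed here for illustration only, is to merge unrelated words into a single artificial "pseudo-word" token so that every occurrence becomes ambiguous between the originals. The function name, the "/" separator, and the sample documents below are hypothetical.

```python
def add_pseudo_word_ambiguity(documents, groups):
    """Replace every word in a group with one shared pseudo-word token.

    documents: list of token lists.
    groups: iterable of word tuples to conflate (an assumed input format).
    """
    mapping = {}
    for group in groups:
        pseudo = "/".join(group)  # e.g. "banana/rifle"
        for word in group:
            mapping[word] = pseudo
    # Tokens outside every group are left unchanged.
    return [[mapping.get(tok, tok) for tok in doc] for doc in documents]


docs = [["banana", "bread", "recipe"], ["rifle", "safety", "course"]]
ambiguous = add_pseudo_word_ambiguity(docs, [("banana", "rifle")])
```

Because the original word behind each pseudo-word is known, retrieval runs on the original and the conflated collection can be compared directly.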

White Rose Research Online

Word sense disambiguation and information retrieval

Author: Sanderson M.
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/1994

MIT Libraries Dome

White Rose Research Online

A Similarity Based Concordance Approach to Word Sense Disambiguation

Author: Guru Ramakrishnan B.
Publication venue: The Keep
Publication date: 01/01/2004

This study attempts to solve the problem of Word Sense Disambiguation using a combination of statistical, probabilistic and word matching algorithms. These algorithms assume that words and sentences have some hidden similarities and that the polysemous words in any context should be assigned to a sense after each execution of the algorithm. The algorithm was tested on sample data, and disambiguation performance was found to increase significantly after the concordance methodology was included.

Eastern Illinois University

Similarity-Based Models of Word Cooccurrence Probabilities

Author: Dagan Ido
Lee Lillian
Pereira Fernando C. N.
Publication date: 27/09/1998

In many applications of natural language processing (NLP) it is necessary to determine the likelihood of a given word combination. For example, a speech recognizer may need to determine which of the two word combinations "eat a peach" and "eat a beach" is more likely. Statistical NLP methods determine the likelihood of a word combination from its frequency in a training corpus. However, the nature of language is such that many word combinations are infrequent and do not occur in any given corpus. In this work we propose a method for estimating the probability of such previously unseen word combinations using available information on "most similar" words. We describe probabilistic word association models based on distributional word similarity, and apply them to two tasks, language modeling and pseudo-word disambiguation. In the language modeling task, a similarity-based model is used to improve probability estimates for unseen bigrams in a back-off language model. The similarity-based method yields a 20% perplexity improvement in the prediction of unseen bigrams and statistically significant reductions in speech-recognition error. We also compare four similarity-based estimation methods against back-off and maximum-likelihood estimation methods on a pseudo-word sense disambiguation task in which we controlled for both unigram and bigram frequency to avoid giving too much weight to easy-to-disambiguate high-frequency configurations. The similarity-based methods perform up to 40% better on this particular task.
Comment: 26 pages, 5 figures
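As a rough illustration of estimating an unseen word combination from "most similar" words, the sketch below takes a similarity-weighted average of the conditional probabilities observed for the neighbours of the first word. It is a minimal sketch, not the paper's implementation: the function name, the dictionary layout, and all probabilities and weights are made-up assumptions.

```python
def similarity_based_estimate(w1, w2, cond_prob, neighbours):
    """Estimate P(w2 | w1) from words similar to w1.

    cond_prob: {(word, next_word): P(next_word | word)} from a corpus (assumed given).
    neighbours: {word: {similar_word: similarity_weight}} (assumed given).
    """
    weights = neighbours.get(w1, {})
    total = sum(weights.values())
    if total == 0:
        return 0.0  # no similar words available, so no evidence to average
    weighted = sum(
        weight * cond_prob.get((similar, w2), 0.0)
        for similar, weight in weights.items()
    )
    return weighted / total  # normalise so the weights sum to one


# Toy data: "eat peach" and "eat beach" are both unseen for "eat" itself.
cond_prob = {("consume", "peach"): 0.02, ("devour", "peach"): 0.01}
neighbours = {"eat": {"consume": 0.6, "devour": 0.4}}

p_peach = similarity_based_estimate("eat", "peach", cond_prob, neighbours)
p_beach = similarity_based_estimate("eat", "beach", cond_prob, neighbours)
```

With these toy numbers the estimate prefers "peach" over "beach" even though neither bigram was observed with "eat", which is the behaviour the abstract describes.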

arXiv.org e-Print Archive

CiteSeerX

Disambiguation strategies for cross-language information retrieval

Author: D. Harman
G. Salton
S.E. Robertson
Publication venue: Springer Verlag
Publication date: 01/01/1999
Field of study

This paper gives an overview of tools and methods for Cross-Language Information Retrieval (CLIR) that are developed within the Twenty-One project. The tools and methods are evaluated with the TREC CLIR task document collection using Dutch queries on the English document base. The main issue addressed here is an evaluation of two approaches to disambiguation. The underlying question is whether a lot of effort should be put into finding the correct translation for each query term before searching, or whether searching with more than one possible translation leads to better results. The experimental study suggests that the quality of search methods is more important than the quality of disambiguation methods. Good retrieval methods are able to disambiguate translated queries implicitly during searching.

CiteSeerX

Crossref

Radboud Repository

University of Twente Research Information