Search CORE

3,922 research outputs found

Finding predominant word senses in untagged text

Author: Carroll John
Koeling Rob
McCarthy Diana Frances
Weeds Julie
Publication venue
Publication date: 01/01/2004
Field of study

In word sense disambiguation (WSD), the heuristic of choosing the most common sense is extremely powerful because the distribution of the senses of a word is often skewed. The problem with using the predominant, or first sense heuristic, aside from the fact that it does not take surrounding context into account, is that it assumes some quantity of handtagged data. Whilst there are a few hand-tagged corpora available for some languages, one would expect the frequency distribution of the senses of words, particularly topical words, to depend on the genre and domain of the text under consideration. We present work on the use of a thesaurus acquired from raw textual corpora and the WordNet similarity package to find predominant noun senses automatically. The acquired predominant senses give a precision of 64% on the nouns of the SENSEVAL- 2 English all-words task. This is a very promising result given that our method does not require any hand-tagged text, such as SemCor. Furthermore, we demonstrate that our method discovers appropriate predominant senses for words from two domainspecific corpora

CiteSeerX

Crossref

Sussex Research Online

The interaction of knowledge sources in word sense disambiguation

Author: Brill Eric
Daelemans Walter
Daelemans Walter
Ide Nancy
Kilgarriff Adam
Marcus Mitchell
Mark Stevenson
Masterman Margaret
McRoy Susan
Yorick Wilks
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/2001
Field of study

Word sense disambiguation (WSD) is a computational linguistics task likely to benefit from the tradition of combining different knowledge sources in artificial in telligence research. An important step in the exploration of this hypothesis is to determine which linguistic knowledge sources are most useful and whether their combination leads to improved results. We present a sense tagger which uses several knowledge sources. Tested accuracy exceeds 94% on our evaluation corpus.Our system attempts to disambiguate all content words in running text rather than limiting itself to treating a restricted vocabulary of words. It is argued that this approach is more likely to assist the creation of practical systems

CiteSeerX

Crossref

White Rose Research Online