Finding predominant word senses in untagged text

Carroll, John; Koeling, Rob; McCarthy, Diana Frances; Weeds, Julie

Publication date: 1 January 2004

Abstract

In word sense disambiguation (WSD), the heuristic of choosing the most common sense is extremely powerful because the distribution of the senses of a word is often skewed. The problem with using the predominant, or first-sense, heuristic, aside from the fact that it does not take surrounding context into account, is that it assumes some quantity of hand-tagged data. Whilst there are a few hand-tagged corpora available for some languages, one would expect the frequency distribution of the senses of words, particularly topical words, to depend on the genre and domain of the text under consideration, so sense frequencies taken from one corpus may not carry over to another. We present work on the use of a thesaurus acquired from raw textual corpora and the WordNet similarity package to find predominant noun senses automatically. The acquired predominant senses give a precision of 64% on the nouns of the SENSEVAL-2 English all-words task. This is a very promising result given that our method does not require any hand-tagged text, such as SemCor. Furthermore, we demonstrate that our method discovers appropriate predominant senses for words from two domain-specific corpora.
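The abstract describes the approach only at a high level: distributional neighbours of a word come from an automatically acquired thesaurus, and a WordNet-based similarity measure relates those neighbours to the word's senses. The sketch below illustrates one way these two ingredients could be combined into a sense ranking. It assumes a simple weighted vote and is not presented as the paper's exact scoring. The function `rank_senses`, the `TOY_SIM` table, the sense labels and all similarity values are hypothetical toy data. In practice the neighbours would come from the acquired thesaurus and `sense_sim` would be a WordNet similarity measure.

```python
from collections import defaultdict


def rank_senses(senses, neighbours, sense_sim):
    """Rank the senses of a target word by a weighted vote from its neighbours.

    senses:     sense labels for the target word (hypothetical labels)
    neighbours: (neighbour_word, distributional_similarity) pairs, e.g. the
                top-k thesaurus entries for the target word (assumption)
    sense_sim:  function (sense, neighbour_word) -> non-negative score,
                standing in for a WordNet-based similarity measure (assumption)
    Returns a list of (sense, score) pairs, highest score first.
    """
    scores = defaultdict(float)
    for word, dist_sim in neighbours:
        sims = {s: sense_sim(s, word) for s in senses}
        total = sum(sims.values())
        if total == 0:
            continue  # neighbour unrelated to every sense: it casts no vote
        for s in senses:
            # Split this neighbour's weight across senses in proportion
            # to how related each sense is to the neighbour.
            scores[s] += dist_sim * sims[s] / total
    return sorted(((s, scores[s]) for s in senses), key=lambda p: p[1], reverse=True)


# Hypothetical toy data for the noun "star"; values are invented for illustration.
TOY_SIM = {
    ("star/celestial", "planet"): 0.8, ("star/celebrity", "planet"): 0.1,
    ("star/celestial", "galaxy"): 0.9, ("star/celebrity", "galaxy"): 0.1,
    ("star/celestial", "actor"): 0.1, ("star/celebrity", "actor"): 0.9,
}

ranking = rank_senses(
    senses=["star/celestial", "star/celebrity"],
    neighbours=[("planet", 0.30), ("galaxy", 0.25), ("actor", 0.20)],
    sense_sim=lambda s, w: TOY_SIM.get((s, w), 0.0),
)
print(ranking[0][0])  # star/celestial
```

Normalising each neighbour's sense similarities before weighting means every neighbour contributes exactly its distributional similarity in total, so a neighbour that happens to be related to many senses cannot dominate the ranking. The top-ranked sense would then serve as the predicted predominant (first) sense.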
