354 research outputs found
Distinguishing Word Senses in Untagged Text
This paper describes an experimental comparison of three unsupervised
learning algorithms that distinguish the sense of an ambiguous word in untagged
text. The methods described in this paper, McQuitty's similarity analysis,
Ward's minimum-variance method, and the EM algorithm, assign each instance of
an ambiguous word to a known sense definition based solely on the values of
automatically identifiable features in text. These methods and feature sets are
found to be more successful in disambiguating nouns rather than adjectives or
verbs. Overall, the most accurate of these procedures is McQuitty's similarity
analysis in combination with a high dimensional feature set.Comment: 11 pages, latex, uses aclap.st
Sense Tagging: Semantic Tagging with a Lexicon
Sense tagging, the automatic assignment of the appropriate sense from some
lexicon to each of the words in a text, is a specialised instance of the
general problem of semantic tagging by category or type. We discuss which
recent word sense disambiguation algorithms are appropriate for sense tagging.
It is our belief that sense tagging can be carried out effectively by combining
several simple, independent, methods and we include the design of such a
tagger. A prototype of this system has been implemented, correctly tagging 86%
of polysemous word tokens in a small test set, providing evidence that our
hypothesis is correct.Comment: 6 pages, uses aclap LaTeX style file. Also in Proceedings of the
SIGLEX Workshop "Tagging Text with Lexical Semantics
An algorithm for cross-lingual sense-clustering tested in a MT evaluation setting
Unsupervised sense induction methods offer a solution to the
problem of scarcity of semantic resources. These methods
automatically extract semantic information from textual data
and create resources adapted to specific applications and domains of interest. In this paper, we present a clustering algorithm for cross-lingual sense induction which generates
bilingual semantic inventories from parallel corpora. We describe the clustering procedure and the obtained resources. We then proceed to a large-scale evaluation by integrating the resources into a Machine Translation (MT) metric (METEOR). We show that the use of the data-driven sense-cluster inventories leads to better correlation with human judgments of translation quality, compared to precision-based metrics, and to improvements similar to those obtained when a handcrafted semantic resource is used
The interaction of knowledge sources in word sense disambiguation
Word sense disambiguation (WSD) is a computational linguistics task likely to benefit from the tradition of combining different knowledge sources in artificial in telligence research. An important step in the exploration of this hypothesis is to determine which linguistic knowledge sources are most useful and whether their combination leads to improved results.
We present a sense tagger which uses several knowledge sources. Tested accuracy exceeds 94% on our evaluation corpus.Our system attempts to disambiguate all content words in running text rather than limiting itself to treating a restricted vocabulary of words. It is argued that this approach is more likely to assist the creation of practical systems
- …