Co-occurrence Vectors from Corpora vs. Distance Vectors from Dictionaries
A comparison was made of vectors derived by using ordinary co-occurrence
statistics from large text corpora and of vectors derived by measuring the
inter-word distances in dictionary definitions. The precision of word sense
disambiguation by using co-occurrence vectors from the 1987 Wall Street Journal
(20M total words) was higher than that by using distance vectors from the
Collins English Dictionary (60K head words + 1.6M definition words). However,
other experimental results suggest that distance vectors contain semantic
information not captured by co-occurrence vectors.
Comment: 6 pages, appeared in the Proc. of COLING94 (pp. 304-309)
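The core idea behind co-occurrence vectors — represent each word by counts of its neighbours, then compare words by vector similarity — can be sketched as follows. This is a minimal illustration, not the paper's 20M-word setup; the window size and toy corpus are invented:

```python
from collections import Counter

def cooccurrence_vectors(tokens, window=2):
    """Count, for every word type, how often each other word appears
    within +/- `window` positions of it."""
    vectors = {}
    for i, word in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        vectors.setdefault(word, Counter()).update(context)
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm_u = sum(x * x for x in u.values()) ** 0.5
    norm_v = sum(x * x for x in v.values()) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

tokens = "the bank approved the loan near the river bank".split()
vectors = cooccurrence_vectors(tokens)
```

A dictionary-distance vector would be built analogously, but from inter-word distances in definition text rather than corpus windows.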
Distinguishing Word Senses in Untagged Text
This paper describes an experimental comparison of three unsupervised
learning algorithms that distinguish the sense of an ambiguous word in untagged
text. The methods described in this paper, McQuitty's similarity analysis,
Ward's minimum-variance method, and the EM algorithm, assign each instance of
an ambiguous word to a known sense definition based solely on the values of
automatically identifiable features in text. These methods and feature sets are
found to be more successful in disambiguating nouns than adjectives or
verbs. Overall, the most accurate of these procedures is McQuitty's similarity
analysis in combination with a high-dimensional feature set.
Comment: 11 pages, latex, uses aclap.sty
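McQuitty's similarity analysis is a form of average-linkage agglomerative clustering (WPGMA). A toy sketch over numeric feature vectors follows; the feature extraction from text is omitted and the data is invented:

```python
def mcquitty(points, k):
    """Toy McQuitty (WPGMA) agglomerative clustering: repeatedly merge
    the two closest clusters; the merged cluster's distance to any other
    cluster is the simple average of its two parents' distances."""
    euclid = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    clusters = {i: [i] for i in range(len(points))}        # id -> member indices
    dist = {frozenset((i, j)): euclid(points[i], points[j])
            for i in clusters for j in clusters if i < j}
    next_id = len(points)
    while len(clusters) > k:
        i, j = min(dist, key=dist.get)                     # closest pair
        members = clusters.pop(i) + clusters.pop(j)
        for m in clusters:
            # McQuitty rule: unweighted average of the parent distances
            dist[frozenset((next_id, m))] = 0.5 * (
                dist.pop(frozenset((i, m))) + dist.pop(frozenset((j, m))))
        dist.pop(frozenset((i, j)))                        # drop the stale pair
        clusters[next_id] = members
        next_id += 1
    return list(clusters.values())
```

For word sense discrimination, each point would be an instance of the ambiguous word encoded as a feature vector, and the resulting clusters would be matched to known sense definitions.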
Retrieving with good sense
Although always present in text, word sense ambiguity has only recently come to be regarded as a problem for information
retrieval that is potentially solvable. The growth of interest in word senses resulted from new directions taken in
disambiguation research. This paper first outlines this research and then surveys the resulting efforts in information
retrieval. Although the majority of attempts to improve retrieval effectiveness were unsuccessful, much was learnt
from the research, most notably a notion of the circumstances under which disambiguation may prove useful to retrieval.
Applying a Naive Bayes Similarity Measure to Word Sense Disambiguation
We replace the overlap mechanism of the Lesk algorithm with a simple, general-purpose Naive Bayes model that measures many-to-many association between two sets of random variables. Even with simple probability estimates such as maximum likelihood, the model gains significant improvement over the Lesk algorithm on word sense disambiguation tasks. With additional lexical knowledge from WordNet, performance is further improved to surpass state-of-the-art results.
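The contrast with Lesk's hard overlap count can be sketched with a smoothed unigram model: instead of counting shared words, score how well each sense's gloss "generates" the context. This is a simplified stand-in for the paper's many-to-many association measure, with invented toy glosses:

```python
from collections import Counter
import math

def nb_score(context, gloss, vocab_size, alpha=1.0):
    """Log-probability of the context words under the gloss's unigram
    model, with add-alpha smoothing so unseen words score low rather
    than zeroing out the whole sense."""
    counts = Counter(gloss)
    total = len(gloss)
    return sum(math.log((counts[w] + alpha) / (total + alpha * vocab_size))
               for w in context)

def disambiguate(context, sense_glosses):
    """Pick the sense whose gloss best explains the context."""
    vocab = set(context).union(*sense_glosses.values())
    return max(sense_glosses,
               key=lambda s: nb_score(context, sense_glosses[s], len(vocab)))
```

Unlike exact overlap, this graded score still distinguishes senses when the context shares only partial or noisy vocabulary with the glosses.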
Mapping Persian Words to WordNet Synsets
Lexical ontologies are one of the main resources
for developing natural language processing and semantic web
applications. Mapping lexical ontologies of different languages
is very important for inter-lingual tasks. On the other hand,
mapping approaches can be applied to build lexical ontologies
for a new language based on pre-existing resources of other
languages. In this paper we propose a semantic approach for
mapping Persian words to Princeton WordNet synsets. As
there is no lexical ontology for Persian, our approach helps not
only in building one for this language but also enables semantic
web applications on Persian documents. To do the mapping, we
calculate the similarity of Persian words and English synsets
using features such as super-classes, subclasses,
domains and related words. Our approach is an improvement of
an existing one, applied in a new domain, which increases
recall noticeably.
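One simple way to realise feature-based word-to-synset similarity is set overlap (Jaccard) over the features each side exposes. This is a hedged sketch, not the paper's actual measure; the threshold, feature sets, and synset names are illustrative, and Persian/English features are assumed already translated into a shared space:

```python
def jaccard(a, b):
    """Set intersection normalised by set union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def map_to_synset(word_features, synsets, threshold=0.2):
    """Map a word to the candidate synset whose feature set
    (super-classes, subclasses, domain, related words) overlaps the
    word's features most, provided the overlap clears a threshold."""
    best = max(synsets, key=lambda s: jaccard(word_features, synsets[s]))
    return best if jaccard(word_features, synsets[best]) >= threshold else None
```

Returning None when no candidate clears the threshold is what trades precision for recall: lowering the threshold maps more words at the risk of wrong assignments.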
Sense and preference
Semantic networks have shown considerable utility as a knowledge representation for Natural Language Processing (NLP). This paper describes a system for automatically deriving network structures from machine-readable dictionary text. This strategy helps to solve the problem of vocabulary acquisition for large-scale parsing systems, but also introduces an extra level of difficulty in terms of word-sense ambiguity. A Preference Semantics parsing system that operates over this network is discussed, in particular as regards its mechanism for using the network for lexical selection.
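The lexical-selection idea in Preference Semantics is that a predicate prefers arguments of certain semantic types, and the sense that best satisfies those preferences wins. A toy sketch, with invented types and scores rather than anything from the paper's network:

```python
def select_sense(verb_prefs, candidate_senses):
    """Pick the noun sense whose semantic types best satisfy the verb's
    preferred argument types (both mappings are illustrative)."""
    def score(sense_types):
        return sum(verb_prefs.get(t, 0) for t in sense_types)
    return max(candidate_senses, key=lambda s: score(candidate_senses[s]))

# 'drink' prefers a liquid object; two senses of 'port' (wine vs harbour)
prefs = {"liquid": 2, "physical-object": 1}
senses = {"port/wine": ["liquid", "physical-object"],
          "port/harbour": ["location", "physical-object"]}
```

In the full system the types and preferences would come from the network derived from dictionary text rather than from hand-written tables.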
The interaction of knowledge sources in word sense disambiguation
Word sense disambiguation (WSD) is a computational linguistics task likely to benefit from the tradition of combining different knowledge sources in artificial intelligence research. An important step in the exploration of this hypothesis is to determine which linguistic knowledge sources are most useful and whether their combination leads to improved results.
We present a sense tagger which uses several knowledge sources. Tested accuracy exceeds 94% on our evaluation corpus. Our system attempts to disambiguate all content words in running text rather than limiting itself to treating a restricted vocabulary of words. It is argued that this approach is more likely to assist the creation of practical systems.
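The simplest way to combine heterogeneous knowledge sources is a weighted sum of each source's per-sense scores. This is a generic sketch of the combination step only, not the paper's tagger; the source names and weights are invented:

```python
def combine(sources, weights=None):
    """Combine per-sense scores from several knowledge sources
    (e.g. part of speech, collocations, dictionary definitions)
    by weighted summation and return the top-scoring sense."""
    weights = weights or {name: 1.0 for name in sources}
    totals = {}
    for name, scores in sources.items():
        for sense, s in scores.items():
            totals[sense] = totals.get(sense, 0.0) + weights[name] * s
    return max(totals, key=totals.get)
```

Studying which weights help most is one concrete way to ask the abstract's question of which knowledge sources are most useful and whether combining them improves results.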