3,331 research outputs found
Naive Bayes and Exemplar-Based approaches to Word Sense Disambiguation Revisited
This paper describes an experimental comparison between two standard
supervised learning methods, namely Naive Bayes and Exemplar-based
classification, on the Word Sense Disambiguation (WSD) problem. The aim of the
work is twofold. Firstly, it attempts to contribute to clarify some confusing
information about the comparison between both methods appearing in the related
literature. In doing so, several directions have been explored, including:
testing several modifications of the basic learning algorithms and varying the
feature space. Secondly, an improvement of both algorithms is proposed, in
order to deal with large attribute sets. This modification, which basically
consists in using only the positive information appearing in the examples,
allows to improve greatly the efficiency of the methods, with no loss in
accuracy. The experiments have been performed on the largest sense-tagged
corpus available containing the most frequent and ambiguous English words.
Results show that the Exemplar-based approach to WSD is generally superior to
the Bayesian approach, especially when a specific metric for dealing with
symbolic attributes is used.Comment: 5 page
Assessing the contribution of shallow and deep knowledge sources for word sense disambiguation
Corpus-based techniques have proved to be very beneficial in the development of efficient and accurate approaches to word sense disambiguation (WSD) despite the fact that they generally represent relatively shallow knowledge. It has always been thought, however, that WSD could also benefit from deeper knowledge sources. We describe a novel approach to WSD using inductive logic programming to learn theories from first-order logic representations that allows corpus-based evidence to be combined with any kind of background knowledge. This approach has been shown to be effective over several disambiguation tasks using a combination of deep and shallow knowledge sources. Is it important to understand the contribution of the various knowledge sources used in such a system. This paper investigates the contribution of nine knowledge sources to the performance of the disambiguation models produced for the SemEval-2007 English lexical sample task. The outcome of this analysis will assist future work on WSD in concentrating on the most useful knowledge sources
Selective Sampling for Example-based Word Sense Disambiguation
This paper proposes an efficient example sampling method for example-based
word sense disambiguation systems. To construct a database of practical size, a
considerable overhead for manual sense disambiguation (overhead for
supervision) is required. In addition, the time complexity of searching a
large-sized database poses a considerable problem (overhead for search). To
counter these problems, our method selectively samples a smaller-sized
effective subset from a given example set for use in word sense disambiguation.
Our method is characterized by the reliance on the notion of training utility:
the degree to which each example is informative for future example sampling
when used for the training of the system. The system progressively collects
examples by selecting those with greatest utility. The paper reports the
effectiveness of our method through experiments on about one thousand
sentences. Compared to experiments with other example sampling methods, our
method reduced both the overhead for supervision and the overhead for search,
without the degeneration of the performance of the system.Comment: 25 pages, 14 Postscript figure
Evaluating the semantic web: a task-based approach
The increased availability of online knowledge has led to the design of several algorithms that solve a variety of tasks by harvesting the Semantic Web, i.e. by dynamically selecting and exploring a multitude of online ontologies. Our hypothesis is that the performance of such novel algorithms implicity provides an insight into the quality of the used ontologies and thus opens the way to a task-based evaluation of the Semantic Web. We have investigated this hypothesis by studying the lessons learnt about online ontologies when used to solve three tasks: ontology matching, folksonomy enrichment, and word sense disambiguation. Our analysis leads to a suit of conclusions about the status of the Semantic Web, which highlight a number of strengths and weaknesses of the semantic information available online and complement the findings of other analysis of the Semantic Web landscape
Training and Scaling Preference Functions for Disambiguation
We present an automatic method for weighting the contributions of preference
functions used in disambiguation. Initial scaling factors are derived as the
solution to a least-squares minimization problem, and improvements are then
made by hill-climbing. The method is applied to disambiguating sentences in the
ATIS (Air Travel Information System) corpus, and the performance of the
resulting scaling factors is compared with hand-tuned factors. We then focus on
one class of preference function, those based on semantic lexical collocations.
Experimental results are presented showing that such functions vary
considerably in selecting correct analyses. In particular we define a function
that performs significantly better than ones based on mutual information and
likelihood ratios of lexical associations.Comment: To appear in Computational Linguistics (probably volume 20, December
94). LaTeX, 21 page
- …