Naive Bayes and Exemplar-Based approaches to Word Sense Disambiguation Revisited
This paper describes an experimental comparison between two standard
supervised learning methods, namely Naive Bayes and Exemplar-based
classification, on the Word Sense Disambiguation (WSD) problem. The aim of the
work is twofold. Firstly, it attempts to clarify some conflicting information
about the comparison between the two methods reported in the related
literature. To that end, several directions have been explored, including
testing several modifications of the basic learning algorithms and varying the
feature space. Secondly, an improvement of both algorithms is proposed in
order to deal with large attribute sets. This modification, which essentially
consists of using only the positive information appearing in the examples,
greatly improves the efficiency of the methods with no loss in
accuracy. The experiments have been performed on the largest sense-tagged
corpus available containing the most frequent and ambiguous English words.
Results show that the Exemplar-based approach to WSD is generally superior to
the Bayesian approach, especially when a specific metric for dealing with
symbolic attributes is used.
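To make the "positive information" modification concrete, the following is a minimal Python sketch (not the authors' implementation) of a Naive Bayes sense classifier that scores a test example using only the features present in it, so scoring cost grows with the number of active features rather than with the full attribute set. The class name, the set-of-features input encoding, and the Laplace smoothing parameter are illustrative assumptions.

import math
from collections import defaultdict

class PositiveNaiveBayes:
    # Naive Bayes for WSD scored over only the features present in an
    # example (its "positive information"); absent attributes are ignored.
    def fit(self, examples, senses, alpha=1.0):
        # examples: list of sets of active features; senses: parallel labels
        self.alpha = alpha
        self.sense_count = defaultdict(int)
        self.feat_count = defaultdict(lambda: defaultdict(int))
        vocab = set()
        for feats, sense in zip(examples, senses):
            self.sense_count[sense] += 1
            for f in feats:
                self.feat_count[sense][f] += 1
                vocab.add(f)
        self.n_examples = len(examples)
        self.vocab_size = len(vocab)

    def predict(self, feats):
        best_sense, best_score = None, float("-inf")
        for sense, count in self.sense_count.items():
            score = math.log(count / self.n_examples)  # log prior of the sense
            for f in feats:  # visit only the positive (present) features
                p = (self.feat_count[sense][f] + self.alpha) / \
                    (count + self.alpha * self.vocab_size)
                score += math.log(p)  # smoothed log-likelihood contribution
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense

# Usage: clf = PositiveNaiveBayes()
#        clf.fit([{"river", "bank"}, {"money", "bank"}], ["shore", "institution"])
#        clf.predict({"money", "deposit"})  # -> "institution"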
Using WordNet for Building WordNets
This paper summarises a set of methodologies and techniques for the fast
construction of multilingual WordNets. The English WordNet is used in this
approach as a backbone for Catalan and Spanish WordNets and as a lexical
knowledge resource for several subtasks.
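As a present-day illustration of the backbone idea (not the tools from the paper), NLTK's interface to the Open Multilingual Wordnet attaches Spanish and Catalan lemmas to English synset identifiers; this sketch assumes the 'wordnet' and 'omw-1.4' NLTK data packages have been downloaded.

from nltk.corpus import wordnet as wn

# English synsets serve as the interlingual backbone: lemmas of other
# languages are attached to the same synset identifiers.
for syn in wn.synsets('tree', pos=wn.NOUN)[:2]:
    print(syn.name(), '-', syn.definition())
    print('  spa:', syn.lemma_names('spa'))  # Spanish lemmas for this concept
    print('  cat:', syn.lemma_names('cat'))  # Catalan lemmas for this concept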
Boosting Applied to Word Sense Disambiguation
In this paper, Schapire and Singer's AdaBoost.MH boosting algorithm is applied
to the Word Sense Disambiguation (WSD) problem. Initial experiments on a set of
15 selected polysemous words show that the boosting approach surpasses Naive
Bayes and Exemplar-based approaches, which represent state-of-the-art accuracy
on supervised WSD. In order to make boosting practical for a real learning
domain of thousands of words, several ways of accelerating the algorithm by
reducing the feature space are studied. The best variant, which we call
LazyBoosting, is tested on the largest sense-tagged corpus available containing
192,800 examples of the 191 most frequent and ambiguous English words. Again,
boosting compares favourably to the other benchmark algorithms.
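To illustrate the LazyBoosting idea, here is a simplified Python sketch: binary AdaBoost with one-feature decision stumps, where each round inspects only a random fraction of the features when selecting the weak rule, rather than the full AdaBoost.MH multiclass setting used in the paper. The sample_frac parameter and the set-of-binary-features encoding are illustrative assumptions.

import math
import random

def lazy_adaboost(X, y, rounds=50, sample_frac=0.10):
    # X: list of sets of active binary features; y: labels in {-1, +1}.
    n = len(X)
    w = [1.0 / n] * n                             # example weights
    all_feats = list({f for feats in X for f in feats})
    ensemble = []                                 # (feature, alpha) weak rules
    for _ in range(rounds):
        k = max(1, int(sample_frac * len(all_feats)))
        candidates = random.sample(all_feats, k)  # the "lazy" step: few features per round
        best_f, best_err = None, 0.5
        for f in candidates:
            # stump: predict +1 if the feature is present, -1 otherwise
            err = sum(wi for wi, feats, yi in zip(w, X, y)
                      if (1 if f in feats else -1) != yi)
            if abs(err - 0.5) > abs(best_err - 0.5):
                best_f, best_err = f, err
        if best_f is None:                        # no informative candidate this round
            continue
        err = min(max(best_err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)   # negative alpha flips the stump
        ensemble.append((best_f, alpha))
        # reweight: increase the weight of misclassified examples, then normalize
        w = [wi * math.exp(-alpha * yi * (1 if best_f in feats else -1))
             for wi, feats, yi in zip(w, X, y)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, feats):
    # sign of the weighted vote of the weak rules
    score = sum(alpha * (1 if f in feats else -1) for f, alpha in ensemble)
    return 1 if score >= 0 else -1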