400 research outputs found

Naive Bayes and Exemplar-Based approaches to Word Sense Disambiguation Revisited

Author: Escudero Gerard
Marquez Lluis
Rigau German
Publication date: 01/01/2000

This paper describes an experimental comparison between two standard supervised learning methods, namely Naive Bayes and Exemplar-based classification, on the Word Sense Disambiguation (WSD) problem. The aim of the work is twofold. Firstly, it attempts to help clarify some confusing information about the comparison between both methods appearing in the related literature. In doing so, several directions have been explored, including testing several modifications of the basic learning algorithms and varying the feature space. Secondly, an improvement of both algorithms is proposed in order to deal with large attribute sets. This modification, which basically consists in using only the positive information appearing in the examples, greatly improves the efficiency of the methods with no loss in accuracy. The experiments have been performed on the largest sense-tagged corpus available, containing the most frequent and ambiguous English words. Results show that the Exemplar-based approach to WSD is generally superior to the Bayesian approach, especially when a specific metric for dealing with symbolic attributes is used. Comment: 5 pages
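The abstract's idea of "using only the positive information appearing in the examples" can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes binary features, Laplace smoothing, and a scoring rule that sums log-probabilities only over the features present in the example to classify. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Count senses and, per sense, the features observed with it.

    examples: iterable of (set_of_features, sense) pairs.
    """
    sense_counts = Counter()
    feature_counts = defaultdict(Counter)
    for features, sense in examples:
        sense_counts[sense] += 1
        feature_counts[sense].update(features)
    return sense_counts, feature_counts

def classify(features, sense_counts, feature_counts):
    """Return the sense with the highest log-score.

    Assumption: only features present in the example contribute, so the
    cost per example depends on its own size, not on the full attribute set.
    """
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, n in sense_counts.items():
        score = math.log(n / total)  # log prior of the sense
        for f in features:
            # Laplace-smoothed estimate of P(feature present | sense)
            score += math.log((feature_counts[sense][f] + 1) / (n + 2))
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

# Hypothetical toy data for the ambiguous word "bank".
TRAINING = [
    ({"river", "water"}, "shore"),
    ({"money", "loan"}, "finance"),
    ({"money", "account"}, "finance"),
]
SENSE_COUNTS, FEATURE_COUNTS = train(TRAINING)
```

Calling `classify({"money"}, SENSE_COUNTS, FEATURE_COUNTS)` on this toy data selects the "finance" sense. A standard Naive Bayes scorer would also multiply in a term for every absent feature, which is what becomes expensive with large attribute sets.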

arXiv.org e-Print Archive

CiteSeerX

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Boosting Applied to Word Sense Disambiguation

Author: Escudero Gerard
Marquez Lluis
Rigau German
Publication date: 01/01/2000

In this paper, Schapire and Singer's AdaBoost.MH boosting algorithm is applied to the Word Sense Disambiguation (WSD) problem. Initial experiments on a set of 15 selected polysemous words show that the boosting approach surpasses Naive Bayes and Exemplar-based approaches, which represent state-of-the-art accuracy on supervised WSD. In order to make boosting practical for a real learning domain of thousands of words, several ways of accelerating the algorithm by reducing the feature space are studied. The best variant, which we call LazyBoosting, is tested on the largest sense-tagged corpus available, containing 192,800 examples of the 191 most frequent and ambiguous English words. Again, boosting compares favourably to the other benchmark algorithms. Comment: 12 pages

arXiv.org e-Print Archive

CiteSeerX

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Sometimes less is more: Romanian word sense disambiguation revisited

Author: Dinu Georgiana
Kübler Sandra
Publication date: 01/01/2007

Recent approaches to Word Sense Disambiguation (WSD) generally fall into two classes: (1) information-intensive approaches and (2) information-poor approaches. Our hypothesis is that for memory-based learning (MBL), a reduced amount of data is more beneficial than the full range of features used in the past. Our experiments show that MBL combined with a restricted set of features and a feature selection method that minimizes the feature set leads to competitive results, outperforming all systems that participated in the SENSEVAL-3 competition on the Romanian data. Thus, with this specific method, a tightly controlled feature set improves the accuracy of the classifier, reaching 74.0% in the fine-grained and 78.7% in the coarse-grained evaluation.

Hochschulschriftenserver - Universität Frankfurt am Main

Discriminating word senses with tourist walks in complex networks

Author: Amancio Diego R.
Silva Thiago C.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/06/2013

Patterns of topological arrangement are widely used by both animal and human brains in the learning process. Nevertheless, automatic learning techniques frequently overlook these patterns. In this paper, we apply a learning technique based on the structural organization of the data in the attribute space to the problem of discriminating the senses of 10 polysemous words. Using two types of characterization of meanings, namely semantic and topological approaches, we have observed significant accuracy rates in identifying the suitable meanings with both techniques. Most importantly, we have found that the characterization based on the deterministic tourist walk improves the disambiguation process compared with the discrimination achieved with traditional complex network measurements such as assortativity and clustering coefficient. To our knowledge, this is the first time that such a deterministic walk has been applied to this kind of problem. Our finding therefore suggests that the tourist walk characterization may be useful in other related applications.

arXiv.org e-Print Archive

EDP Sciences OAI-PMH repository (1.2.0)

Selective Sampling for Example-based Word Sense Disambiguation

Author: Fujii Atsushi
Inui Kentaro
Tanaka Hozumi
Tokunaga Takenobu
Publication date: 01/01/1998

This paper proposes an efficient example sampling method for example-based word sense disambiguation systems. To construct a database of practical size, a considerable overhead for manual sense disambiguation (overhead for supervision) is required. In addition, the time complexity of searching a large-sized database poses a considerable problem (overhead for search). To counter these problems, our method selectively samples a smaller-sized effective subset from a given example set for use in word sense disambiguation. Our method is characterized by its reliance on the notion of training utility: the degree to which each example is informative for future example sampling when used for the training of the system. The system progressively collects examples by selecting those with the greatest utility. The paper reports the effectiveness of our method through experiments on about one thousand sentences. Compared to experiments with other example sampling methods, our method reduced both the overhead for supervision and the overhead for search, without degrading the performance of the system. Comment: 25 pages, 14 Postscript figures

arXiv.org e-Print Archive

CiteSeerX

Disambiguation of biomedical text using diverse sources of information

Author: David Martinez
Mark Stevenson
Robert Gaizauskas
Yikun Guo
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Like text in other domains, biomedical documents contain a range of terms with more than one possible meaning. These ambiguities form a significant obstacle to the automatic processing of biomedical texts. Previous approaches to resolving this problem have made use of various sources of information, including linguistic features of the context in which the ambiguous term is used and domain-specific resources such as UMLS. Materials and methods: We compare various sources of information, including ones which have been previously used and a novel one: MeSH terms. Evaluation is carried out using a standard test set (the NLM-WSD corpus). Results: The best performance is obtained using a combination of linguistic features and MeSH terms. Performance of our system exceeds previously published results for systems evaluated using the same data set. Conclusion: Disambiguation of biomedical terms benefits from the use of information from a variety of sources. In particular, MeSH terms have proved to be useful and should be used if available.

CiteSeerX

Crossref

Springer - Publisher Connector

PubMed Central

White Rose Research Online

Applying a Naive Bayes Similarity Measure to Word Sense Disambiguation

Author: Graeme Hirst
Tong Wang
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2014

We replace the overlap mechanism of the Lesk algorithm with a simple, general-purpose Naive Bayes model that measures many-to-many association between two sets of random variables. Even with simple probability estimates such as maximum likelihood, the model gains significant improvement over the Lesk algorithm on word sense disambiguation tasks. With additional lexical knowledge from WordNet, performance is further improved to surpass the state-of-the-art results.
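For context, the overlap mechanism that the abstract says is being replaced can be sketched roughly as follows. This is a simplified Lesk-style baseline, not the paper's Naive Bayes model: it picks the sense whose gloss shares the most words with the context. The function name, the whitespace tokenization, and the toy glosses are assumptions made for illustration.

```python
def simplified_lesk(context_words, glosses):
    """Pick the sense whose gloss overlaps most with the context.

    context_words: iterable of tokens surrounding the ambiguous word.
    glosses: dict mapping a sense label to its definition string.
    Ties are resolved in favour of the sense listed first.
    """
    context = {w.lower() for w in context_words}
    best_sense, best_overlap = None, -1
    for sense, gloss in glosses.items():
        gloss_words = set(gloss.lower().split())
        overlap = len(context & gloss_words)  # count of shared word types
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Hypothetical glosses for the ambiguous word "bank".
GLOSSES = {
    "finance": "an institution that accepts deposits and lends money",
    "river": "sloping land beside a body of water",
}
CONTEXT = "he sat on the land beside the water".split()
CHOSEN = simplified_lesk(CONTEXT, GLOSSES)
```

Here the "river" gloss shares three words with the context (land, beside, water) and the "finance" gloss shares none, so "river" is chosen. Counting exact word matches is brittle, which is the kind of limitation a probabilistic association measure is meant to address.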

CiteSeerX

Crossref

Automatic evaluation of privacy policy: a machine learning approach

Author: Sun Y.
Publication date: 29/02/2012

Pure OAI Repository

Recommended from our members

A Survey of Wearable Biometric Recognition Systems

Author: Blasco J.
Chen T.
Peris-Lopez P.
Tapiador J.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 16/09/2016

The growing popularity of wearable devices is leading to new ways to interact with the environment, with other smart devices, and with other people. Wearables equipped with an array of sensors are able to capture the owner’s physiological and behavioural traits, and are thus well suited for biometric authentication to control other devices or access digital services. However, wearable biometrics differ substantially from traditional biometrics for computer systems, such as fingerprints, eye features, or voice. In this article, we discuss these differences and analyse how researchers are approaching the wearable biometrics field. We review and provide a categorization of wearable sensors useful for capturing biometric signals. We analyse the computational cost of the different signal processing techniques, an important practical factor in constrained devices such as wearables. Finally, we review and classify the most recent proposals in the field of wearable biometrics in terms of the structure of the biometric system proposed, their experimental setup, and their results. We also present a critique of experimental issues such as evaluation and feasibility aspects, and offer some final thoughts on research directions that need attention in future work.

City Research Online

Crossref