Search CORE

9 research outputs found

Boosting Applied to Word Sense Disambiguation

Author: Escudero Gerard
Marquez Lluis
Rigau German
Publication date: 01/01/2000

In this paper, Schapire and Singer's AdaBoost.MH boosting algorithm is applied to the Word Sense Disambiguation (WSD) problem. Initial experiments on a set of 15 selected polysemous words show that the boosting approach surpasses Naive Bayes and Exemplar-based approaches, which represent state-of-the-art accuracy on supervised WSD. In order to make boosting practical for a real learning domain of thousands of words, several ways of accelerating the algorithm by reducing the feature space are studied. The best variant, which we call LazyBoosting, is tested on the largest sense-tagged corpus available, containing 192,800 examples of the 191 most frequent and ambiguous English words. Again, boosting compares favourably to the other benchmark algorithms. Comment: 12 pages
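The idea of speeding up boosting by examining only part of the feature space in each round can be sketched as follows. This is a minimal illustration, not the paper's method: it swaps in plain binary AdaBoost with one-feature decision stumps instead of AdaBoost.MH, and it assumes that the reduction is done by drawing a random subset of features per round. The names `lazy_adaboost`, `predict`, and `sample_frac` are hypothetical.

```python
import math
import random


def lazy_adaboost(X, y, rounds=10, sample_frac=0.5, seed=0):
    """Binary AdaBoost over stumps, scanning a random feature subset per round.

    X: list of binary feature vectors (0/1 values); y: labels in {-1, +1}.
    sample_frac is an assumed knob: the fraction of features examined per round.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    weights = [1.0 / n] * n
    k = max(1, int(d * sample_frac))
    model = []  # list of (feature index, polarity, alpha)
    for _ in range(rounds):
        best = None
        # Only k randomly chosen features are considered in this round.
        for j in rng.sample(range(d), k):
            for pol in (1, -1):
                # The stump predicts `pol` when feature j is present, else `-pol`.
                err = sum(w for x, label, w in zip(X, y, weights)
                          if (pol if x[j] else -pol) != label)
                if best is None or err < best[0]:
                    best = (err, j, pol)
        err, j, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((j, pol, alpha))
        # Increase the weight of misclassified examples, then renormalise.
        weights = [w * math.exp(-alpha * label * (pol if x[j] else -pol))
                   for x, label, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * (pol if x[j] else -pol) for j, pol, alpha in model)
    return 1 if score >= 0 else -1
```

With `sample_frac=1.0` the sketch reduces to ordinary AdaBoost over all stumps; smaller values trade some per-round quality for less work per round.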

arXiv.org e-Print Archive

CiteSeerX

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

High WSD Accuracy Using Naive Bayesian Classifier with Rich Features

Author: Le Cuong Anh
島津明
Publication venue: Logico-Linguistic Society of Japan
Publication date: 16/11/2005

Word Sense Disambiguation (WSD) is the task of choosing the right sense of an ambiguous word given a context. Naive Bayesian (NB) classifiers are known to be among the best supervised methods for WSD (Mooney, 1996; Pedersen, 2000), and this model usually uses only a topic context represented by unordered words in a large context. In this paper, we show that by adding richer knowledge, represented by ordered words in a local context and collocations, the NB classifier can achieve higher accuracy than the best previously published results. The features were chosen using a forward sequential selection algorithm. Our experiments obtained 92.3% accuracy for four common test words (interest, line, hard, serve). We also tested on a large dataset, the DSO corpus, and obtained accuracies of 66.4% for verbs and 72.7% for nouns.
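The combination of a topic context (unordered words) with ordered words in a local context can be sketched with a small Naive Bayes classifier. This is an illustration under stated assumptions, not the authors' implementation: the feature encoding, the window size, the add-one smoothing, and the names `extract_features` and `NaiveBayesWSD` are all hypothetical, and collocations and forward sequential feature selection are left out.

```python
import math
from collections import Counter, defaultdict


def extract_features(tokens, target_index, window=2):
    """Topic features (unordered words) plus position-tagged local-context words."""
    feats = [("topic", tok) for i, tok in enumerate(tokens) if i != target_index]
    for offset in range(-window, window + 1):
        i = target_index + offset
        if offset != 0 and 0 <= i < len(tokens):
            feats.append(("local", offset, tokens[i]))
    return feats


class NaiveBayesWSD:
    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing               # additive (Laplace) smoothing
        self.sense_counts = Counter()            # training examples per sense
        self.feat_counts = defaultdict(Counter)  # feature counts per sense
        self.vocab = set()                       # all features seen in training

    def train(self, examples):
        """examples: iterable of (tokens, target_index, sense)."""
        for tokens, idx, sense in examples:
            self.sense_counts[sense] += 1
            for feat in extract_features(tokens, idx):
                self.feat_counts[sense][feat] += 1
                self.vocab.add(feat)

    def classify(self, tokens, idx):
        """Return the sense with the highest smoothed log-posterior."""
        total = sum(self.sense_counts.values())
        best_sense, best_score = None, -math.inf
        for sense, count in self.sense_counts.items():
            score = math.log(count / total)
            denom = (sum(self.feat_counts[sense].values())
                     + self.smoothing * len(self.vocab))
            for feat in extract_features(tokens, idx):
                score += math.log((self.feat_counts[sense][feat] + self.smoothing) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense
```

Tagging local words with their offset lets the model distinguish, for example, "interest on" from "on interest", which a pure bag-of-words topic context cannot do.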

Waseda University Repository

Tasting Families of Features for Image Classification

Author: Dubout Charles
Fleuret Francois
Publication date: 06/07/2011

Using multiple families of image features is a very efficient strategy to improve performance in object detection or recognition. However, such a strategy induces multiple challenges for machine learning methods, both from a computational and a statistical perspective. The main contribution of this paper is a novel feature sampling procedure dubbed “Tasting” to improve the efficiency of Boosting in such a context. Instead of sampling features in a uniform manner, Tasting continuously estimates the expected loss reduction for each family from a limited set of features sampled prior to the learning, and biases the sampling accordingly. We evaluate the performance of this procedure with tens of families of features on four image classification and object detection data-sets. We show that Tasting, which does not require the tuning of any meta-parameter, systematically outperforms variants of uniform sampling and state-of-the-art approaches based on bandit strategies.

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Automatic generation of labelled data for word sense disambiguation

Author: WANG YUNYAN
Publication date: 04/05/2004

Master's, MASTER OF SCIENCE

ScholarBank@NUS

Margin maximization with feed-forward neural networks: a comparative study with SVM and AdaBoost

Author: Enrique Romero
Lluís Màrquez
Xavier Carreras
Publication venue: Elsevier BV

Crossref

Pure Exploration in Infinitely-Armed Bandit Models with Fixed-Confidence

Author: Anderton Jesse
Aslam Javed
Aziz Maryam
Kaufmann Emilie
Publication venue: HAL CCSD
Publication date: 07/04/2018

We consider the problem of near-optimal arm identification in the fixed confidence setting of the infinitely armed bandit problem when nothing is known about the arm reservoir distribution. We (1) introduce a PAC-like framework within which to derive and cast results; (2) derive a sample complexity lower bound for near-optimal arm identification; (3) propose an algorithm that identifies a nearly-optimal arm with high probability and derive an upper bound on its sample complexity which is within a log factor of our lower bound; and (4) discuss whether our log^2(1/delta) dependence is inescapable for "two-phase" (select arms first, identify the best later) algorithms in the infinite setting. This work permits the application of bandit models to a broader class of problems where fewer assumptions hold.

INRIA a CCSD electronic archive server

Feasibility of using citations as document summaries

Author: Hand Jeff
Publication venue: Drexel University
Publication date: 01/12/2003

The purpose of this research is to establish whether it is feasible to use citations as document summaries. People are good at creating and selecting summaries and are generally the standard for evaluating computer-generated summaries. Citations can be characterized as concept symbols or short summaries of the document they are citing. Similarity metrics have been used in retrieval and text summarization to determine how alike two documents are. Similarity metrics have never been compared to human judgments of how similar two documents are. If similarity metrics reflect human judgment, then we can mechanize the selection of citations that act as short summaries of the document they are citing. The research approach was to gather rater data comparing document abstracts to citations about the same document and then to statistically compare those results to several document metrics: frequency count, similarity metric, citation location, and type of citation. There were two groups of raters, subject experts and non-experts. Both groups of raters were asked to evaluate seven parameters between abstract and citations: purpose, subject matter, methods, conclusions, findings, implications, readability, and understandability. The rater was to identify how strongly the citation represented the content of the abstract, on a five-point Likert scale. Document metrics were collected for frequency count, cosine, and similarity metric between abstracts and associated citations. In addition, data was collected on the location of the citations and the type of citation. Location was identified and dummy coded for introduction, method, discussion, review of the literature, and conclusion. Citations were categorized and dummy coded for whether they refuted, noted, supported, reviewed, or applied information about the cited document.
The results show there is a relationship between some similarity metrics and human judgment of similarity. Ph.D., Information Studies -- Drexel University, 200
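The cosine metric mentioned above can be sketched as a comparison of raw term-frequency vectors built from an abstract and a citation. This is a generic illustration, not the study's procedure: the tokenisation, the absence of stop-word removal or weighting, and the names `term_vector` and `cosine_similarity` are assumptions.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-cased term-frequency vector; tokenisation is a simplifying assumption."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the two term-frequency vectors."""
    va, vb = term_vector(text_a), term_vector(text_b)
    dot = sum(va[term] * vb[term] for term in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0
```

A score near 1 means the citation and the abstract use largely the same terms in similar proportions, and a score of 0 means they share no terms.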

Drexel Libraries E-Repository and Archives