A low variance error boosting algorithm
This paper introduces a robust variant of AdaBoost, cw-AdaBoost, that uses weight perturbation to reduce variance error. It is particularly effective on datasets, such as microarray data, that have a large number of features and a small number of instances. The algorithm is compared with AdaBoost, Arcing and MultiBoost on twelve gene expression datasets using 10-fold cross-validation. The new algorithm consistently achieves higher classification accuracy across all of these datasets. Unlike other AdaBoost variants, the algorithm does not run into problems when it encounters a zero-error base classifier.
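To make the zero-error issue concrete, here is a minimal sketch of standard discrete AdaBoost with decision stumps. It is not cw-AdaBoost, because the abstract does not describe its weight perturbation in enough detail to reproduce it. With weighted error ε, each base classifier receives the vote α = 0.5 ln((1 - ε)/ε), which is undefined at ε = 0. The helper names (`stump_predict`, `best_stump`, `adaboost`, `predict`) are hypothetical, and stopping on a perfect stump is only one common workaround.

```python
import math

def stump_predict(x, feature, threshold, polarity):
    """Decision stump: +polarity if x[feature] <= threshold, else -polarity."""
    return polarity if x[feature] <= threshold else -polarity

def best_stump(X, y, w):
    """Exhaustive search for the stump with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for p in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(xi, f, t, p) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, p)
    return best

def adaboost(X, y, rounds=10):
    """Discrete AdaBoost, labels in {-1, +1}; returns [(alpha, f, t, p), ...]."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, f, t, p = best_stump(X, y, w)
        if err == 0:
            # alpha = 0.5*ln((1-err)/err) would be infinite, every weight would
            # collapse to zero and normalization would divide by zero, so this
            # sketch stops and keeps only the perfect stump.
            return [(1.0, f, t, p)]
        if err >= 0.5:
            break  # no better than chance under the current weights
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, t, p))
        # Misclassified instances gain weight, correctly classified ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, f, t, p))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(x, f, t, p) for a, f, t, p in ensemble)
    return 1 if score >= 0 else -1
```

For example, on X = [[1], [2], [3], [4]] with labels [1, 1, -1, -1] the first stump already has zero weighted error, so `adaboost` returns that stump alone instead of computing an infinite α.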
Boosting Applied to Word Sense Disambiguation
In this paper Schapire and Singer's AdaBoost.MH boosting algorithm is applied
to the Word Sense Disambiguation (WSD) problem. Initial experiments on a set of
15 selected polysemous words show that the boosting approach surpasses Naive
Bayes and Exemplar-based approaches, which represent state-of-the-art accuracy
on supervised WSD. In order to make boosting practical for a real learning
domain of thousands of words, several ways of accelerating the algorithm by
reducing the feature space are studied. The best variant, which we call
LazyBoosting, is tested on the largest available sense-tagged corpus, containing
192,800 examples of the 191 most frequent and ambiguous English words. Again,
boosting compares favourably to the other benchmark algorithms.
Comment: 12 pages
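Evaluating every candidate weak rule in every round is what makes boosting slow on large feature spaces. The sketch below assumes one way of reducing the feature space: in each round only a random fraction of the features is examined when choosing the weak rule. It is a simplified binary illustration, not AdaBoost.MH or the authors' LazyBoosting code. Examples are represented as sets of active feature indices (for instance, context words), and `weighted_error`, `lazy_best_rule` and `fraction` are hypothetical names.

```python
import math
import random

def weighted_error(X, y, w, f, polarity):
    """Weighted error of the rule: predict `polarity` if feature f is active, else -polarity."""
    return sum(wi for xi, yi, wi in zip(X, y, w)
               if (polarity if f in xi else -polarity) != yi)

def lazy_best_rule(X, y, w, n_features, fraction, rng):
    """Choose the best weak rule, scanning only a random fraction of the features.

    Returns ((error, feature, polarity), number_of_features_examined).
    """
    k = max(1, math.ceil(fraction * n_features))
    candidates = rng.sample(range(n_features), k)  # features examined this round
    best = None
    for f in candidates:
        for polarity in (1, -1):
            err = weighted_error(X, y, w, f, polarity)
            if best is None or err < best[0]:
                best = (err, f, polarity)
    return best, k

# Toy data: each example is the set of its active features; labels are +1/-1.
X = [{0}, {0, 2}, {1}, {1, 2}]
y = [1, 1, -1, -1]
w = [0.25] * 4
rng = random.Random(0)
rule, examined = lazy_best_rule(X, y, w, n_features=3, fraction=1.0, rng=rng)
```

With `fraction=1.0` this reduces to an exhaustive search; a smaller fraction cuts the per-round cost roughly in proportion, at the risk of missing the single best rule in a given round.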
One-Class Classification: Taxonomy of Study and Review of Techniques
One-class classification (OCC) algorithms aim to build classification models
when the negative class is either absent, poorly sampled or not well defined.
This unique situation constrains the learning of efficient classifiers, since the
class boundary must be defined using knowledge of the positive class alone. The OCC
problem has been considered and applied under many research themes, such as
outlier/novelty detection and concept learning. In this paper we present a
unified view of the general problem of OCC by presenting a taxonomy of study
for OCC problems, which is based on the availability of training data,
the algorithms used and the application domains. We further delve into each
of the categories of the proposed taxonomy and present a comprehensive
literature review of the OCC algorithms, techniques and methodologies with a
focus on their significance, limitations and applications. We conclude our
paper by discussing some open research problems in the field of OCC and presenting
our vision for future research.
Comment: 24 pages + 11 pages of references, 8 figures
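As a minimal illustration of defining a class boundary from positive examples alone, the sketch below fits a centroid and a distance threshold to positive training data and rejects anything lying farther away. This is one of the simplest distance-based one-class methods, chosen here for brevity; it is not taken from the survey, and `fit_one_class`, `is_positive` and the `quantile` parameter are hypothetical names (`math.dist` needs Python 3.8 or later).

```python
import math

def fit_one_class(X, quantile=0.95):
    """Fit on positive-class examples only: a centroid and an acceptance radius."""
    n, d = len(X), len(X[0])
    centroid = [sum(x[j] for x in X) / n for j in range(d)]
    dists = sorted(math.dist(x, centroid) for x in X)
    # Radius = the `quantile` order statistic of the training distances, so
    # roughly that fraction of the positive examples falls inside the boundary.
    idx = max(0, min(n - 1, math.ceil(quantile * n) - 1))
    return centroid, dists[idx]

def is_positive(model, x):
    """Accept x as a member of the positive class if it lies within the radius."""
    centroid, radius = model
    return math.dist(x, centroid) <= radius
```

For example, after `model = fit_one_class([[0, 0], [1, 0], [0, 1], [1, 1]])`, the point [0.5, 0.5] is accepted and [5, 5] is rejected, even though no negative example was ever seen during training.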
The Rhetoric and Reality of Anthropomorphism in Artificial Intelligence
Artificial intelligence (AI) has historically been conceptualized in anthropomorphic terms. Some algorithms deploy biomimetic designs in a deliberate attempt to effect a sort of digital isomorphism of the human brain. Others leverage more general learning strategies that happen to coincide with popular theories of cognitive science and social epistemology. In this paper, I challenge the anthropomorphic credentials of the neural network algorithm, whose similarities to human cognition are, I argue, vastly overstated and narrowly construed. I submit that three alternative supervised learning methods (lasso penalties, bagging, and boosting) offer subtler, more interesting analogies to human reasoning as both an individual and a social phenomenon. Despite the temptation to fall back on anthropomorphic tropes when discussing AI, however, I conclude that such rhetoric is at best misleading and at worst downright dangerous. The impulse to humanize algorithms is an obstacle to properly conceptualizing the ethical challenges posed by emerging technologies.
Boosting Cross-Language Retrieval by Learning Bilingual Phrase Associations from Relevance Rankings
We present an approach to learning bilingual n-gram correspondences from relevance rankings of English documents for Japanese queries. We show that directly optimizing cross-lingual rankings rivals and complements machine translation-based cross-language information retrieval (CLIR). We propose an efficient boosting algorithm that deals with very large cross-product spaces of word correspondences. We show in an experimental evaluation on patent prior art search that our approach, and in particular a consensus-based combination of boosting and translation-based approaches, yields substantial improvements in CLIR performance. Our training and test data are made publicly available.
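The abstract does not spell out the boosting algorithm itself. As a hedged illustration of boosting over a cross-product feature space, the sketch below applies RankBoost-style pairwise updates (a swapped-in technique, not necessarily the authors' algorithm) to features indexed by a (query word, document word) pair. A feature fires on a preference "document A should rank above document B" when its query word occurs in the query and its document word occurs in exactly one of the two documents. Only pairs that actually co-occur in the training preferences are enumerated, so the candidate set stays far smaller than the full vocabulary product. The romanized Japanese tokens and all function names are hypothetical.

```python
import math
from collections import defaultdict

def pair_feature(q, d_pos, d_neg, i, j):
    """Weak ranker for word pair (i, j) on one preference (q, d_pos above d_neg).

    +1 if j is only in the preferred document, -1 if only in the other one,
    0 otherwise or when query word i is absent.
    """
    if i not in q:
        return 0
    return int(j in d_pos) - int(j in d_neg)

def rankboost_pairs(prefs, rounds=5, eps=1e-9):
    """prefs: list of (query_terms, preferred_doc_terms, other_doc_terms) sets.

    Returns sparse weights W[(query_word, doc_word)].
    """
    D = [1.0 / len(prefs)] * len(prefs)  # distribution over preferences
    W = defaultdict(float)
    # Only pairs that can fire on some preference are candidates, so the full
    # query-vocabulary x document-vocabulary product is never materialized.
    candidates = sorted({(i, j) for q, dp, dn in prefs for i in q for j in dp ^ dn})
    if not candidates:
        return W
    for _ in range(rounds):
        best = None
        for i, j in candidates:
            w_plus = sum(d for d, p in zip(D, prefs) if pair_feature(*p, i, j) > 0)
            w_minus = sum(d for d, p in zip(D, prefs) if pair_feature(*p, i, j) < 0)
            w_zero = 1.0 - w_plus - w_minus
            # RankBoost normalizer Z for a {-1, 0, +1} ranker at its optimal alpha.
            z = w_zero + 2.0 * math.sqrt(w_plus * w_minus)
            if best is None or z < best[0]:
                best = (z, i, j, w_plus, w_minus)
        _, i, j, w_plus, w_minus = best
        alpha = 0.5 * math.log((w_plus + eps) / (w_minus + eps))
        W[(i, j)] += alpha
        # Preferences the new feature orders correctly lose weight; misordered ones gain.
        D = [d * math.exp(-alpha * pair_feature(*p, i, j)) for d, p in zip(D, prefs)]
        total = sum(D)
        D = [d / total for d in D]
    return W

def score(W, q, d):
    """Ranking score: sum of the weights of all (query word, doc word) pairs present."""
    return sum(w for (i, j), w in W.items() if i in q and j in d)

# Toy preferences: (query words, preferred English doc, other English doc).
prefs = [
    ({"kuruma"}, {"car", "engine"}, {"flower"}),
    ({"hana"}, {"flower"}, {"car"}),
]
W = rankboost_pairs(prefs)
```

In this sketch a pair that orders all of its weighted preferences correctly receives a very large |α|, bounded only by `eps`; that is a property of the toy setup, not a tuning recommendation.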