Maximum Margin Multiclass Nearest Neighbors
We develop a general framework for margin-based multicategory classification
in metric spaces. The basic work-horse is a margin-regularized version of the
nearest-neighbor classifier. We prove generalization bounds that match the
state of the art in sample size n and significantly improve the dependence on
the number of classes k. Our point of departure is a nearly Bayes-optimal
finite-sample risk bound independent of k. Although k-free, this bound is
unregularized and non-adaptive, which motivates our main result: Rademacher and
scale-sensitive margin bounds with a logarithmic dependence on k. As the best
previous risk estimates in this setting were of order sqrt(k), our bound is
exponentially sharper. From the algorithmic standpoint, in doubling metric
spaces our classifier may be trained on n examples in time O(n^2 log n) and
evaluated on new points in time O(log n).
Solving Multiclass Learning Problems via Error-Correcting Output Codes
Multiclass learning problems involve finding a definition for an unknown
function f(x) whose range is a discrete set containing k > 2 values (i.e., k
``classes''). The definition is acquired by studying collections of training
examples of the form [x_i, f (x_i)]. Existing approaches to multiclass learning
problems include direct application of multiclass algorithms such as the
decision-tree algorithms C4.5 and CART, application of binary concept learning
algorithms to learn individual binary functions for each of the k classes, and
application of binary concept learning algorithms with distributed output
representations. This paper compares these three approaches to a new technique
in which error-correcting codes are employed as a distributed output
representation. We show that these output representations improve the
generalization performance of both C4.5 and backpropagation on a wide range of
multiclass learning tasks. We also demonstrate that this approach is robust
with respect to changes in the size of the training sample, the assignment of
distributed representations to particular classes, and the application of
overfitting avoidance techniques such as decision-tree pruning. Finally, we
show that---like the other methods---the error-correcting code technique can
provide reliable class probability estimates. Taken together, these results
demonstrate that error-correcting output codes provide a general-purpose method
for improving the performance of inductive learning programs on multiclass
problems. Comment: See http://www.jair.org/ for any accompanying files
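The decoding step behind error-correcting output codes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the four class labels, the 7-bit code matrix, and the helper names are all assumptions chosen for the example. Each class gets a binary codeword, one binary learner would predict each bit, and the predicted bit string is assigned to the class with the nearest codeword in Hamming distance.

```python
# Hypothetical 4-class code matrix with 7-bit codewords (illustrative only).
CODE_MATRIX = {
    "a": (1, 1, 1, 1, 1, 1, 1),
    "b": (0, 0, 0, 0, 1, 1, 1),
    "c": (0, 0, 1, 1, 0, 0, 1),
    "d": (0, 1, 0, 1, 0, 1, 0),
}


def hamming(u, v):
    """Number of positions where two equal-length bit strings differ."""
    return sum(a != b for a, b in zip(u, v))


def min_distance(code):
    """Smallest pairwise Hamming distance between codewords."""
    words = list(code.values())
    return min(
        hamming(words[i], words[j])
        for i in range(len(words))
        for j in range(i + 1, len(words))
    )


def decode(bits, code=CODE_MATRIX):
    """Return the class whose codeword is nearest to the predicted bits."""
    return min(code, key=lambda label: hamming(code[label], bits))


# Simulate seven binary learners predicting class "c" with one bit wrong.
noisy = list(CODE_MATRIX["c"])
noisy[0] ^= 1
predicted = decode(tuple(noisy))
```

With a minimum distance of d between codewords, up to floor((d - 1) / 2) wrong bits can be corrected, so in this sketch one mistaken binary learner still leaves the decoded class unchanged.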
Totally Corrective Multiclass Boosting with Binary Weak Learners
In this work, we propose a new optimization framework for multiclass boosting.
In the literature, AdaBoost.MO and AdaBoost.ECC are two successful multiclass
boosting algorithms that can use binary weak learners.
We explicitly derive these two algorithms' Lagrange dual problems based on
their regularized loss functions. We show that the Lagrange dual formulations
enable us to design totally-corrective multiclass algorithms by using the
primal-dual optimization technique. Experiments on benchmark data sets suggest
that our multiclass boosting achieves generalization capability comparable to
the state of the art, while converging much faster than stage-wise gradient
descent boosting. In other words, the new totally corrective algorithms can
maximize the margin more aggressively. Comment: 11 pages
Boosting Nearest Neighbor Classifiers for Multiclass Recognition
This paper introduces an algorithm that uses boosting to learn a distance measure for multiclass k-nearest neighbor classification. Given a family of distance measures as input, AdaBoost is used to learn a weighted distance measure, which is a linear combination of the input measures. The proposed method can be seen both as a novel way to learn a distance measure from data and as a novel way to apply boosting to multiclass recognition problems that does not require output codes. In our approach, multiclass recognition of objects is reduced to a single binary recognition task, defined on triples of objects. Preliminary experiments with eight UCI datasets yield no clear winner among our method, boosting using output codes, and k-nn classification using an unoptimized distance measure. Our algorithm did achieve lower error rates on some of the datasets, which indicates that, in some domains, it may lead to better results than existing methods.
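The reduction to a binary task on triples can be sketched as follows. The abstract does not say how triples are built or labeled, so everything here is an assumption made for illustration: a triple (q, a, b) is labeled +1 when a shares q's class and b does not, -1 in the mirrored case, and a distance measure acts as a binary classifier by predicting +1 when q is closer to a than to b. The function names and the toy data are hypothetical.

```python
from itertools import permutations


def make_triples(labels):
    """Build (q, a, b, y) index triples with a binary label y (assumed scheme)."""
    triples = []
    for q, a, b in permutations(range(len(labels)), 3):
        same_a = labels[a] == labels[q]
        same_b = labels[b] == labels[q]
        if same_a and not same_b:
            triples.append((q, a, b, +1))
        elif same_b and not same_a:
            triples.append((q, a, b, -1))
    return triples


def weighted_distance(weights, measures):
    """Linear combination of the input distance measures."""
    def dist(x, y):
        return sum(w * m(x, y) for w, m in zip(weights, measures))
    return dist


def triple_accuracy(dist, points, triples):
    """Fraction of triples where 'q is closer to a than to b' matches the label."""
    correct = 0
    for q, a, b, y in triples:
        guess = 1 if dist(points[q], points[b]) > dist(points[q], points[a]) else -1
        correct += guess == y
    return correct / len(triples)


# Toy data: the class is determined by the first coordinate only.
points = [(0.0, 5.0), (0.2, -5.0), (3.0, 4.0), (3.1, -4.0)]
labels = [0, 0, 1, 1]
measures = [lambda x, y: abs(x[0] - y[0]), lambda x, y: abs(x[1] - y[1])]

triples = make_triples(labels)
acc_first = triple_accuracy(weighted_distance((1.0, 0.0), measures), points, triples)
acc_second = triple_accuracy(weighted_distance((0.0, 1.0), measures), points, triples)
```

In this sketch the weights are fixed by hand to show that one input measure separates the triples and the other does not; in the method described above, a boosting procedure would choose the weights.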
Solving for multi-class using orthogonal coding matrices
A common method of generalizing binary to multi-class classification is the
error correcting code (ECC). ECCs may be optimized in a number of ways, for
instance by making them orthogonal. Here we test two types of orthogonal ECCs
on seven different datasets using three types of binary classifier and compare
them with three other multi-class methods: 1 vs. 1, one-versus-the-rest and
random ECCs. The first type of orthogonal ECC, in which the codes contain no
zeros, admits a fast and simple method of solving for the probabilities.
Orthogonal ECCs are always more accurate than random ECCs as predicted by
recent literature. Improvements in uncertainty coefficient (U.C.) range between
0.4--17.5% (0.004--0.139, absolute), while improvements in Brier score range between
0.7--10.7%. Unfortunately, orthogonal ECCs are rarely more accurate than 1 vs.
1. Disparities are worst when the methods are paired with logistic regression,
with orthogonal ECCs never beating 1 vs. 1. When the methods are paired with
SVM, the losses are less significant, peaking at 1.5% relative (0.011 absolute)
in uncertainty coefficient and 6.5% in Brier score. Orthogonal ECCs are always
the fastest of the five multi-class methods when paired with linear
classifiers. When paired with a piecewise linear classifier, whose
classification speed does not depend on the number of training samples,
classifications using orthogonal ECCs were always more accurate than the
remaining three methods and also faster than 1 vs. 1. Losses against 1 vs. 1
here were higher, peaking at 1.9% (0.017 absolute) in U.C. and 39% in Brier
score. Gains in speed ranged between 1.1% and over 100%. Whether the speed
increase is worth the penalty in accuracy will depend on the application.
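A zero-free code with mutually orthogonal codewords can be sketched as follows. The abstract does not say how its orthogonal codes are built or how probabilities are solved for, so this is only an illustration under stated assumptions: it uses the Sylvester construction of a Hadamard matrix and decodes by taking the codeword with the largest inner product against the binary classifier outputs. The function names and the sample outputs are hypothetical, and the sketch ignores the fact that the first column of this matrix is constant, which a practical code would need to avoid.

```python
def sylvester_hadamard(order):
    """Return a +1/-1 Hadamard matrix of the given power-of-two order."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    h = [[1]]
    while len(h) < order:
        # Doubling step: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def decode(outputs, codewords):
    """Index of the codeword with the largest inner product with the outputs."""
    scores = [dot(outputs, w) for w in codewords]
    return max(range(len(codewords)), key=scores.__getitem__)


# One codeword (row) per class; each column defines one binary problem.
codewords = sylvester_hadamard(4)

# Hypothetical real-valued outputs of the four binary classifiers.
outputs = [0.9, -0.7, -0.2, 0.4]
predicted_class = decode(outputs, codewords)
```

Because the rows are orthogonal and contain no zeros, every binary classifier contributes to every class score, and scoring reduces to a single matrix-vector product.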