Maximum Margin Multiclass Nearest Neighbors

Author: Kontorovich Aryeh
Weiss Roi
Publication date: 01/01/2014

We develop a general framework for margin-based multicategory classification in metric spaces. The basic work-horse is a margin-regularized version of the nearest-neighbor classifier. We prove generalization bounds that match the state of the art in sample size $n$ and significantly improve the dependence on the number of classes $k$. Our point of departure is a nearly Bayes-optimal finite-sample risk bound independent of $k$. Although $k$-free, this bound is unregularized and non-adaptive, which motivates our main result: Rademacher and scale-sensitive margin bounds with a logarithmic dependence on $k$. As the best previous risk estimates in this setting were of order $\sqrt{k}$, our bound is exponentially sharper. From the algorithmic standpoint, in doubling metric spaces our classifier may be trained on $n$ examples in $O(n^2 \log n)$ time and evaluated on new points in $O(\log n)$ time.
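
As a brief illustration of why a logarithmic dependence on the number of classes is called exponentially sharper than a square-root dependence, the two rates can be compared directly. This is only a sketch of the growth rates: constants and the remaining factors of the actual bounds are omitted, and the substitution $k = 2^m$ is an assumption made for readability.

```latex
\[
k = 2^m \;\Longrightarrow\; \sqrt{k} = 2^{m/2}, \qquad \log_2 k = m,
\]
\[
\text{so}\quad \sqrt{k} = 2^{(\log_2 k)/2}.
\]
```

For instance, with $k = 1024$ classes the square-root rate gives 32, while the logarithmic rate gives 10.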

arXiv.org e-Print Archive

CiteSeerX

Scale-sensitive Psi-dimensions: the Capacity Measures for Classifiers Taking Values in R^Q

Author: Guermeur Yann
Publication date: 01/01/2007

Bounds on the risk play a crucial role in statistical learning theory. They usually involve, as the capacity measure of the model studied, the VC dimension or one of its extensions. In classification, such "VC dimensions" exist for models taking values in {0, 1}, {1,..., Q} and R. We introduce the generalizations appropriate for the missing case, that of models with values in R^Q. This provides us with a new guaranteed risk for M-SVMs which appears superior to the existing one.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Efficient Classification for Metric Data

Author: Gottlieb Lee-Ad
Kontorovich Aryeh
Krauthgamer Robert
Publication date: 10/07/2014

Recent advances in large-margin classification of data residing in general metric spaces (rather than Hilbert spaces) enable classification under various natural metrics, such as string edit and earthmover distance. A general framework developed for this purpose by von Luxburg and Bousquet [JMLR, 2004] left open the questions of computational efficiency and of providing direct bounds on generalization error. We design a new algorithm for classification in general metric spaces, whose runtime and accuracy depend on the doubling dimension of the data points, and can thus achieve superior classification performance in many common scenarios. The algorithmic core of our approach is an approximate (rather than exact) solution to the classical problems of Lipschitz extension and of Nearest Neighbor Search. The algorithm's generalization performance is guaranteed via the fat-shattering dimension of Lipschitz classifiers, and we present experimental evidence of its superiority to some common kernel methods. As a by-product, we offer a new perspective on the nearest neighbor classifier, which yields significantly sharper risk asymptotics than the classic analysis of Cover and Hart [IEEE Trans. Info. Theory, 1967]. Comment: This is the full version of an extended abstract that appeared in Proceedings of the 23rd COLT, 201
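
To make the setting concrete, here is a minimal sketch of nearest-neighbor classification under a general metric, using string edit distance as the example metric mentioned in the abstract. It is not the algorithm of the paper: it performs exact brute-force search rather than approximate Nearest Neighbor Search or Lipschitz extension, and the names `edit_distance`, `nearest_neighbor_label` and the toy `sample` are hypothetical, chosen only for illustration.

```python
from typing import Callable, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with the standard dynamic program."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def nearest_neighbor_label(
    query: str,
    sample: Sequence[Tuple[str, str]],
    metric: Callable[[str, str], int],
) -> str:
    """Return the label of the sample point closest to the query.

    Exact brute-force search (an illustrative assumption, not the
    approximate search described in the abstract).
    """
    _, label = min(sample, key=lambda pair: metric(query, pair[0]))
    return label


# Toy labeled sample (hypothetical data, for illustration only).
sample = [("kitten", "animal"), ("mitten", "clothing"), ("sitting", "action")]
print(nearest_neighbor_label("kittens", sample, edit_distance))
```

The classifier only ever calls `metric`, so any other distance function satisfying the metric axioms could be passed in its place without changing the rest of the code.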

arXiv.org e-Print Archive

CiteSeerX