
    Maximum Margin Multiclass Nearest Neighbors

    We develop a general framework for margin-based multicategory classification in metric spaces. The basic workhorse is a margin-regularized version of the nearest-neighbor classifier. We prove generalization bounds that match the state of the art in sample size $n$ and significantly improve the dependence on the number of classes $k$. Our point of departure is a nearly Bayes-optimal finite-sample risk bound independent of $k$. Although $k$-free, this bound is unregularized and non-adaptive, which motivates our main result: Rademacher and scale-sensitive margin bounds with a logarithmic dependence on $k$. As the best previous risk estimates in this setting were of order $\sqrt{k}$, our bound is exponentially sharper. From the algorithmic standpoint, in doubling metric spaces our classifier may be trained on $n$ examples in $O(n^2 \log n)$ time and evaluated on new points in $O(\log n)$ time.
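As a point of reference for the base classifier in this abstract, here is a minimal sketch of the plain multiclass nearest-neighbor rule in a generic metric space. It is an illustration only, not the paper's margin-regularized classifier, and it uses brute-force $O(n)$ search per query rather than the doubling-metric data structures behind the stated $O(\log n)$ evaluation time.

```python
# Minimal 1-NN multiclass rule in a metric space (illustrative sketch only;
# the paper's margin regularization and fast search structures are omitted).
import numpy as np

def nn_classify(X_train, y_train, x, metric=None):
    """Predict the label of x by its single nearest training point.

    `metric` defaults to Euclidean distance; any metric on the
    sample space can be substituted.
    """
    if metric is None:
        metric = lambda a, b: np.linalg.norm(a - b)
    dists = [metric(xi, x) for xi in X_train]
    return y_train[int(np.argmin(dists))]

# Toy usage: three classes on the real line.
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [20.0]])
y = np.array([0, 0, 0, 1, 1, 2])
print(nn_classify(X, y, np.array([10.5])))  # -> 1
```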

    An adaptive multiclass nearest neighbor classifier

    We consider a problem of multiclass classification, where the training sample $S_n = \{(X_i, Y_i)\}_{i=1}^n$ is generated from the model $\mathbb{P}(Y = m \mid X = x) = \eta_m(x)$, $1 \leq m \leq M$, and $\eta_1(x), \dots, \eta_M(x)$ are unknown $\alpha$-Hölder continuous functions. Given a test point $X$, our goal is to predict its label. A widely used $\mathsf{k}$-nearest-neighbors classifier constructs estimates of $\eta_1(X), \dots, \eta_M(X)$ and uses a plug-in rule for the prediction. However, it requires a proper choice of the smoothing parameter $\mathsf{k}$, which may become tricky in some situations. In our solution, we fix several integers $n_1, \dots, n_K$, compute the corresponding $n_k$-nearest-neighbor estimates for each $m$ and each $n_k$, and apply an aggregation procedure. We study an algorithm which constructs a convex combination of these estimates such that the aggregated estimate behaves approximately as well as an oracle choice. We also provide a non-asymptotic analysis of the procedure, prove its adaptation to the unknown smoothness parameter $\alpha$ and to the margin, and establish rates of convergence under mild assumptions. Comment: Accepted in ESAIM: Probability & Statistics. The original publication is available at www.esaim-ps.or
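A hedged sketch of the plug-in construction this abstract describes: for each neighborhood size $n_k$, estimate $\eta_m(X)$ by the fraction of the $n_k$ nearest neighbors carrying label $m$, then predict from a convex combination of the estimates. The uniform weights below are a placeholder assumption; the paper's aggregation procedure chooses the combination adaptively so that it behaves approximately like an oracle choice of $n_k$.

```python
# Plug-in k-NN estimates for several neighborhood sizes, combined convexly.
# The uniform weights are an illustrative stand-in for the paper's adaptive
# aggregation procedure.
import numpy as np

def knn_eta_estimates(X_train, y_train, x, ks, n_classes):
    """Return a (len(ks), n_classes) array of k-NN class-probability estimates."""
    order = np.argsort(np.linalg.norm(X_train - x, axis=1))
    est = np.zeros((len(ks), n_classes))
    for i, k in enumerate(ks):
        labels = y_train[order[:k]]
        for m in range(n_classes):
            est[i, m] = np.mean(labels == m)
    return est

def aggregated_prediction(X_train, y_train, x, ks, n_classes, weights=None):
    est = knn_eta_estimates(X_train, y_train, x, ks, n_classes)
    if weights is None:                     # placeholder: uniform convex weights
        weights = np.full(len(ks), 1.0 / len(ks))
    eta_hat = weights @ est                 # convex combination of the estimates
    return int(np.argmax(eta_hat))          # plug-in rule

# Toy usage with three neighborhood sizes.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y = (X[:, 0] > 0).astype(int)
print(aggregated_prediction(X, y, np.array([0.5, 0.0]), ks=[3, 5, 9], n_classes=2))
```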

    Contributions on distance-based algorithms, multi-classifier construction and pairwise classification

    179 p. This research addresses classification tasks, with the goal of enriching the state of the art in supervised classification. Several supervised classification strategies are analyzed, examining their characteristics and weaknesses, and an attempt is made to improve the weaknesses while preserving the positive characteristics. To accomplish this, besides combining several supervised classification strategies, several heuristic search techniques are also employed. Contributions are made in three different research lines of supervised classification. The first proposals presented focus on the K-NN algorithm, of which several versions are presented. Next, another work related to the combination of classifiers is presented. Finally, several novel strategies for pairwise classification are proposed. These contributions have been published in or submitted to international journals and conferences. In the experiments carried out, the proposed algorithms are compared with several state-of-the-art algorithms, obtaining interesting results. In addition, statistical tests are used in order to draw meaningful conclusions from these results.

    A study of hierarchical and flat classification of proteins

    Automatic classification of proteins using machine learning is an important problem that has received significant attention in the literature. One feature of this problem is that expert-defined hierarchies of protein classes exist and can potentially be exploited to improve classification performance. In this article we investigate empirically whether this is the case for two such hierarchies. We compare multi-class classification techniques that exploit the information in those class hierarchies and those that do not, using logistic regression, decision trees, bagged decision trees, and support vector machines as the underlying base learners. In particular, we compare hierarchical and flat variants of ensembles of nested dichotomies. The latter have been shown to deliver strong classification performance in multi-class settings. We present experimental results for synthetic, fold recognition, enzyme classification, and remote homology detection data. Our results show that exploiting the class hierarchy improves performance on the synthetic data, but not in the case of the protein classification problems. Based on this we recommend that strong flat multi-class methods be used as a baseline to establish the benefit of exploiting class hierarchies in this area.
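To make the nested-dichotomy idea concrete, below is a minimal sketch of a single random nested dichotomy, assuming scikit-learn's LogisticRegression as the base learner (one of the four base learners the article compares). The class set is split in two at each internal node, a binary model is trained on the corresponding relabeling, and a class probability is the product of the binary probabilities along the root-to-leaf path; an ensemble of nested dichotomies averages these estimates over several random trees. The class structure and splitting scheme here are illustrative, not the article's implementation.

```python
# One random nested dichotomy: a binary tree over the set of class labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

class NestedDichotomy:
    def __init__(self, classes, rng):
        self.classes = list(classes)
        if len(self.classes) > 1:
            split = rng.permutation(self.classes)   # random split into two subsets
            half = len(split) // 2
            self.left = NestedDichotomy(split[:half], rng)
            self.right = NestedDichotomy(split[half:], rng)
            self.model = LogisticRegression(max_iter=1000)

    def fit(self, X, y):
        if len(self.classes) == 1:
            return self
        mask = np.isin(y, self.classes)             # restrict to this node's classes
        Xn, yn = X[mask], y[mask]
        side = np.isin(yn, self.left.classes).astype(int)  # 1 = left subset
        self.model.fit(Xn, side)
        self.left.fit(Xn, yn)
        self.right.fit(Xn, yn)
        return self

    def predict_proba_class(self, x, c):
        """P(class c | x): product of binary probabilities down the tree."""
        if len(self.classes) == 1:
            return 1.0
        p_left = self.model.predict_proba(x.reshape(1, -1))[0, 1]
        if c in self.left.classes:
            return p_left * self.left.predict_proba_class(x, c)
        return (1.0 - p_left) * self.right.predict_proba_class(x, c)

# Toy usage: four Gaussian blobs, one class per blob.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(loc=m, size=(30, 2)) for m in range(4)])
y = np.repeat(np.arange(4), 30)
nd = NestedDichotomy(np.unique(y), np.random.default_rng(1)).fit(X, y)
probs = [nd.predict_proba_class(np.array([3.0, 3.0]), c) for c in range(4)]
print(int(np.argmax(probs)))  # most probable class under this single dichotomy
```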