Maximum Margin Multiclass Nearest Neighbors
We develop a general framework for margin-based multicategory classification
in metric spaces. The basic work-horse is a margin-regularized version of the
nearest-neighbor classifier. We prove generalization bounds that match the
state of the art in sample size $n$ and significantly improve the dependence on
the number of classes $k$. Our point of departure is a nearly Bayes-optimal
finite-sample risk bound independent of $k$. Although $k$-free, this bound is
unregularized and non-adaptive, which motivates our main result: Rademacher and
scale-sensitive margin bounds with a logarithmic dependence on $k$. As the best
previous risk estimates in this setting were of order $\sqrt{k}$, our bound is
exponentially sharper. From the algorithmic standpoint, in doubling metric
spaces our classifier may be trained on $n$ examples in $O(n^2 \log n)$ time and
evaluated on new points in $O(\log n)$ time.
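To make the basic object of this abstract concrete, here is a minimal sketch of a plain 1-nearest-neighbor rule over an arbitrary metric, together with a sample margin, taken here to be the smallest distance between two differently labeled training points. The names `nn_predict`, `sample_margin`, and `abs_metric`, the toy data, and that margin definition are assumptions made for illustration. The margin regularization and the fast search in doubling spaces mentioned in the abstract are not implemented.

```python
from itertools import combinations
from typing import Callable, Hashable, Sequence, TypeVar

P = TypeVar("P")  # type of a point in the metric space


def nn_predict(x: P, points: Sequence[P], labels: Sequence[Hashable],
               dist: Callable[[P, P], float]) -> Hashable:
    """Return the label of the training point closest to x (brute force)."""
    best = min(range(len(points)), key=lambda i: dist(x, points[i]))
    return labels[best]


def sample_margin(points: Sequence[P], labels: Sequence[Hashable],
                  dist: Callable[[P, P], float]) -> float:
    """Smallest distance between two training points with different labels.

    This definition is an illustrative assumption, not taken from the paper.
    """
    gaps = [dist(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)
            if labels[i] != labels[j]]
    return min(gaps) if gaps else float("inf")


def abs_metric(a: float, b: float) -> float:
    """Metric on the real line, used only for the toy data below."""
    return abs(a - b)


# Hypothetical toy sample with three classes.
train_x = [0.0, 1.0, 5.0, 6.0, 10.0]
train_y = ["a", "a", "b", "b", "c"]

predicted = nn_predict(5.8, train_x, train_y, abs_metric)
margin = sample_margin(train_x, train_y, abs_metric)
```

Because the metric is passed in as a function, the same two helpers work for strings under edit distance or any other metric space, which is the setting the abstract describes.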
An adaptive multiclass nearest neighbor classifier
We consider a problem of multiclass classification, where the training sample
$S_n = \{(X_i, Y_i)\}_{i=1}^n$ is generated from the model
$\mathbb{P}(Y = m \mid X = x) = \eta_m(x)$, $1 \leq m \leq M$, and
$\eta_1(x), \dots, \eta_M(x)$ are unknown $\alpha$-Hölder continuous functions.
Given a test point $X$, our goal is to predict its label. A widely used
$k$-nearest-neighbors classifier constructs estimates of
$\eta_1(X), \dots, \eta_M(X)$ and uses a plug-in rule
for the prediction. However, it requires a proper choice of the smoothing
parameter $k$, which may become tricky in some situations. In our
solution, we fix several integers $n_1, \dots, n_K$, compute corresponding
$n_k$-nearest-neighbor estimates for each $m$ and each $n_k$ and apply an
aggregation procedure. We study an algorithm which constructs a convex
combination of these estimates such that the aggregated estimate behaves
approximately as well as an oracle choice. We also provide a non-asymptotic
analysis of the procedure, prove its adaptation to the unknown smoothness
parameter $\alpha$ and to the margin, and establish rates of convergence under
mild assumptions.
Comment: Accepted in ESAIM: Probability & Statistics. The original publication
is available at www.esaim-ps.or
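The plug-in rule and the idea of combining several neighborhood sizes can be sketched as follows. This is only an illustration under stated assumptions: the function names (`knn_estimate`, `aggregate`, `plug_in`), the one-dimensional toy data, and the uniform weights are hypothetical, and the uniform weights are a placeholder rather than the aggregation procedure studied in the paper.

```python
from collections import Counter
from typing import Dict, List, Sequence


def knn_estimate(x: float, xs: Sequence[float], ys: Sequence[int],
                 k: int, classes: Sequence[int]) -> Dict[int, float]:
    """Fraction of each class among the k training points nearest to x."""
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))
    counts = Counter(ys[i] for i in order[:k])
    return {m: counts[m] / k for m in classes}


def aggregate(estimates: List[Dict[int, float]],
              weights: Sequence[float]) -> Dict[int, float]:
    """Convex combination of several class-probability estimates."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return {m: sum(w * e[m] for w, e in zip(weights, estimates))
            for m in estimates[0]}


def plug_in(estimate: Dict[int, float]) -> int:
    """Predict the class with the largest estimated probability."""
    return max(estimate, key=estimate.get)


# Hypothetical toy sample on the real line with three classes.
xs = [0.0, 0.2, 0.4, 1.0, 1.2, 1.4, 2.0]
ys = [0, 0, 0, 1, 1, 1, 2]
classes = [0, 1, 2]
neighbor_counts = [1, 3, 5]  # plays the role of the several fixed integers

estimates = [knn_estimate(0.45, xs, ys, k, classes) for k in neighbor_counts]
uniform = [1.0 / len(estimates)] * len(estimates)  # placeholder weights
combined = aggregate(estimates, uniform)
label = plug_in(combined)
```

A data-driven choice of weights would replace `uniform`; the rest of the pipeline (per-size estimates, convex combination, plug-in prediction) stays the same.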
Contributions on distance-based algorithms, multi-classifier construction and pairwise classification
179 p. The research work presented here addresses classification tasks, with the aim of enriching the state of the art in supervised classification. Several supervised classification strategies have been analyzed, examining their characteristics and weaknesses. Accordingly, an attempt has been made to improve on the weaknesses while retaining the positive characteristics. To accomplish this, in addition to combining several supervised classification strategies, several search heuristics have also been used. Contributions have been made in three different research lines within supervised classification. The first proposals presented focus on the K-NN algorithm, of which several versions are presented. Next, another work related to the combination of classifiers is presented. Finally, several novel strategies for pairwise classification are proposed. These contributions have been published in, or submitted to, international journals or conferences. In the experiments carried out, the proposed algorithms were compared with several state-of-the-art algorithms, obtaining interesting results. In addition, statistical tests were used in order to draw meaningful conclusions from these results.
A study of hierarchical and flat classification of proteins
Automatic classification of proteins using machine learning is an important problem that has received significant attention in the literature. One feature of this problem is that expert-defined hierarchies of protein classes exist and can potentially be exploited to improve classification performance. In this article we investigate empirically whether this is the case for two such hierarchies. We compare multi-class classification techniques that exploit the information in those class hierarchies and those that do not, using logistic regression, decision trees, bagged decision trees, and support vector machines as the underlying base learners. In particular, we compare hierarchical and flat variants of ensembles of nested dichotomies. The latter have been shown to deliver strong classification performance in multi-class settings. We present experimental results for synthetic, fold recognition, enzyme classification, and remote homology detection data. Our results show that exploiting the class hierarchy improves performance on the synthetic data, but not in the case of the protein classification problems. Based on this, we recommend that strong flat multi-class methods be used as a baseline to establish the benefit of exploiting class hierarchies in this area.