
5,274 research outputs found

Parametric Local Metric Learning for Nearest Neighbor Classification

Authors: Kalousis Alexandros, Wang Jun, Woznica Adam
Publication date: 13/09/2012

We study the problem of learning local metrics for nearest neighbor classification. Most previous work on local metric learning learns a number of unrelated local metrics. While this "independence" approach delivers increased flexibility, its downside is a considerable risk of overfitting. We present a new parametric local metric learning method in which we learn a smooth metric matrix function over the data manifold. Using an approximation error bound of the metric matrix function, we learn local metrics as linear combinations of basis metrics defined on anchor points over different regions of the instance space. We constrain the metric matrix function by imposing manifold regularization on the linear combinations, which makes the learned metric matrix function vary smoothly along the geodesics of the data manifold. Our metric learning method has excellent performance in terms of both predictive power and scalability. We experimented with several large-scale classification problems, with tens of thousands of instances, and compared it with several state-of-the-art metric learning methods, both global and local, as well as with SVM with automatic kernel selection, all of which it outperforms significantly.
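The idea of a local metric built as a linear combination of basis metrics at anchor points can be sketched as follows. This is only an illustration under stated assumptions: the function names, the `bandwidth` parameter, and the softmax weighting over distances to anchors are hypothetical stand-ins. The abstract says the combination weights are learned under manifold regularization, which this sketch does not attempt.

```python
import math

def smooth_weights(x, anchors, bandwidth=1.0):
    """Hypothetical weighting (an assumption, not the learned weights):
    softmax over negative squared distances from x to each anchor."""
    scores = [-sum((a - b) ** 2 for a, b in zip(x, u)) / bandwidth
              for u in anchors]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def local_metric(x, anchors, basis_metrics, bandwidth=1.0):
    """Metric matrix at x: a convex combination of the basis metrics."""
    w = smooth_weights(x, anchors, bandwidth)
    d = len(x)
    return [[sum(w[k] * basis_metrics[k][i][j] for k in range(len(anchors)))
             for j in range(d)]
            for i in range(d)]

def local_sq_distance(x, y, anchors, basis_metrics, bandwidth=1.0):
    """Squared distance (x - y)^T M(x) (x - y) under the metric at x."""
    m = local_metric(x, anchors, basis_metrics, bandwidth)
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    return sum(diff[i] * m[i][j] * diff[j] for i in range(n) for j in range(n))

# Toy setup: two anchors, each with its own basis metric.
anchors = [(0.0, 0.0), (10.0, 0.0)]
basis_metrics = [
    [[1.0, 0.0], [0.0, 1.0]],  # identity near the first anchor
    [[4.0, 0.0], [0.0, 1.0]],  # stretches the first axis near the second
]
```

Because the weights form a convex combination, the metric at a point halfway between the two anchors is the average of the two basis metrics, and it changes smoothly as the query point moves from one anchor to the other.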

arXiv.org e-Print Archive

CiteSeerX

Training Support Vector Machines Using Frank-Wolfe Optimization Methods

Authors: Frandi Emanuele, Gasparo Maria Grazia, Lodi Stefano, Nanculef Ricardo, Sartori Claudio
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 04/12/2012

Training a Support Vector Machine (SVM) requires the solution of a quadratic programming problem (QP) whose computational cost becomes prohibitive for large-scale datasets. Traditional optimization methods cannot be directly applied in these cases, mainly due to memory restrictions. By adopting a slightly different objective function and under mild conditions on the kernel used within the model, efficient algorithms to train SVMs have been devised under the name of Core Vector Machines (CVMs). This framework exploits the equivalence of the resulting learning problem with the task of computing a Minimal Enclosing Ball (MEB) in a feature space, where data is implicitly embedded by a kernel function. In this paper, we improve on the CVM approach by proposing two novel methods to build SVMs based on the Frank-Wolfe algorithm, recently revisited as a fast method to approximate the solution of a MEB problem. In contrast to CVMs, our algorithms do not require computing the solutions of a sequence of increasingly complex QPs and are defined using only analytic optimization steps. Experiments on a large collection of datasets show that our methods scale better than CVMs in most cases, sometimes at the price of a slightly lower accuracy. Like CVMs, the proposed methods can be easily extended to machine learning problems other than binary classification. Unlike CVMs, however, they also yield effective classifiers with kernels that do not satisfy the condition required by CVMs, and can thus be used for a wider set of problems.
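As a rough illustration of how Frank-Wolfe approximates a Minimal Enclosing Ball using only analytic steps, here is a minimal sketch. It makes several assumptions not taken from the abstract: it works on explicit Euclidean points rather than a kernel feature space, uses the textbook step size 2/(k+2), and is not either of the two methods the paper proposes. The function name `frank_wolfe_meb` is hypothetical.

```python
def frank_wolfe_meb(points, iterations=500):
    """Approximate the minimal enclosing ball of `points` with Frank-Wolfe
    on the dual problem over the simplex. The center is the convex
    combination sum_i alpha[i] * points[i]."""
    n = len(points)
    alpha = [0.0] * n
    alpha[0] = 1.0              # start at a vertex of the simplex
    center = list(points[0])

    def sq_dist(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))

    for k in range(iterations):
        # Linear maximization over the simplex picks one vertex:
        # the point currently farthest from the center.
        far = max(range(n), key=lambda i: sq_dist(points[i], center))
        step = 2.0 / (k + 2.0)  # standard Frank-Wolfe step (assumption)
        # Closed-form update: move alpha (and the center) toward that vertex.
        alpha = [(1.0 - step) * a for a in alpha]
        alpha[far] += step
        center = [(1.0 - step) * c + step * p
                  for c, p in zip(center, points[far])]

    radius = max(sq_dist(p, center) for p in points) ** 0.5
    return center, radius, alpha

# Toy data: corners of a square; the exact ball has center (1, 1)
# and radius sqrt(2).
square = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
center, radius, alpha = frank_wolfe_meb(square)
```

Each iteration needs only a farthest-point search and a closed-form update, with no inner QP solve. In a kernelized version the distances would be computed from kernel evaluations and `alpha` instead of explicit coordinates.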

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Recursive Aggregation of Estimators by Mirror Descent Algorithm with Averaging

Authors: Juditsky Anatoli, Nazin Alexander, Tsybakov Alexandre, Vayatis Nicolas
Publication date: 07/03/2006

We consider a recursive algorithm to construct an aggregated estimator from a finite number of base decision rules in the classification problem. The estimator approximately minimizes a convex risk functional under the $\ell_1$-constraint. It is defined by a stochastic version of the mirror descent algorithm (i.e., of the method which performs gradient descent in the dual space) with an additional averaging. The main result of the paper is an upper bound for the expected accuracy of the proposed estimator. This bound is of the order $\sqrt{(\log M)/t}$ with an explicit and small constant factor, where $M$ is the dimension of the problem and $t$ stands for the sample size. A similar bound is proved for a more general setting that covers, in particular, the regression model with squared loss.

Comment: 29 pages; mai 200
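To make the "gradient descent in the dual space, then averaging" idea concrete, the following is a minimal sketch of stochastic mirror descent on the probability simplex with an entropic mirror map. Several choices are assumptions for illustration and are not taken from the paper: the constant step size, the fixed temperature `beta`, the hinge-type loss, the uniform averaging, and all function names.

```python
import math

def softmax_neg(zeta, beta=1.0):
    """Map the dual variable zeta back to the simplex (entropic mirror map)."""
    scores = [-z / beta for z in zeta]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def hinge_subgradient(theta, h, y):
    """Subgradient in theta of max(0, 1 - y * <theta, h>)."""
    margin = y * sum(t * hj for t, hj in zip(theta, h))
    if 1.0 - margin > 0.0:
        return [-y * hj for hj in h]
    return [0.0] * len(h)

def averaged_mirror_descent(samples, num_rules, step=0.5):
    """One pass over (h, y) samples; returns the running average of iterates."""
    zeta = [0.0] * num_rules      # dual variable, updated by gradient steps
    average = [0.0] * num_rules   # uniform average of the primal iterates
    for i, (h, y) in enumerate(samples, start=1):
        theta = softmax_neg(zeta)
        average = [a + (t - a) / i for a, t in zip(average, theta)]
        g = hinge_subgradient(theta, h, y)
        zeta = [z + step * gj for z, gj in zip(zeta, g)]
    return average

def make_toy_samples(n):
    """Toy data: rule 0 is always right, rule 1 always wrong,
    rule 2 is right on even-indexed samples only."""
    samples = []
    for i in range(n):
        y = 1.0 if i % 3 else -1.0
        h = (y, -y, y if i % 2 == 0 else -y)
        samples.append((h, y))
    return samples

weights = averaged_mirror_descent(make_toy_samples(100), num_rules=3)
```

On this toy stream the averaged weights concentrate on the rule that is always correct. The averaged vector, rather than the last iterate, plays the role of the aggregated estimator.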

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

Hal-Diderot