Search CORE

24,976 research outputs found

An Exponential Lower Bound on the Complexity of Regularization Paths

Author: Gärtner Bernd
Jaggi Martin
Maria Clément
Publication venue
Publication date: 01/01/2012
Field of study

For a variety of regularized optimization problems in machine learning, algorithms computing the entire solution path have been developed recently. Most of these methods are quadratic programs that are parameterized by a single parameter, as for example the Support Vector Machine (SVM). Solution path algorithms do not only compute the solution for one particular value of the regularization parameter but the entire path of solutions, making the selection of an optimal parameter much easier. It has been assumed that these piecewise linear solution paths have only linear complexity, i.e. linearly many bends. We prove that for the support vector machine this complexity can be exponential in the number of training points in the worst case. More strongly, we construct a single instance of n input points in d dimensions for an SVM such that at least \Theta(2^{n/2}) = \Theta(2^d) many distinct subsets of support vectors occur as the regularization parameter changes.Comment: Journal version, 28 Pages, 5 Figure

arXiv.org e-Print Archive

Repository for Publications and Research Data

Directory of Open Access Journals

Piecewise linear regularized solution paths

Author: Rosset Saharon
Zhu Ji
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2007
Field of study

We consider the generic regularized optimization problem

\hat{\mathsf{\beta}}(\lambda)=\arg \min_{\beta}L({\sf{y}},X{\sf{\beta}})+\lambda J({\sf{\beta}})

. Efron, Hastie, Johnstone and Tibshirani [Ann. Statist. 32 (2004) 407--499] have shown that for the LASSO--that is, if

L

is squared error loss and

J(\beta)=\|\beta\|_1

is the

\ell_1

norm of

\beta

--the optimal coefficient path is piecewise linear, that is,

\partial \hat{\beta}(\lambda)/\partial \lambda

is piecewise constant. We derive a general characterization of the properties of (loss

L

, penalty

J

) pairs which give piecewise linear coefficient paths. Such pairs allow for efficient generation of the full regularized coefficient paths. We investigate the nature of efficient path following algorithms which arise. We use our results to suggest robust versions of the LASSO for regression and classification, and to develop new, efficient algorithms for existing problems in the literature, including Mammen and van de Geer's locally adaptive regression splines.Comment: Published at http://dx.doi.org/10.1214/009053606000001370 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

CiteSeerX

Crossref

Implicitly Constrained Semi-Supervised Least Squares Classification

Author: B Widrow
GJ McLachlan
K Nigam
KP Bennett
L Bottou
M Loog
M Loog
M Opper
O Chapelle
R Rifkin
R Tibshirani
RH Byrd
S Raudys
T Hastie
T Poggio
X Zhu
YF Li
Publication venue
Publication date: 24/07/2015
Field of study

We introduce a novel semi-supervised version of the least squares classifier. This implicitly constrained least squares (ICLS) classifier minimizes the squared loss on the labeled data among the set of parameters implied by all possible labelings of the unlabeled data. Unlike other discriminative semi-supervised methods, our approach does not introduce explicit additional assumptions into the objective function, but leverages implicit assumptions already present in the choice of the supervised least squares classifier. We show this approach can be formulated as a quadratic programming problem and its solution can be found using a simple gradient descent procedure. We prove that, in a certain way, our method never leads to performance worse than the supervised classifier. Experimental results corroborate this theoretical result in the multidimensional case on benchmark datasets, also in terms of the error rate.Comment: 12 pages, 2 figures, 1 table. The Fourteenth International Symposium on Intelligent Data Analysis (2015), Saint-Etienne, Franc

arXiv.org e-Print Archive

Crossref

Binary Classifier Calibration using an Ensemble of Near Isotonic Regression Models

Author: Cooper Gregory F.
Naeini Mahdi Pakdaman
Publication venue
Publication date: 16/11/2015
Field of study

Learning accurate probabilistic models from data is crucial in many practical tasks in data mining. In this paper we present a new non-parametric calibration method called \textit{ensemble of near isotonic regression} (ENIR). The method can be considered as an extension of BBQ, a recently proposed calibration method, as well as the commonly used calibration method based on isotonic regression. ENIR is designed to address the key limitation of isotonic regression which is the monotonicity assumption of the predictions. Similar to BBQ, the method post-processes the output of a binary classifier to obtain calibrated probabilities. Thus it can be combined with many existing classification models. We demonstrate the performance of ENIR on synthetic and real datasets for the commonly used binary classification models. Experimental results show that the method outperforms several common binary classifier calibration methods. In particular on the real data, ENIR commonly performs statistically significantly better than the other methods, and never worse. It is able to improve the calibration power of classifiers, while retaining their discrimination power. The method is also computationally tractable for large scale datasets, as it is

O(N \log N)

time, where

N

is the number of samples

arXiv.org e-Print Archive

Crossref

Hyperparameter optimization with approximate gradient

Author: Pedregosa Fabian
Publication venue
Publication date: 25/06/2016
Field of study

Most models in machine learning contain at least one hyperparameter to control for model complexity. Choosing an appropriate set of hyperparameters is both crucial in terms of model accuracy and computationally challenging. In this work we propose an algorithm for the optimization of continuous hyperparameters using inexact gradient information. An advantage of this method is that hyperparameters can be updated before model parameters have fully converged. We also give sufficient conditions for the global convergence of this method, based on regularity conditions of the involved functions and summability of errors. Finally, we validate the empirical performance of this method on the estimation of regularization constants of L2-regularized logistic regression and kernel Ridge regression. Empirical benchmarks indicate that our approach is highly competitive with respect to state of the art methods.Comment: Proceedings of the International conference on Machine Learning (ICML

arXiv.org e-Print Archive