
Piecewise linear regularized solution paths

Authors: Saharon Rosset, Ji Zhu
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2007

We consider the generic regularized optimization problem $\hat{\beta}(\lambda)=\arg\min_{\beta} L(y,X\beta)+\lambda J(\beta)$. Efron, Hastie, Johnstone and Tibshirani [Ann. Statist. 32 (2004) 407--499] have shown that for the LASSO (that is, if $L$ is squared error loss and $J(\beta)=\|\beta\|_1$ is the $\ell_1$ norm of $\beta$) the optimal coefficient path is piecewise linear, that is, $\partial\hat{\beta}(\lambda)/\partial\lambda$ is piecewise constant. We derive a general characterization of the properties of (loss $L$, penalty $J$) pairs which give piecewise linear coefficient paths. Such pairs allow for efficient generation of the full regularized coefficient paths. We investigate the nature of efficient path following algorithms which arise. We use our results to suggest robust versions of the LASSO for regression and classification, and to develop new, efficient algorithms for existing problems in the literature, including Mammen and van de Geer's locally adaptive regression splines.

Comment: Published at http://dx.doi.org/10.1214/009053606000001370 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org
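The piecewise linearity described in this abstract can be illustrated in a special case that has a closed form. The sketch below is not the path-following algorithm from the paper. It assumes an orthonormal design ($X^\top X = I$) and the loss scaling $\tfrac{1}{2}\|y - X\beta\|_2^2$, under which the LASSO solution is the soft-thresholding of $z = X^\top y$. The function names are hypothetical. It checks that linear interpolation between the knots $\lambda = |z_j|$ reproduces the exact solution.

```python
import numpy as np

def lasso_orthonormal(z, lam):
    """Closed-form LASSO solution when X^T X = I: soft-threshold z = X^T y."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_path_knots(z):
    """Lambda values where the path changes slope (descending), plus 0."""
    return np.concatenate([np.sort(np.abs(z))[::-1], [0.0]])

rng = np.random.default_rng(0)
# Orthonormal columns via reduced QR; this special case is an assumption
# of the sketch, not a requirement of the general theory.
X, _ = np.linalg.qr(rng.normal(size=(20, 4)))
y = rng.normal(size=20)
z = X.T @ y

knots = lasso_path_knots(z)
beta_at_knots = np.array([lasso_orthonormal(z, k) for k in knots])

def path_by_interpolation(lam):
    """Recover beta(lam) by linear interpolation between adjacent knots."""
    # np.interp needs increasing x-coordinates, so reverse the knots.
    return np.array([
        np.interp(lam, knots[::-1], beta_at_knots[::-1, j])
        for j in range(z.size)
    ])

# Compare the interpolated path with the exact solution between two knots.
lam = 0.5 * (knots[1] + knots[2])
print(np.allclose(path_by_interpolation(lam), lasso_orthonormal(z, lam)))
```

Because the path is linear between knots, storing the solution at the knots alone is enough to recover it at any $\lambda$, which is what makes generating the full path efficient.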

arXiv.org e-Print Archive

CiteSeerX

Crossref

Learning to Approximate a Bregman Divergence

Authors: David Castanon, Brian Kulis, Venkatesh Saligrama, Ali Siahkamari, Xide Xia
Publication date: 01/01/2020

Bregman divergences generalize measures such as the squared Euclidean distance and the KL divergence, and arise throughout many areas of machine learning. In this paper, we focus on the problem of approximating an arbitrary Bregman divergence from supervision, and we provide a well-principled approach to analyzing such approximations. We develop a formulation and algorithm for learning arbitrary Bregman divergences based on approximating their underlying convex generating function via a piecewise linear function. We provide theoretical approximation bounds using our parameterization and show that the generalization error $O_p(m^{-1/2})$ for metric learning using our framework matches the known generalization error in the strictly less general Mahalanobis metric learning setting. We further demonstrate empirically that our method performs well in comparison to existing metric learning methods, particularly for clustering and ranking problems.

Comment: 19 pages, 4 figure
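To make the idea of a piecewise linear generating function concrete, here is a minimal sketch. It is not the learning algorithm from the paper. It assumes a max-affine form $\hat{\phi}(v) = \max_k (a_k^\top v + b_k)$ and, instead of learning the pieces from supervision, builds them as tangent planes of $\phi(v) = \|v\|_2^2$ on a hypothetical grid of anchor points. The resulting divergence is compared with the exact Bregman divergence of $\phi$, which is the squared Euclidean distance.

```python
import numpy as np

def bregman(phi, grad_phi, x, y):
    """Exact Bregman divergence: phi(x) - phi(y) - <grad phi(y), x - y>."""
    return phi(x) - phi(y) - grad_phi(y) @ (x - y)

def max_affine_bregman(A, b, x, y):
    """Divergence generated by phi_hat(v) = max_k (A[k] @ v + b[k]).

    The slope of the piece active at y is used as a subgradient of phi_hat.
    """
    k = np.argmax(A @ y + b)          # index of the piece active at y
    phi_x = np.max(A @ x + b)
    phi_y = A[k] @ y + b[k]
    return phi_x - phi_y - A[k] @ (x - y)

def sq_norm(v):
    return v @ v

def sq_norm_grad(v):
    return 2.0 * v

# Hypothetical anchors: a regular grid on [-1, 1]^2 with spacing 0.05.
ticks = np.linspace(-1.0, 1.0, 41)
gx, gy = np.meshgrid(ticks, ticks)
anchors = np.column_stack([gx.ravel(), gy.ravel()])

# Tangent plane of ||v||^2 at anchor p is 2 p.v - ||p||^2.
A = 2.0 * anchors
b = -np.sum(anchors**2, axis=1)

x = np.array([0.31, -0.22])
y = np.array([-0.52, 0.43])
exact = bregman(sq_norm, sq_norm_grad, x, y)   # equals ||x - y||^2
approx = max_affine_bregman(A, b, x, y)
print(exact, approx)
```

Refining the grid adds more affine pieces and tightens the approximation. In the learned setting the slopes and offsets would be fit from supervision rather than fixed in advance.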

arXiv.org e-Print Archive

Boston University Institutional Repository (OpenBU)