High-Dimensional Boosting: Rate of Convergence
Boosting is one of the most significant developments in machine learning.
This paper studies the rate of convergence of Boosting, which is tailored
for regression, in a high-dimensional setting. Moreover, we introduce so-called
"post-Boosting". This is a post-selection
estimator which applies ordinary least squares to the variables selected in the
first stage by Boosting. Another variant is "Orthogonal
Boosting", where after each step an orthogonal projection is
conducted. We show that both post-Boosting and Orthogonal Boosting
achieve the same rate of convergence as LASSO in a sparse, high-dimensional
setting. We show that the rate of convergence of the classical Boosting
depends on the design matrix through a sparse eigenvalue constant. To show
the latter results, we derive new approximation results for the pure greedy
algorithm, based on analyzing the revisiting behavior of Boosting. We also
introduce feasible rules for early stopping, which can be easily implemented
and used in applied work. Our results also allow a direct comparison between
LASSO and boosting which has been missing from the literature. Finally, we
present simulation studies and applications to illustrate the relevance of our
theoretical results and to provide insights into the practical aspects of
boosting. In these simulation studies, post-Boosting clearly outperforms
LASSO.
Comment: 19 pages, 4 tables; AMS 2000 subject classifications: Primary 62J05,
62J07, 41A25; secondary 49M15, 68Q3
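The two-stage idea can be made concrete with a minimal numpy sketch: componentwise L2-Boosting selects variables, then ordinary least squares is refit on the selected set. The step count and learning rate nu below are illustrative placeholders, not the paper's data-driven early-stopping rules, and the data are synthetic.

```python
import numpy as np

def boost_then_ols(X, y, steps=100, nu=0.1):
    """Componentwise L2-Boosting for variable selection, followed by
    an OLS refit on the selected columns (post-Boosting)."""
    n, p = X.shape
    beta, resid = np.zeros(p), y.astype(float).copy()
    for _ in range(steps):
        corr = X.T @ resid
        j = np.argmax(np.abs(corr))            # best-correlated feature
        step = corr[j] / (X[:, j] @ X[:, j])   # componentwise LS coefficient
        beta[j] += nu * step                   # shrunken update
        resid -= nu * step * X[:, j]
    selected = np.flatnonzero(beta)
    beta_post = np.zeros(p)
    beta_post[selected] = np.linalg.lstsq(X[:, selected], y, rcond=None)[0]
    return selected, beta_post

# toy sparse design: 3 active features out of 50
rng = np.random.default_rng(0)
n, p = 100, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.1 * rng.standard_normal(n)
selected, beta_post = boost_then_ols(X, y)
```

On this toy problem the boosting stage recovers the true support (possibly with a few spurious extras), and the OLS refit removes the shrinkage bias of the boosted coefficients.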
The Discrete Dantzig Selector: Estimating Sparse Linear Models via Mixed Integer Linear Optimization
We propose a novel high-dimensional linear regression estimator: the Discrete
Dantzig Selector, which minimizes the number of nonzero regression coefficients
subject to a budget on the maximal absolute correlation between the features
and residuals. Motivated by the significant advances in integer optimization
over the past 10-15 years, we present a Mixed Integer Linear Optimization
(MILO) approach to obtain certifiably optimal global solutions to this
nonconvex optimization problem. The current state of algorithmics in integer
optimization makes our proposal substantially more computationally attractive
than the least squares subset selection framework based on integer quadratic
optimization, recently proposed by Bertsimas et al. [8], and the continuous
nonconvex quadratic optimization framework of Liu et al. [33]. We propose new
discrete first-order methods,
which when paired with state-of-the-art MILO solvers, lead to good solutions
for the Discrete Dantzig Selector problem for a given computational budget. We
illustrate that our integrated approach provides globally optimal solutions in
significantly shorter computation times, when compared to off-the-shelf MILO
solvers. We demonstrate both theoretically and empirically that in a wide range
of regimes the statistical properties of the Discrete Dantzig Selector are
superior to those of popular ℓ1-based approaches. We illustrate that
our approach can handle problem instances with p = 10,000 features with
certifiable optimality, making it a highly scalable combinatorial variable
selection approach in sparse linear modeling.
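The underlying problem is: minimize the number of nonzeros in beta subject to max_j |X_j^T (y - X beta)| <= delta. The sketch below is not the paper's MILO approach; it brute-forces tiny instances, enumerating supports by size and using the OLS fit on each candidate support as a sufficient (not exact) feasibility certificate for the constraint. All data and the delta value are illustrative.

```python
import numpy as np
from itertools import combinations

def discrete_dantzig_brute(X, y, delta):
    """Smallest support S whose OLS fit satisfies the Dantzig-type
    constraint max_j |X_j^T (y - X beta)| <= delta.
    Checking only the OLS point on each support is a sufficient
    certificate, not the full linear-feasibility subproblem a MILO
    formulation would solve; brute force is viable only for tiny p."""
    n, p = X.shape
    for k in range(p + 1):                     # supports by increasing size
        for S in combinations(range(p), k):
            beta = np.zeros(p)
            if k:
                beta[list(S)] = np.linalg.lstsq(X[:, list(S)], y,
                                                rcond=None)[0]
            if np.max(np.abs(X.T @ (y - X @ beta))) <= delta:
                return S, beta
    return None, None

# tiny instance: 2 active features out of 6
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6))
beta_true = np.array([2.0, -1.5, 0.0, 0.0, 0.0, 0.0])
y = X @ beta_true + 0.01 * rng.standard_normal(50)
S, beta_hat = discrete_dantzig_brute(X, y, delta=1.0)
```

Dropping either active feature leaves a residual correlation far above delta, so the minimal feasible support is exactly the true one; the MILO formulation in the paper certifies this kind of optimality at vastly larger scale.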
A Precise High-Dimensional Asymptotic Theory for Boosting and Minimum-ℓ1-Norm Interpolated Classifiers
This paper establishes a precise high-dimensional asymptotic theory for
boosting on separable data, taking statistical and computational perspectives.
We consider a high-dimensional setting where the number of features (weak
learners) scales with the sample size n, in an overparametrized regime.
Under a class of statistical models, we provide an exact analysis of the
generalization error of boosting when the algorithm interpolates the training
data and maximizes the empirical ℓ1-margin. Further, we explicitly pin
down the relation between the boosting test error and the optimal Bayes error,
as well as the proportion of active features at interpolation (with zero
initialization). In turn, these precise characterizations answer certain
questions raised in Breiman (1999) and Schapire et al. (1998)
surrounding boosting, under assumed data generating processes. At the heart of
our theory lies an in-depth study of the maximum ℓ1-margin, which can be
accurately described by a new system of non-linear equations; to analyze this
margin, we rely on Gaussian comparison techniques and develop a novel uniform
deviation argument. Our statistical and computational arguments can handle (1)
any finite-rank spiked covariance model for the feature distribution and (2)
variants of boosting corresponding to general ℓq-geometry. As a final
component, via the Lindeberg principle, we establish a
universality result showcasing that the scaled ℓ1-margin (asymptotically)
remains the same, whether the covariates used for boosting arise from a
non-linear random feature model or an appropriately linearized model with
matching moments.
Comment: 68 pages, 4 figures
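The maximum ℓ1-margin at the center of this analysis is itself computable as a linear program: maximize t subject to y_i x_i^T θ ≥ t for all i and ||θ||_1 ≤ 1. The sketch below, using scipy's linprog with the standard split θ = θ⁺ − θ⁻ and a hand-made separable toy dataset, illustrates the quantity being analyzed, not the paper's asymptotic machinery.

```python
import numpy as np
from scipy.optimize import linprog

def max_l1_margin(X, y):
    """Max-l1-margin of separable data: maximize t subject to
    y_i * (x_i @ theta) >= t and ||theta||_1 <= 1, written as an LP
    over [theta_plus, theta_minus, t]."""
    n, p = X.shape
    c = np.zeros(2 * p + 1)
    c[-1] = -1.0                               # minimize -t, i.e. maximize t
    A = np.zeros((n + 1, 2 * p + 1))
    A[:n, :p] = -y[:, None] * X                # rows: t - y_i x_i^T theta <= 0
    A[:n, p:2 * p] = y[:, None] * X
    A[:n, -1] = 1.0
    A[n, :2 * p] = 1.0                         # sum(theta+ + theta-) <= 1
    b = np.zeros(n + 1)
    b[-1] = 1.0
    bounds = [(0, None)] * (2 * p) + [(None, None)]   # t is free
    res = linprog(c, A_ub=A, b_ub=b, bounds=bounds)
    theta = res.x[:p] - res.x[p:2 * p]
    return res.x[-1], theta

# hand-made separable toy data (2 features)
X = np.array([[2.0, 0.0], [1.5, 0.5], [-2.0, 0.0], [-1.5, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
margin, theta = max_l1_margin(X, y)
```

On this toy dataset the optimal direction puts all its ℓ1 budget on the first coordinate, achieving margin 1.5; boosting on separable data drives its normalized iterates toward exactly this kind of max-ℓ1-margin direction.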
A new perspective on boosting in linear regression via subgradient optimization and relatives
We analyze boosting algorithms [Ann. Statist. 29 (2001) 1189-1232; Ann. Statist. 28 (2000) 337-407; Ann. Statist. 32 (2004) 407-499] in linear regression from a new perspective: that of modern first-order methods in convex optimization. We show that classic boosting algorithms in linear regression, namely the incremental forward stagewise algorithm (FSε) and least squares boosting (LS-BOOST(ε)), can be viewed as subgradient descent to minimize the loss function defined as the maximum absolute correlation between the features and residuals. We also propose a minor modification of FSε that yields an algorithm for the LASSO, and that may be easily extended to an algorithm that computes the LASSO path for different values of the regularization parameter. Furthermore, we show that these new algorithms for the LASSO may also be interpreted as the same master algorithm (subgradient descent), applied to a regularized version of the maximum absolute correlation loss function. We derive novel, comprehensive computational guarantees for several boosting algorithms in linear regression (including LS-BOOST(ε) and FSε) by using techniques of first-order methods in convex optimization. Our computational guarantees inform us about the statistical properties of boosting algorithms. In particular, they provide, for the first time, a precise theoretical description of the amount of data-fidelity and regularization imparted by running a boosting algorithm with a prespecified learning rate for a fixed but arbitrary number of iterations, for any dataset.
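The subgradient-descent view can be illustrated with a small numpy implementation of FSε: each step moves a fixed amount eps in the sign direction of the feature most correlated with the current residual, which is a subgradient step on the loss max_j |X_j^T r| / n. The step size, iteration count, and data below are illustrative choices, not the paper's tuned settings.

```python
import numpy as np

def forward_stagewise(X, y, eps=0.01, iters=800):
    """Incremental forward stagewise (FS_eps): each step moves eps in
    the sign direction of the feature most correlated with the
    residual, i.e. a subgradient step on max_j |X_j^T r| / n."""
    n, p = X.shape
    beta, r = np.zeros(p), y.astype(float).copy()
    loss = []
    for _ in range(iters):
        c = X.T @ r / n                    # correlations with residual
        j = np.argmax(np.abs(c))           # active subgradient coordinate
        loss.append(abs(c[j]))             # current max-abs-correlation loss
        beta[j] += eps * np.sign(c[j])
        r -= eps * np.sign(c[j]) * X[:, j]
    return beta, loss

# sparse toy regression: 2 active features out of 10
rng = np.random.default_rng(1)
n, p = 100, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:2] = [2.0, -1.0]
y = X @ beta_true + 0.05 * rng.standard_normal(n)
beta, loss = forward_stagewise(X, y)
```

Tracking the loss sequence makes the paper's point visible in miniature: the maximum absolute correlation is driven down over the iterations, and stopping early yields a shrunken, regularized coefficient vector while running longer increases data-fidelity.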