High-Dimensional Boosting: Rate of Convergence
Boosting is one of the most significant developments in machine learning.
This paper studies the rate of convergence of Boosting, tailored for regression, in a high-dimensional setting. Moreover, we introduce so-called "post-Boosting": a post-selection estimator which applies ordinary least squares to the variables selected in the first stage by Boosting. Another variant is "Orthogonal Boosting", where an orthogonal projection is conducted after each step. We show that both post-Boosting and Orthogonal Boosting achieve the same rate of convergence as LASSO in a sparse, high-dimensional setting. The rate of convergence of classical Boosting, in contrast, depends on the design matrix through a sparse eigenvalue constant. To show
the latter results, we derive new approximation results for the pure greedy
algorithm, based on analyzing the revisiting behavior of Boosting. We also
introduce feasible rules for early stopping, which can be easily implemented
and used in applied work. Our results also allow a direct comparison between
LASSO and Boosting, which has been missing from the literature. Finally, we
present simulation studies and applications to illustrate the relevance of our
theoretical results and to provide insights into the practical aspects of
boosting. In these simulation studies, post-Boosting clearly outperforms
LASSO.
Comment: 19 pages, 4 tables; AMS 2000 subject classifications: Primary 62J05, 62J07, 41A25; secondary 49M15, 68Q3
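Illustrative sketch (not from the paper; all function names and defaults below are ours): the two-stage post-Boosting idea in Python, with componentwise L2-Boosting for variable selection and an ordinary least squares refit on the selected support.

import numpy as np

def l2_boost(X, y, steps=50, nu=0.1):
    # Componentwise L2-Boosting: at each step, fit the current residual
    # on the single best predictor and take a small step of size nu.
    n, p = X.shape
    beta, resid = np.zeros(p), y.astype(float).copy()
    norms = (X ** 2).sum(axis=0)
    for _ in range(steps):
        gamma = X.T @ resid / norms          # per-column least-squares fit
        j = np.argmax(gamma ** 2 * norms)    # largest reduction in RSS
        beta[j] += nu * gamma[j]
        resid -= nu * gamma[j] * X[:, j]
    return beta

def post_boost(X, y, steps=50, nu=0.1):
    # Post-Boosting: ordinary least squares refitted on the variables
    # selected by Boosting in the first stage.
    S = np.flatnonzero(l2_boost(X, y, steps, nu))
    beta = np.zeros(X.shape[1])
    beta[S] = np.linalg.lstsq(X[:, S], y, rcond=None)[0]
    return beta

The fixed step count here stands in for the paper's feasible early-stopping rules.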
NARX-based nonlinear system identification using orthogonal least squares basis hunting
An orthogonal least squares technique for basis hunting (OLS-BH) is proposed to construct sparse radial basis function (RBF) models for NARX-type nonlinear systems. Unlike most of the existing RBF or kernel modelling methods, which place the RBF or kernel centers at the training input data points and use a fixed common variance for all the regressors, the proposed OLS-BH technique tunes the RBF center and diagonal covariance matrix of each individual regressor by minimizing the training mean square error. An efficient optimization method is adopted for this basis hunting to select regressors in an orthogonal forward selection procedure. Experimental results obtained using this OLS-BH technique demonstrate that it offers a state-of-the-art method for constructing parsimonious RBF models with excellent generalization performance.
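Illustrative sketch (not from the paper): the orthogonal forward selection loop at the core of OLS-BH, in Python over a fixed pool of candidate RBF regressors. The actual OLS-BH additionally tunes each regressor's center and diagonal covariance by minimizing the training mean square error; all names here are ours.

import numpy as np

def rbf_pool(X, centers, width=1.0):
    # Candidate regressors phi_k(x) = exp(-||x - c_k||^2 / width).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / width)

def orthogonal_forward_select(Phi, y, n_terms):
    # Greedily pick the candidate whose component orthogonal to the
    # already-chosen basis explains the most output energy.
    selected, basis = [], []
    for _ in range(n_terms):
        best_k, best_err, best_w = -1, -np.inf, None
        for k in range(Phi.shape[1]):
            if k in selected:
                continue
            w = Phi[:, k].copy()
            for q in basis:                  # Gram-Schmidt against the basis
                w -= (q @ Phi[:, k]) * q
            nrm = np.linalg.norm(w)
            if nrm < 1e-10:
                continue                     # numerically dependent regressor
            w /= nrm
            err = (w @ y) ** 2               # output energy explained
            if err > best_err:
                best_k, best_err, best_w = k, err, w
        selected.append(best_k)
        basis.append(best_w)
    return selected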
Distributed Kernel Regression: An Algorithm for Training Collaboratively
This paper addresses the problem of distributed learning under communication
constraints, motivated by distributed signal processing in wireless sensor
networks and data mining with distributed databases. After formalizing a
general model for distributed learning, an algorithm for collaboratively
training regularized kernel least-squares regression estimators is derived.
Noting that the algorithm can be viewed as an application of successive
orthogonal projection algorithms, its convergence properties are investigated
and the statistical behavior of the estimator is discussed in a simplified
theoretical setting.
Comment: To be presented at the 2006 IEEE Information Theory Workshop, Punta del Este, Uruguay, March 13-17, 2006
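Illustrative sketch (not from the paper, whose message-passing scheme and analysis differ; names are ours): one plausible reading of the successive-projection idea in Python, where nodes take turns fitting a regularized kernel least-squares model to the residual of the shared estimate on their own data.

import numpy as np

def gauss_kernel(A, B, s=1.0):
    # Gaussian kernel matrix between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * s * s))

class LocalKRR:
    # One node's regularized kernel least-squares fit on its own inputs.
    def __init__(self, X, lam=1e-2, s=1.0):
        self.X, self.lam, self.s = X, lam, s
    def fit(self, r):
        K = gauss_kernel(self.X, self.X, self.s)
        self.alpha = np.linalg.solve(K + self.lam * np.eye(len(r)), r)
        return self
    def __call__(self, Z):
        return gauss_kernel(Z, self.X, self.s) @ self.alpha

def collaborative_train(nodes_X, nodes_y, rounds=5):
    # Cycle over nodes; each fits its local residual, projection-style,
    # and the running sum of the local fits is the shared estimator.
    models = []
    def predict(Z):
        return sum(m(Z) for m in models) if models else np.zeros(len(Z))
    for _ in range(rounds):
        for X, y in zip(nodes_X, nodes_y):
            models.append(LocalKRR(X).fit(y - predict(X)))
    return predict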
PLS dimension reduction for classification of microarray data
PLS dimension reduction is known to give good prediction accuracy in the context of classification with high-dimensional microarray data. In this paper, PLS is compared with some of the best state-of-the-art classification methods. In addition, a simple procedure to choose the number of components is suggested. The connection between PLS dimension reduction and gene selection is examined, and a property of the first PLS component for binary classification is proven. PLS can also be used as a visualization tool for high-dimensional data in the classification framework. The whole study is based on 9 real microarray cancer data sets.
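Illustrative sketch (not from the paper, whose classifiers and selection rule differ; the data here is a random stand-in): PLS dimension reduction followed by a linear classifier in scikit-learn, with the number of components chosen by cross-validation in the spirit of the suggested procedure.

import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy stand-in for a microarray matrix: 60 samples, 2000 genes.
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 2000))
y = rng.integers(0, 2, 60)

pipe = Pipeline([
    ("pls", PLSRegression(n_components=2)),  # supervised dimension reduction
    ("clf", LogisticRegression(max_iter=1000)),
])
# Choose the number of PLS components by cross-validation.
grid = GridSearchCV(pipe, {"pls__n_components": [1, 2, 3, 4, 5]}, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)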
CLEAR: Covariant LEAst-square Re-fitting with applications to image restoration
In this paper, we propose a new framework to remove parts of the systematic
errors affecting popular restoration algorithms, with a special focus for image
processing tasks. Generalizing ideas that emerged for $\ell_1$ regularization,
we develop an approach re-fitting the results of standard methods towards the
input data. Total variation regularizations and non-local means are special
cases of interest. We identify important covariant information that should be
preserved by the re-fitting method, and emphasize the importance of preserving
the Jacobian (w.r.t. the observed signal) of the original estimator. Then, we
provide an approach that has a "twicing" flavor and allows re-fitting the
restored signal by adding back a local affine transformation of the residual
term. We illustrate the benefits of our method on numerical simulations for
image restoration tasks.
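Illustrative sketch (not from the paper; names are ours): the classical twicing special case in Python for a linear 1-D smoother, where re-fitting reduces to adding back the smoothed residual. For a linear estimator f(y) = Sy the Jacobian is S itself, so this step preserves it; CLEAR extends the idea to general nonlinear estimators through a local affine transformation of the residual.

import numpy as np

def smooth(y, width=5):
    # A simple linear denoiser (moving average) standing in for the
    # restoration method; any linear smoother behaves the same way here.
    k = np.ones(width) / width
    return np.convolve(y, k, mode="same")

def refit(y, f=smooth):
    # Twicing-flavoured re-fitting: estimate, then add back the
    # estimator applied to the residual y - f(y).
    fy = f(y)
    return fy + f(y - fy)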
Forward stagewise regression and the monotone lasso
We consider the least angle regression and forward stagewise algorithms for
solving penalized least squares regression problems. In Efron, Hastie,
Johnstone & Tibshirani (2004) it is proved that the least angle regression
algorithm, with a small modification, solves the lasso regression problem. Here
we give an analogous result for incremental forward stagewise regression,
showing that it solves a version of the lasso problem that enforces
monotonicity. One consequence of this is as follows: while the lasso makes optimal progress in terms of reducing the residual sum-of-squares per unit increase in the $L_1$-norm of the coefficient vector $\beta$, forward stagewise is optimal per unit $L_1$ arc-length traveled along the coefficient path. We also study a condition
under which the coefficient paths of the lasso are monotone, and hence the
different algorithms coincide. Finally, we compare the lasso and forward
stagewise procedures in a simulation study involving a large number of
correlated predictors.
Comment: Published at http://dx.doi.org/10.1214/07-EJS004 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org)
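Illustrative sketch (not from the paper; names and defaults are ours): incremental forward stagewise regression in Python, nudging the coefficient of the predictor most correlated with the current residual by a small amount at each step. Predictors are assumed centered and standardized.

import numpy as np

def forward_stagewise(X, y, eps=0.01, steps=2000):
    # Incremental forward stagewise: repeatedly move the coefficient of
    # the most correlated predictor by +/- eps and update the residual.
    n, p = X.shape
    beta = np.zeros(p)
    resid = y - y.mean()
    path = [beta.copy()]
    for _ in range(steps):
        c = X.T @ resid                # current correlations
        j = np.argmax(np.abs(c))
        delta = eps * np.sign(c[j])
        beta[j] += delta
        resid -= delta * X[:, j]
        path.append(beta.copy())
    return np.array(path)              # coefficient path, (steps+1) x p

In the eps -> 0 limit this traces the monotone coefficient paths the abstract refers to.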
Greedy Algorithms for Cone Constrained Optimization with Convergence Guarantees
Greedy optimization methods such as Matching Pursuit (MP) and Frank-Wolfe
(FW) algorithms regained popularity in recent years due to their simplicity,
effectiveness and theoretical guarantees. MP and FW address optimization over
the linear span and the convex hull of a set of atoms, respectively. In this
paper, we consider the intermediate case of optimization over the convex cone,
parametrized as the conic hull of a generic atom set, leading to the first
principled definitions of non-negative MP algorithms for which we give explicit
convergence rates and demonstrate excellent empirical performance. In
particular, we derive sublinear ($\mathcal{O}(1/t)$) convergence on general smooth and convex objectives, and linear ($\mathcal{O}(e^{-t})$) convergence on strongly convex objectives, in both cases for general sets of atoms.
Furthermore, we establish a clear correspondence of our algorithms to known
algorithms from the MP and FW literature. Our novel algorithms and analyses
target general atom sets and general objective functions, and hence are
directly applicable to a large variety of learning settings.
Comment: NIPS 2017
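Illustrative sketch (not from the paper, which defines several principled variants with corrective steps; names are ours): the simplest non-negative Matching Pursuit flavour in Python, restricting the greedy atom search to positive correlations so the iterate stays in the conic hull. Atoms are assumed unit-norm.

import numpy as np

def nonneg_mp(A, y, steps=100):
    # Greedy pursuit over the conic hull of the columns of A: pick the
    # atom with the largest positive correlation with the residual and
    # take the exact line-search step (the correlation, for unit atoms).
    x = np.zeros(A.shape[1])
    resid = y.astype(float).copy()
    for _ in range(steps):
        c = A.T @ resid
        j = np.argmax(c)
        if c[j] <= 1e-12:
            break                      # no atom in the cone improves the fit
        x[j] += c[j]                   # coefficient update stays non-negative
        resid -= c[j] * A[:, j]
    return x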