6,284 research outputs found

    High-Dimensional L2L_2Boosting: Rate of Convergence

    Full text link
    Boosting is one of the most significant developments in machine learning. This paper studies the rate of convergence of L2L_2Boosting, which is tailored for regression, in a high-dimensional setting. Moreover, we introduce so-called \textquotedblleft post-Boosting\textquotedblright. This is a post-selection estimator which applies ordinary least squares to the variables selected in the first stage by L2L_2Boosting. Another variant is \textquotedblleft Orthogonal Boosting\textquotedblright\ where after each step an orthogonal projection is conducted. We show that both post-L2L_2Boosting and the orthogonal boosting achieve the same rate of convergence as LASSO in a sparse, high-dimensional setting. We show that the rate of convergence of the classical L2L_2Boosting depends on the design matrix described by a sparse eigenvalue constant. To show the latter results, we derive new approximation results for the pure greedy algorithm, based on analyzing the revisiting behavior of L2L_2Boosting. We also introduce feasible rules for early stopping, which can be easily implemented and used in applied work. Our results also allow a direct comparison between LASSO and boosting which has been missing from the literature. Finally, we present simulation studies and applications to illustrate the relevance of our theoretical results and to provide insights into the practical aspects of boosting. In these simulation studies, post-L2L_2Boosting clearly outperforms LASSO.Comment: 19 pages, 4 tables; AMS 2000 subject classifications: Primary 62J05, 62J07, 41A25; secondary 49M15, 68Q3

    NARX-based nonlinear system identification using orthogonal least squares basis hunting

    No full text
    An orthogonal least squares technique for basis hunting (OLS-BH) is proposed to construct sparse radial basis function (RBF) models for NARX-type nonlinear systems. Unlike most of the existing RBF or kernel modelling methods, whichplaces the RBF or kernel centers at the training input data points and use a fixed common variance for all the regressors, the proposed OLS-BH technique tunes the RBF center and diagonal covariance matrix of individual regressor by minimizing the training mean square error. An efficient optimization method isadopted for this basis hunting to select regressors in an orthogonal forward selection procedure. Experimental results obtained using this OLS-BH technique demonstrate that it offers a state-of-the-art method for constructing parsimonious RBF models with excellent generalization performance

    Distributed Kernel Regression: An Algorithm for Training Collaboratively

    Full text link
    This paper addresses the problem of distributed learning under communication constraints, motivated by distributed signal processing in wireless sensor networks and data mining with distributed databases. After formalizing a general model for distributed learning, an algorithm for collaboratively training regularized kernel least-squares regression estimators is derived. Noting that the algorithm can be viewed as an application of successive orthogonal projection algorithms, its convergence properties are investigated and the statistical behavior of the estimator is discussed in a simplified theoretical setting.Comment: To be presented at the 2006 IEEE Information Theory Workshop, Punta del Este, Uruguay, March 13-17, 200

    PLS dimension reduction for classification of microarray data

    Get PDF
    PLS dimension reduction is known to give good prediction accuracy in the context of classification with high-dimensional microarray data. In this paper, PLS is compared with some of the best state-of-the-art classification methods. In addition, a simple procedure to choose the number of components is suggested. The connection between PLS dimension reduction and gene selection is examined and a property of the first PLS component for binary classification is proven. PLS can also be used as a visualization tool for high-dimensional data in the classification framework. The whole study is based on 9 real microarray cancer data sets

    CLEAR: Covariant LEAst-square Re-fitting with applications to image restoration

    Full text link
    In this paper, we propose a new framework to remove parts of the systematic errors affecting popular restoration algorithms, with a special focus for image processing tasks. Generalizing ideas that emerged for 1\ell_1 regularization, we develop an approach re-fitting the results of standard methods towards the input data. Total variation regularizations and non-local means are special cases of interest. We identify important covariant information that should be preserved by the re-fitting method, and emphasize the importance of preserving the Jacobian (w.r.t. the observed signal) of the original estimator. Then, we provide an approach that has a "twicing" flavor and allows re-fitting the restored signal by adding back a local affine transformation of the residual term. We illustrate the benefits of our method on numerical simulations for image restoration tasks

    Forward stagewise regression and the monotone lasso

    Full text link
    We consider the least angle regression and forward stagewise algorithms for solving penalized least squares regression problems. In Efron, Hastie, Johnstone & Tibshirani (2004) it is proved that the least angle regression algorithm, with a small modification, solves the lasso regression problem. Here we give an analogous result for incremental forward stagewise regression, showing that it solves a version of the lasso problem that enforces monotonicity. One consequence of this is as follows: while lasso makes optimal progress in terms of reducing the residual sum-of-squares per unit increase in L1L_1-norm of the coefficient β\beta, forward stage-wise is optimal per unit L1L_1 arc-length traveled along the coefficient path. We also study a condition under which the coefficient paths of the lasso are monotone, and hence the different algorithms coincide. Finally, we compare the lasso and forward stagewise procedures in a simulation study involving a large number of correlated predictors.Comment: Published at http://dx.doi.org/10.1214/07-EJS004 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Greedy Algorithms for Cone Constrained Optimization with Convergence Guarantees

    Full text link
    Greedy optimization methods such as Matching Pursuit (MP) and Frank-Wolfe (FW) algorithms regained popularity in recent years due to their simplicity, effectiveness and theoretical guarantees. MP and FW address optimization over the linear span and the convex hull of a set of atoms, respectively. In this paper, we consider the intermediate case of optimization over the convex cone, parametrized as the conic hull of a generic atom set, leading to the first principled definitions of non-negative MP algorithms for which we give explicit convergence rates and demonstrate excellent empirical performance. In particular, we derive sublinear (O(1/t)\mathcal{O}(1/t)) convergence on general smooth and convex objectives, and linear convergence (O(et)\mathcal{O}(e^{-t})) on strongly convex objectives, in both cases for general sets of atoms. Furthermore, we establish a clear correspondence of our algorithms to known algorithms from the MP and FW literature. Our novel algorithms and analyses target general atom sets and general objective functions, and hence are directly applicable to a large variety of learning settings.Comment: NIPS 201
    corecore