
    A General Framework for Fast Stagewise Algorithms

    Forward stagewise regression follows a very simple strategy for constructing a sequence of sparse regression estimates: it starts with all coefficients equal to zero, and iteratively updates the coefficient (by a small amount $\epsilon$) of the variable that achieves the maximal absolute inner product with the current residual. This procedure has an interesting connection to the lasso: under some conditions, it is known that the sequence of forward stagewise estimates exactly coincides with the lasso path, as the step size $\epsilon$ goes to zero. Furthermore, essentially the same equivalence holds outside of least squares regression, with the minimization of a differentiable convex loss function subject to an $\ell_1$ norm constraint (the stagewise algorithm now updates the coefficient corresponding to the maximal absolute component of the gradient). Even when they do not match their $\ell_1$-constrained analogues, stagewise estimates provide a useful approximation, and are computationally appealing. Their success in sparse modeling motivates the question: can a simple, effective strategy like forward stagewise be applied more broadly in other regularization settings, beyond the $\ell_1$ norm and sparsity? The current paper is an attempt to do just this. We present a general framework for stagewise estimation, which yields fast algorithms for problems such as group-structured learning, matrix completion, image denoising, and more.
    Comment: 56 pages, 15 figures
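    To make the update rule concrete, here is a minimal NumPy sketch of incremental forward stagewise regression. The names (`X`, `y`, `eps`, `n_steps`) and the fixed iteration count are illustrative choices, not taken from the paper; treat this as a sketch of the idea rather than the paper's algorithm.

```python
# Minimal sketch of incremental forward stagewise regression.
# Assumes centered y and standardized columns of X; names are illustrative.
import numpy as np

def forward_stagewise(X, y, eps=0.01, n_steps=1000):
    """Return the sequence of coefficient vectors along the stagewise path."""
    n, p = X.shape
    beta = np.zeros(p)
    residual = y.astype(float).copy()
    path = [beta.copy()]
    for _ in range(n_steps):
        corr = X.T @ residual                 # inner products with the residual
        j = np.argmax(np.abs(corr))           # variable most correlated with it
        step = eps * np.sign(corr[j])
        beta[j] += step                       # small update to one coefficient
        residual -= step * X[:, j]            # keep residual consistent with beta
        path.append(beta.copy())
    return np.array(path)
```

    As $\epsilon \to 0$ (with correspondingly many steps), this path approaches the lasso path under the conditions discussed in the abstract.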

    Forward stagewise regression and the monotone lasso

    We consider the least angle regression and forward stagewise algorithms for solving penalized least squares regression problems. In Efron, Hastie, Johnstone & Tibshirani (2004) it is proved that the least angle regression algorithm, with a small modification, solves the lasso regression problem. Here we give an analogous result for incremental forward stagewise regression, showing that it solves a version of the lasso problem that enforces monotonicity. One consequence of this is as follows: while the lasso makes optimal progress in terms of reducing the residual sum-of-squares per unit increase in the $L_1$-norm of the coefficient vector $\beta$, forward stagewise is optimal per unit $L_1$ arc-length traveled along the coefficient path. We also study a condition under which the coefficient paths of the lasso are monotone, and hence the different algorithms coincide. Finally, we compare the lasso and forward stagewise procedures in a simulation study involving a large number of correlated predictors.
    Comment: Published at http://dx.doi.org/10.1214/07-EJS004 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org)
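    To make the arc-length comparison concrete, the $L_1$ arc length of a coefficient path can be written as follows; this is a standard definition supplied for orientation, not an excerpt from the paper.

```latex
% L1 arc length of a coefficient path \beta(t), t \in [0, T].
% It always dominates the net change in L1 norm; equality holds when every
% coordinate path \beta_j(t) is monotone (e.g., a monotone path from zero),
% which is why the two optimality notions above agree on monotone paths.
\[
  \operatorname{arclength}_1(\beta; T)
    \;=\; \int_0^T \left\| \frac{d\beta(t)}{dt} \right\|_1 \, dt
    \;\ge\; \|\beta(T)\|_1 - \|\beta(0)\|_1 .
\]
```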

    An update on statistical boosting in biomedicine

    Statistical boosting algorithms have triggered a lot of research during the last decade. They combine a powerful machine-learning approach with classical statistical modelling, offering various practical advantages like automated variable selection and implicit regularization of effect estimates. They are extremely flexible, as the underlying base-learners (regression functions defining the type of effect for the explanatory variables) can be combined with any kind of loss function (target function to be optimized, defining the type of regression setting). In this review article, we highlight the most recent methodological developments on statistical boosting regarding variable selection, functional regression and advanced time-to-event modelling. Additionally, we provide a short overview on relevant applications of statistical boosting in biomedicine.
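    As a concrete picture of how base-learners and a loss function combine, here is a hedged sketch of component-wise gradient boosting with squared-error loss and univariate linear base-learners; the names (`X`, `y`, `nu`, `m_stop`) are illustrative, and real implementations (e.g., the mboost R package) are far more general in both respects.

```python
# Sketch of component-wise L2 boosting: at each iteration, fit every
# univariate linear base-learner to the negative gradient of the loss
# (for squared error, the residual), then update only the best-fitting
# one, shrunken by nu. Updating one component at a time gives automated
# variable selection; stopping at m_stop gives implicit regularization.
import numpy as np

def componentwise_boosting(X, y, nu=0.1, m_stop=200):
    n, p = X.shape
    beta = np.zeros(p)
    fit = np.zeros(n)
    for _ in range(m_stop):
        u = y - fit                           # negative gradient of L2 loss
        coefs = np.array([X[:, j] @ u / (X[:, j] @ X[:, j]) for j in range(p)])
        rss = np.array([np.sum((u - coefs[j] * X[:, j]) ** 2) for j in range(p)])
        j_best = int(np.argmin(rss))          # best-fitting base-learner
        beta[j_best] += nu * coefs[j_best]    # shrunken update
        fit += nu * coefs[j_best] * X[:, j_best]
    return beta
```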

    Learning Dynamic Feature Selection for Fast Sequential Prediction

    We present paired learning and inference algorithms for significantly reducing the computation and increasing the speed of the vector dot products in the classifiers that are at the heart of many NLP components. This is accomplished by partitioning the features into a sequence of templates which are ordered such that high confidence can often be reached using only a small fraction of all features. Parameter estimation is arranged to maximize accuracy and early confidence in this sequence. Our approach is simpler and better suited to NLP than other related cascade methods. We present experiments in left-to-right part-of-speech tagging, named entity recognition, and transition-based dependency parsing. On typical benchmark datasets we can preserve POS tagging accuracy above 97% and parsing LAS above 88.5%, both with over a five-fold reduction in run-time, and NER F1 above 88 with more than a 2x increase in speed.
    Comment: Appears in The 53rd Annual Meeting of the Association for Computational Linguistics, Beijing, China, July 2015
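    A minimal sketch of the template-cascade idea: evaluate feature templates in order and stop as soon as the score margin between the top two classes exceeds a confidence threshold. The function and variable names, and the margin-based stopping rule, are assumptions for illustration; the paper's actual training objective and confidence criteria differ.

```python
# Illustrative sketch: sequential feature-template evaluation with early
# stopping once the classifier is confident. Names are hypothetical.
import numpy as np

def classify_with_cascade(templates, weights, threshold=1.0):
    """templates: ordered list of zero-arg functions, each returning a dict
    of features for that template; weights: one feature->weight dict per
    class. Returns the predicted class index."""
    scores = np.zeros(len(weights))
    for extract in templates:                 # cheapest templates first
        feats = extract()                     # features from this template only
        for c, w in enumerate(weights):
            scores[c] += sum(w.get(f, 0.0) * v for f, v in feats.items())
        top, second = np.sort(scores)[-2:][::-1]
        if top - second >= threshold:         # confident enough: stop early
            break
    return int(np.argmax(scores))
```

    The savings come from the break: on easy instances, most templates (and their dot products) are never evaluated.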

    Semantic variation operators for multidimensional genetic programming

    Multidimensional genetic programming represents candidate solutions as sets of programs, and thereby provides an interesting framework for exploiting building block identification. Towards this goal, we investigate the use of machine learning as a way to bias which components of programs are promoted, and propose two semantic operators to choose where useful building blocks are placed during crossover. A forward stagewise crossover operator we propose leads to significant improvements on a set of regression problems, and produces state-of-the-art results in a large benchmark study. We discuss this architecture and others in terms of their propensity for allowing heuristic search to utilize information during the evolutionary process. Finally, we look at the collinearity and complexity of the data representations that result from these architectures, with a view towards disentangling factors of variation in application.
    Comment: 9 pages, 8 figures, GECCO 2019
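    One way to picture a forward stagewise crossover: treat each candidate building block as a feature, namely its output vector (semantics) on the training set, and promote blocks greedily by their correlation with the current residual, exactly as in stagewise regression. This sketch is an interpretation for illustration only; the names are hypothetical and the paper's operator differs in detail.

```python
# Hypothetical sketch: select building blocks for a child program via a
# forward stagewise pass over the candidate subprograms' output vectors.
import numpy as np

def stagewise_crossover(block_outputs, y, eps=0.1, n_steps=100):
    """block_outputs: (n_samples, n_blocks) matrix of candidate subprogram
    outputs gathered from both parents. Returns indices of the blocks
    promoted by a forward stagewise pass against the target y."""
    k = block_outputs.shape[1]
    w = np.zeros(k)
    residual = y - y.mean()                   # work with the centered target
    for _ in range(n_steps):
        corr = block_outputs.T @ residual
        j = np.argmax(np.abs(corr))           # block most correlated with residual
        step = eps * np.sign(corr[j])
        w[j] += step
        residual -= step * block_outputs[:, j]
    return np.flatnonzero(w)                  # blocks to place in the child
```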

    Spatial Weighting Matrix Selection in Spatial Lag Econometric Model

    This paper investigates the choice of spatial weighting matrix in a spatial lag model framework. In the empirical literature the choice of spatial weighting matrix has been characterized by a great deal of arbitrariness. The number of possible spatial weighting matrices is large, which until recently was thought to preclude any investigation of the appropriateness of the empirical choices. Recently, Kostov (2010) proposed a new approach that transforms the problem into an equivalent variable selection problem. This article expands that transformation approach into a two-step selection procedure, aiming to reduce the arbitrariness in the selection of the spatial weighting matrix in spatial econometrics. The transformation allows a wide range of variable selection methods to be applied to the high-dimensional problem of selecting a spatial weighting matrix. The suggested approach consists of a screening step that reduces the number of candidate spatial weighting matrices, followed by an estimation step that selects the final model. An empirical application of the proposed methodology is presented, in which a range of different combinations of screening and estimation methods are employed and found to produce similar results. The proposed methodology is shown to be able to approximate, and provide indications of, what the ‘true’ spatial weighting matrix could be, even when it is not amongst the considered alternatives. The similarity of results obtained using different methods suggests that relative computational cost could be the primary reason for choosing among them. Some further extensions and applications are also discussed.
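    A hedged sketch of the transformation idea: each candidate spatial weighting matrix $W_k$ contributes one spatially lagged regressor $W_k y$, turning matrix selection into variable selection, screened first by marginal correlation and then resolved with a lasso fit. The function and variable names are illustrative, and both Kostov's (2010) procedure and the paper's two-step method differ in their details (including how the spatial lag's endogeneity is handled).

```python
# Illustrative two-step selection: screen candidate spatial lags W_k @ y by
# marginal correlation with y, then select among survivors with a lasso.
# Names are hypothetical; this is a sketch of the idea, not the paper's code.
import numpy as np
from sklearn.linear_model import LassoCV

def select_weighting_matrix(W_list, X, y, n_keep=10):
    # Step 1 (screening): build the spatial lag for each candidate matrix
    # and keep the n_keep lags most correlated with y.
    lags = np.column_stack([W @ y for W in W_list])
    corr = np.abs([np.corrcoef(lags[:, k], y)[0, 1] for k in range(lags.shape[1])])
    keep = np.argsort(corr)[-n_keep:]
    # Step 2 (estimation): lasso over X plus the surviving spatial lags;
    # nonzero lag coefficients point to plausible weighting matrices.
    Z = np.column_stack([X, lags[:, keep]])
    model = LassoCV(cv=5).fit(Z, y)
    lag_coefs = model.coef_[X.shape[1]:]
    return keep[np.flatnonzero(lag_coefs)]    # indices into W_list
```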