
    Boosting Ridge Regression

    Ridge regression is a well-established method for shrinking regression parameters towards zero, thereby securing the existence of estimates. The present paper investigates several approaches to combining ridge regression with boosting techniques. In the direct approach, the ridge estimator is used to iteratively fit the current residuals, yielding an alternative to the usual ridge estimator. In partial boosting, only a subset of the regression parameters is re-estimated within each step of the iterative procedure. The technique allows one to distinguish between variables that are always included in the analysis and variables that are chosen only if relevant. The resulting procedure selects variables in a similar way as the Lasso, yielding a reduced set of influential variables. The suggested procedures are investigated within the classical framework of continuous response variables as well as for generalized linear models. In a simulation study, boosting procedures with different stopping criteria are investigated, and their performance in terms of prediction and identification of relevant variables is compared to several competitors such as the Lasso and the more recently proposed elastic net.
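    The direct approach lends itself to a brief illustration: the code below is a minimal sketch (not the authors' implementation) in which a ridge estimator with a fixed penalty is fitted repeatedly to the current residuals and the updates are accumulated. The penalty value, the number of steps, and the synthetic data are assumptions chosen for demonstration.

```python
import numpy as np

def ridge_boost(X, y, lam=10.0, n_steps=50):
    """Illustrative ridge boosting: repeatedly fit a ridge estimator
    to the current residuals and accumulate the coefficients."""
    n, p = X.shape
    beta = np.zeros(p)
    residuals = y.astype(float).copy()
    # Precompute the ridge solver (X'X + lam*I)^{-1} X'
    ridge_solver = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    for _ in range(n_steps):
        update = ridge_solver @ residuals   # ridge fit to current residuals
        beta += update
        residuals = y - X @ beta            # refresh residuals
    return beta

# Usage on synthetic data (purely illustrative)
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
true_beta = np.zeros(20)
true_beta[:3] = [2.0, -1.0, 0.5]
y = X @ true_beta + rng.standard_normal(100)
print(ridge_boost(X, y)[:5])
```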

    ELM Ridge Regression Boosting

    Full text link
    We discuss a boosting approach for the Ridge Regression (RR) method, with applications to the Extreme Learning Machine (ELM), and we show that the proposed method significantly improves the classification performance and robustness of ELMs.
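    As a rough illustration of the building blocks involved, the sketch below combines a standard ELM (random hidden layer with a ridge-regressed output layer) with a generic residual-boosting loop. The paper's actual boosting scheme, loss, and classification setting may differ; all names and parameter values here are assumptions.

```python
import numpy as np

def elm_features(X, W, b):
    """Random-projection hidden layer of an ELM (sigmoid activation)."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def boosted_elm_fit(X, y, n_hidden=50, lam=1.0, n_rounds=5, seed=0):
    """Generic residual-boosted ELM regressor (a sketch, not the paper's
    method): each round fits a new ELM, whose output weights come from
    ridge regression, to the current residuals."""
    rng = np.random.default_rng(seed)
    models, residuals = [], y.astype(float).copy()
    for _ in range(n_rounds):
        W = rng.standard_normal((X.shape[1], n_hidden))
        b = rng.standard_normal(n_hidden)
        H = elm_features(X, W, b)
        beta = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ residuals)
        models.append((W, b, beta))
        residuals -= H @ beta               # pass what is left to the next round
    return models

def boosted_elm_predict(models, X):
    """Sum the contributions of all boosted ELM rounds."""
    return sum(elm_features(X, W, b) @ beta for W, b, beta in models)
```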

    Feature Extraction in Signal Regression: A Boosting Technique for Functional Data Regression

    The main objectives of feature extraction in signal regression are to improve the accuracy of prediction on future data and to identify the relevant parts of the signal. A feature extraction procedure is proposed that uses boosting techniques to select the relevant parts of the signal. The proposed blockwise boosting procedure simultaneously selects intervals in the signal's domain and estimates the effect on the response. The definition of the blocks explicitly uses the underlying metric of the signal. It is demonstrated in simulation studies and on real-world data that the proposed approach competes well with procedures such as PLS, P-spline signal regression and functional data regression. The paper is a preprint of an article published in the Journal of Computational and Graphical Statistics; please use the journal version for citation.
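    A simplified sketch of the blockwise idea is given below, under the assumption of equally sized contiguous blocks and a plain least-squares fit per block; the paper's block definition uses the signal's metric and its estimation details may differ.

```python
import numpy as np

def blockwise_boost(X, y, block_size=10, nu=0.1, n_steps=100):
    """Illustrative blockwise boosting for signal regression: the signal's
    domain is split into contiguous blocks; at each step the block whose
    least-squares fit to the current residuals reduces the error most is
    selected, and its coefficients receive a shrunken update."""
    n, p = X.shape
    blocks = [np.arange(s, min(s + block_size, p)) for s in range(0, p, block_size)]
    beta = np.zeros(p)
    residuals = y.astype(float).copy()
    for _ in range(n_steps):
        best_block, best_update, best_rss = None, None, np.inf
        for idx in blocks:
            Xb = X[:, idx]
            update, *_ = np.linalg.lstsq(Xb, residuals, rcond=None)
            rss = np.sum((residuals - Xb @ update) ** 2)
            if rss < best_rss:
                best_block, best_update, best_rss = idx, update, rss
        beta[best_block] += nu * best_update   # shrunken update of the winning block
        residuals = y - X @ beta
    return beta
```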

    Geoadditive Regression Modeling of Stream Biological Condition

    Indices of biotic integrity (IBI) have become an established tool to quantify the condition of small non-tidal streams and their watersheds. To investigate the effects of watershed characteristics on stream biological condition, we present a new technique for regressing IBIs on watershed-specific explanatory variables. Since IBIs are typically evaluated on an ordinal scale, our method is based on the proportional odds model for ordinal outcomes. To avoid overfitting, we do not use classical maximum likelihood estimation but a component-wise functional gradient boosting approach. Because component-wise gradient boosting has an intrinsic mechanism for variable selection and model choice, determinants of biotic integrity can be identified. In addition, the method offers a relatively simple way to account for spatial correlation in ecological data. An analysis of the Maryland Biological Streams Survey shows that nonlinear effects of predictor variables on stream condition can be quantified while, in addition, accurate predictions of biological condition at unsurveyed locations are obtained.
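    The intrinsic variable selection of component-wise boosting can be illustrated with a stripped-down sketch using squared-error loss and simple linear base-learners. The paper itself boosts a proportional odds model with smooth and spatial base-learners, so the code below only conveys the selection mechanism, not the actual model.

```python
import numpy as np

def componentwise_boost(X, y, nu=0.1, n_steps=200):
    """Component-wise gradient boosting with squared-error loss: at each
    step only the single covariate that best fits the negative gradient
    (here equal to the residuals) is updated, which yields the intrinsic
    variable selection described in the abstract."""
    n, p = X.shape
    beta = np.zeros(p)
    fit = np.zeros(n)
    col_norms = np.sum(X ** 2, axis=0)
    for _ in range(n_steps):
        neg_grad = y - fit                              # negative gradient for L2 loss
        coefs = X.T @ neg_grad / col_norms              # univariate LS coefficient per covariate
        rss = np.sum(neg_grad ** 2) - col_norms * coefs ** 2
        j = int(np.argmin(rss))                         # best-fitting component
        beta[j] += nu * coefs[j]                        # shrunken update of that component only
        fit = X @ beta
    return beta
```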

    Predicting Aircraft Descent Length with Machine Learning

    Predicting aircraft trajectories is a key element in the detection and resolution of air traffic conflicts. In this paper, we focus on the ground-based prediction of final descents toward the destination airport. Several Machine Learning methods – ridge regression, neural networks, and gradient-boosting machines – are applied to the prediction of descents toward Toulouse airport (France) and compared with a baseline method relying on the Eurocontrol Base of Aircraft Data (BADA). Using a dataset of 15,802 Mode-S radar trajectories of 11 different aircraft types, we build models that predict the total descent length from the cruise altitude to a given final altitude. Our results show that the Machine Learning methods reduce the root mean square error on the predicted descent length by at least 20% for ridge regression, and by up to 24% for the gradient-boosting machine, compared with the baseline BADA method.
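    A comparison of this kind can be sketched with standard scikit-learn estimators. Since the Mode-S radar dataset is not reproduced here, the features and the target relation below are synthetic placeholders, and the hyperparameters are illustrative guesses rather than the ones used in the paper.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for the radar-derived predictors (purely illustrative).
rng = np.random.default_rng(42)
n = 3000
cruise_alt = rng.uniform(30_000, 40_000, n)     # cruise altitude (ft)
final_alt = rng.uniform(2_000, 10_000, n)       # final altitude (ft)
ground_speed = rng.uniform(380, 480, n)         # ground speed (kt)
X = np.column_stack([cruise_alt, final_alt, ground_speed])
# Fictitious descent-length relation (NM) with noise, for demonstration only.
y = 0.003 * (cruise_alt - final_alt) + 0.05 * ground_speed + rng.normal(0, 5, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
models = {
    "ridge regression": Ridge(alpha=1.0),
    "neural network": MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0),
    "gradient boosting": GradientBoostingRegressor(random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    rmse = np.sqrt(mean_squared_error(y_te, model.predict(X_te)))
    print(f"{name:18s} RMSE: {rmse:.2f} NM")
```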

    A Framework for Unbiased Model Selection Based on Boosting

    Variable selection and model choice are of major concern in many statistical applications, especially in high-dimensional regression models. Boosting is a convenient statistical method that combines model fitting with intrinsic model selection. We investigate the impact of base-learner specification on the performance of boosting as a model selection procedure. We show that variable selection may be biased if the covariates differ in nature. Important examples are models combining continuous and categorical covariates, especially if the number of categories is large. In this case, least squares base-learners offer increased flexibility for the categorical covariate and lead to it being preferred even if it is non-informative. Similar difficulties arise when comparing linear and nonlinear base-learners for a continuous covariate: the additional flexibility of the nonlinear base-learner again yields a preference for the more complex modeling alternative. We investigate these problems from a theoretical perspective and suggest a framework for unbiased model selection based on a general class of penalized least squares base-learners. Making all base-learners comparable in terms of their degrees of freedom strongly reduces the selection bias observed in naive boosting specifications. The importance of unbiased model selection is demonstrated in simulations and in an application to forest health models.
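    The core idea of making base-learners comparable can be sketched as follows: choose each base-learner's ridge-type penalty so that the trace of its hat matrix (its degrees of freedom) equals a common target. The example data, the target of one degree of freedom, and the root-finding step below are assumptions for illustration, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import brentq

def ridge_df(X, lam, K=None):
    """Degrees of freedom of a penalized least-squares base-learner,
    measured as the trace of its hat matrix."""
    p = X.shape[1]
    K = np.eye(p) if K is None else K
    H = X @ np.linalg.solve(X.T @ X + lam * K, X.T)
    return np.trace(H)

def lambda_for_df(X, target_df, K=None, lam_max=1e8):
    """Find the ridge penalty giving the base-learner a prescribed number
    of degrees of freedom (df(lambda) decreases monotonically in lambda)."""
    return brentq(lambda lam: ridge_df(X, lam, K) - target_df, 1e-10, lam_max)

# Example: penalize a 10-level categorical base-learner down to df = 1 so
# that it is comparable to a one-df linear base-learner (names assumed).
rng = np.random.default_rng(1)
groups = rng.integers(0, 10, size=200)
X_cat = np.eye(10)[groups]                  # dummy coding of the factor
lam = lambda_for_df(X_cat, target_df=1.0)
print(f"lambda for df=1: {lam:.1f}, achieved df: {ridge_df(X_cat, lam):.3f}")
```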

    Boosting for high-dimensional linear models

    Full text link
    We prove that boosting with the squared error loss, L2Boosting, is consistent for very high-dimensional linear models, where the number of predictor variables is allowed to grow essentially as fast as O(exp(sample size)), assuming that the true underlying regression function is sparse in terms of the ℓ1-norm of the regression coefficients. In the language of signal processing, this means consistency for de-noising using a strongly overcomplete dictionary if the underlying signal is sparse in terms of the ℓ1-norm. We also propose an AIC-based method for tuning, namely for choosing the number of boosting iterations. This makes L2Boosting computationally attractive, since it is not necessary to run the algorithm multiple times for cross-validation, as is commonly done. We demonstrate L2Boosting on simulated data, in particular where the predictor dimension is large in comparison to the sample size, and on a difficult tumor-classification problem with gene expression microarray data. (Published at http://dx.doi.org/10.1214/009053606000000092 in the Annals of Statistics, http://www.imstat.org/aos/, by the Institute of Mathematical Statistics, http://www.imstat.org.)
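    A compact sketch of component-wise L2Boosting with an AIC-type stopping rule is given below; it tracks the boosting operator explicitly so that its trace can serve as the effective degrees of freedom. The AIC formula used here is a simplified variant (the paper employs a corrected AIC), and the step size and iteration cap are illustrative choices.

```python
import numpy as np

def l2boost_aic(X, y, nu=0.1, max_steps=500):
    """Component-wise L2Boosting with an AIC-type stopping rule.  The
    boosting hat matrix B_m is updated explicitly so that trace(B_m) can
    be used as the effective degrees of freedom in the AIC."""
    n, p = X.shape
    col_norms = np.sum(X ** 2, axis=0)
    beta, fit = np.zeros(p), np.zeros(n)
    B, I = np.zeros((n, n)), np.eye(n)          # current boosting operator
    best = (np.inf, beta.copy(), 0)
    for m in range(1, max_steps + 1):
        resid = y - fit
        coefs = X.T @ resid / col_norms          # univariate LS fit per covariate
        rss = np.sum(resid ** 2) - col_norms * coefs ** 2
        j = int(np.argmin(rss))
        beta[j] += nu * coefs[j]
        fit = X @ beta
        Hj = np.outer(X[:, j], X[:, j]) / col_norms[j]
        B = B + nu * Hj @ (I - B)                # update the boosting operator
        df = np.trace(B)
        sigma2 = np.mean((y - fit) ** 2)
        aic = n * np.log(sigma2) + 2.0 * df      # simplified AIC (not the corrected form)
        if aic < best[0]:
            best = (aic, beta.copy(), m)
    return best[1], best[2]                      # coefficients and chosen iteration
```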

    Penalized Regression with Correlation Based Penalty

    A new regularization method for regression models is proposed. The criterion to be minimized contains a penalty term which explicitly links the strength of penalization to the correlation between predictors. Like the elastic net, the method encourages a grouping effect, where strongly correlated predictors tend to be in or out of the model together. A boosted version of the penalized estimator, based on a new boosting method, allows variables to be selected. Real-world data and simulations show that the method compares well with competing regularization techniques. In settings where the number of predictors is smaller than the number of observations, it frequently performs better than its competitors; in high-dimensional settings, prediction measures favor the elastic net, while accuracy of estimation and stability of variable selection favor the newly proposed method.
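    For concreteness, the sketch below spells out one published form of a correlation-based penalty (as used in related work by Tutz and Ulbricht), in which strongly positively correlated predictors are pushed towards equal coefficients and strongly negatively correlated ones towards opposite coefficients. Whether this matches the paper's criterion exactly is an assumption.

```python
import numpy as np

def correlation_penalty(beta, rho, eps=1e-8):
    """Correlation-based penalty: pairs with correlation near +1 are
    penalized for differing coefficients, pairs with correlation near -1
    for coefficients that do not cancel (a sketch of one published form)."""
    p = len(beta)
    total = 0.0
    for i in range(p):
        for j in range(i + 1, p):
            total += (beta[i] - beta[j]) ** 2 / max(1.0 - rho[i, j], eps)
            total += (beta[i] + beta[j]) ** 2 / max(1.0 + rho[i, j], eps)
    return total

def penalized_criterion(beta, X, y, lam):
    """Penalized least-squares criterion with the correlation-based penalty."""
    rho = np.corrcoef(X, rowvar=False)
    return np.sum((y - X @ beta) ** 2) + lam * correlation_penalty(beta, rho)
```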