Penalized Regression with Correlation Based Penalty
A new regularization method for regression models is proposed. The criterion to be minimized contains a penalty term that explicitly links the strength of penalization to the correlation between predictors. Like the elastic net, the method encourages a grouping effect, in which strongly correlated predictors tend to enter or leave the model together. A boosted version of the penalized estimator, based on a new boosting method, allows variables to be selected. Real-world data and simulations show that the method compares well with competing regularization techniques. In settings where the number of predictors is smaller than the number of observations, it frequently outperforms its competitors; in high-dimensional settings, prediction measures favor the elastic net, while accuracy of estimation and stability of variable selection favor the newly proposed method.
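As a rough illustration of a correlation-driven penalty of this kind, the sketch below weights squared coefficient differences by 1/(1 − ρ) and squared coefficient sums by 1/(1 + ρ), so strongly positively correlated pairs are pulled toward equal coefficients and strongly negatively correlated pairs toward opposite signs. The function name and exact weighting are illustrative assumptions, not taken verbatim from the paper.

```python
import numpy as np

def correlation_penalty(beta, rho, eps=1e-8):
    """Sketch of a correlation-based penalty: for each pair (i, j),
    penalize the squared difference of coefficients more heavily the
    more positively correlated the predictors are, and the squared sum
    more heavily the more negatively correlated they are.
    `eps` guards against division by zero at |rho| = 1."""
    p = len(beta)
    total = 0.0
    for i in range(p):
        for j in range(i + 1, p):
            r = rho[i, j]
            total += ((beta[i] - beta[j]) ** 2 / max(1.0 - r, eps)
                      + (beta[i] + beta[j]) ** 2 / max(1.0 + r, eps))
    return total
```

As ρ_ij approaches 1, the difference term dominates and drives β_i and β_j together, which is the grouping effect the abstract describes.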
Applying Penalized Binary Logistic Regression with Correlation Based Elastic Net for Variables Selection
Reducing the dimension of high-dimensional classification problems via penalized logistic regression is one of the challenges in applying binary logistic regression. The penalized method applied here, the correlation-based elastic penalty (CBEP), was used to overcome the limitations of the LASSO and the elastic net in variable selection when there is perfect correlation among explanatory variables. The performance of the CBEP was demonstrated through its application to two well-known high-dimensional binary classification data sets. The CBEP provided superior classification performance and variable selection compared with other existing penalized methods. It is a reliable penalized method for binary logistic regression.
Strong rules for nonconvex penalties and their implications for efficient algorithms in high-dimensional regression
We consider approaches for improving the efficiency of algorithms for fitting
nonconvex penalized regression models such as SCAD and MCP in high dimensions.
In particular, we develop rules for discarding variables during cyclic
coordinate descent. This dimension reduction leads to a substantial improvement
in the speed of these algorithms for high-dimensional problems. The rules we
propose here eliminate a substantial fraction of the variables from the
coordinate descent algorithm. Violations are quite rare, especially in the
locally convex region of the solution path, and furthermore, may be easily
detected and corrected by checking the Karush-Kuhn-Tucker conditions. We extend
these rules to generalized linear models, as well as to other nonconvex
penalties such as the ℓ2-stabilized Mnet penalty, group MCP, and group
SCAD. We explore three variants of the coordinate descent algorithm that
incorporate these rules and study the efficiency of these algorithms in
fitting models to both simulated and real data from a genome-wide
association study.
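The discarding idea can be illustrated with the generic sequential strong rule used for the lasso, where a variable is screened out at the new penalty level λ_new if its correlation with the current residual falls below the threshold 2·λ_new − λ_old. The sketch below, including the function name and the lasso-style threshold, is an assumption-laden illustration of the screening principle, not the paper's exact rules for SCAD or MCP.

```python
import numpy as np

def strong_rule_screen(X, residual, lam_new, lam_old):
    """Sequential strong-rule screening (lasso-style sketch).
    Keep variable j for coordinate descent only if the inner product
    |x_j' r| at the previous solution exceeds 2*lam_new - lam_old;
    discarded variables are skipped during cyclic updates."""
    scores = np.abs(X.T @ residual)
    keep = scores >= 2.0 * lam_new - lam_old
    # Rule violations are rare; they are caught afterwards by checking
    # the Karush-Kuhn-Tucker conditions on the discarded set.
    return keep
```

Because the rule is heuristic, any violation is detected by a KKT check on the discarded variables, as the abstract notes.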
Pairwise Fused Lasso
In the last decade, several estimators have been proposed that enforce the grouping property. A regularized estimate exhibits the grouping property if it selects groups of highly correlated predictors rather than a single representative. The pairwise fused lasso is related to fusion methods but does not assume that the predictors are ordered. By penalizing both the parameters and the differences between pairs of coefficients, it selects predictors and enforces the grouping property. Two methods for obtaining estimates are given: the first is based on LARS and works for the linear model; the second is based on quadratic approximations and works in the more general case of generalized linear models. The method is evaluated in simulation studies and applied to real data sets.
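A minimal sketch of such a penalty, assuming a convex combination of an L1 term on the coefficients (for selection) and L1 terms on all pairwise coefficient differences (for grouping); the paper may use additional pair-specific weights, which are omitted here:

```python
import numpy as np

def pairwise_fused_lasso_penalty(beta, lam, alpha):
    """Pairwise fused lasso penalty (sketch): L1 on coefficients plus
    L1 on all pairwise differences, with no ordering assumed among
    predictors. `alpha` trades selection against fusion."""
    beta = np.asarray(beta, dtype=float)
    l1 = np.sum(np.abs(beta))
    # abs difference matrix counts each pair twice, so halve the sum
    diffs = np.sum(np.abs(beta[:, None] - beta[None, :])) / 2.0
    return lam * (alpha * l1 + (1.0 - alpha) * diffs)
```

Shrinking a pairwise difference |β_i − β_j| to zero fuses the two coefficients, which is how the grouping property arises without any predictor ordering.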
Combining Quadratic Penalization and Variable Selection via Forward Boosting
Quadratic penalties can be used to incorporate external knowledge about the association structure among regressors. Unfortunately, they do not force individual estimated regression coefficients to equal zero. In this paper we propose a new approach that combines quadratic penalization and variable selection within the framework of generalized linear models. The new method, called Forward Boosting, is related to componentwise boosting techniques. We demonstrate in simulation studies and a real-world data example that the new approach competes well with existing alternatives, especially when the focus is on an interpretable structuring of predictors.
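For context, a single step of generic componentwise least-squares boosting (to which the abstract says the method is related) looks as follows; this is a textbook sketch, not the paper's Forward Boosting algorithm, and the function name and step size are illustrative.

```python
import numpy as np

def componentwise_boost_step(X, y, beta, nu=0.1):
    """One generic componentwise boosting step: fit each predictor
    separately to the current residual, pick the one giving the
    largest reduction in squared error, and move only its coefficient
    by a small fraction `nu` of its least-squares fit."""
    r = y - X @ beta
    norms = np.sum(X ** 2, axis=0)
    coefs = (X.T @ r) / norms          # per-column LS fit to residual
    j = int(np.argmax(coefs ** 2 * norms))  # largest SSE reduction
    beta = beta.copy()
    beta[j] += nu * coefs[j]
    return beta, j
```

Because only one coefficient moves per step, predictors never selected keep coefficients exactly zero, which is what makes such schemes usable for variable selection.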
Boosting Correlation Based Penalization in Generalized Linear Models
In high dimensional regression problems penalization techniques are a useful tool for estimation and variable selection. We
propose a novel penalization technique that aims at the grouping effect which encourages strongly correlated predictors to be in
or out of the model together. The proposed penalty uses the correlation between predictors explicitly. We consider a simple
version that does not select variables and a boosted version which is able to reduce the number of variables in the model. Both
methods are derived within the framework of generalized linear models. The performance is evaluated in simulations and on real-world data sets.
Structured penalized regression for drug sensitivity prediction
Large-scale {\it in vitro} drug sensitivity screens are an important tool in
personalized oncology to predict the effectiveness of potential cancer drugs.
The prediction of the sensitivity of cancer cell lines to a panel of drugs is a
multivariate regression problem with high-dimensional heterogeneous multi-omics
data as input data and with potentially strong correlations between the outcome
variables which represent the sensitivity to the different drugs. We propose a
joint penalized regression approach with structured penalty terms which allow
us to utilize the correlation structure between drugs with group-lasso-type
penalties and at the same time address the heterogeneity between omics data
sources by introducing data-source-specific penalty factors to penalize
different data sources differently. By combining integrative penalty factors
(IPF) with tree-guided group lasso, we create the IPF-tree-lasso method. We
present a unified framework to transform more general IPF-type methods to the
original penalized method. Because the structured penalty terms have multiple
parameters, we demonstrate how the interval-search Efficient Parameter
Selection via Global Optimization (EPSGO) algorithm can be used to optimize
multiple penalty parameters efficiently. Simulation studies show that
IPF-tree-lasso can improve the prediction performance compared to other
lasso-type methods, in particular for heterogeneous data sources. Finally, we
employ the new methods to analyse data from the Genomics of Drug Sensitivity in
Cancer project.
Comment: Zhao Z, Zucknick M (2020). Structured penalized regression for drug
sensitivity prediction. Journal of the Royal Statistical Society, Series C.
19 pages, 6 figures and 2 tables.
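The data-source-specific penalty factors can be illustrated with a minimal sketch of an IPF-style L1 penalty, where each feature group (data source) gets its own multiplier; the function name and interface are assumptions for illustration, and the actual IPF-tree-lasso additionally combines this with tree-guided group-lasso terms not shown here.

```python
import numpy as np

def ipf_lasso_penalty(beta, groups, penalty_factors, lam):
    """IPF-style penalty (sketch): each data source (index group) is
    penalized with its own factor, so heterogeneous omics sources can
    be shrunk to different degrees under one overall lambda."""
    beta = np.asarray(beta, dtype=float)
    total = 0.0
    for idx, pf in zip(groups, penalty_factors):
        total += pf * np.sum(np.abs(beta[idx]))
    return lam * total
```

With two sources, a larger factor on a noisy source shrinks its coefficients harder; tuning the factors jointly with λ is what motivates the EPSGO interval search mentioned in the abstract.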