12,394 research outputs found
An Active Set Algorithm to Estimate Parameters in Generalized Linear Models with Ordered Predictors
In biomedical studies, researchers are often interested in assessing the
association between one or more ordinal explanatory variables and an outcome
variable, at the same time adjusting for covariates of any type. The outcome
variable may be continuous, binary, or represent censored survival times. In
the absence of precise knowledge of the response function, using monotonicity
constraints on the ordinal variables improves efficiency in estimating
parameters, especially when sample sizes are small. An active set algorithm
that can efficiently compute such estimators is proposed, and a
characterization of the solution is provided. Having an efficient algorithm at
hand is especially relevant when applying likelihood ratio tests in restricted
generalized linear models, where one needs the value of the likelihood at the
restricted maximizer. The algorithm is illustrated on a real-life data set from
oncology.
Comment: 24 pages, 1 figure, 3 tables
A Generic Path Algorithm for Regularized Statistical Estimation
Regularization is widely used in statistics and machine learning to prevent
overfitting and steer the solution toward prior information. In general, a
regularized estimation problem minimizes the sum of a loss function and a
penalty term. The penalty term is usually weighted by a tuning parameter and
encourages certain constraints on the parameters to be estimated. Particular
choices of constraints lead to the popular lasso, fused-lasso, and other
generalized $\ell_1$ penalized regression methods. Although there has been a lot
of research in this area, developing efficient optimization methods for many
nonseparable penalties remains a challenge. In this article we propose an exact
path solver based on ordinary differential equations (EPSODE) that works for
any convex loss function and can deal with generalized $\ell_1$ penalties as well
as more complicated regularization such as inequality constraints encountered
in shape-restricted regressions and nonparametric density estimation. In the
path following process, the solution path hits, exits, and slides along the
various constraints and vividly illustrates the tradeoffs between goodness of
fit and model parsimony. In practice, the EPSODE can be coupled with AIC, BIC,
or cross-validation to select an optimal tuning parameter. Our
applications to generalized $\ell_1$ regularized generalized linear models,
shape-restricted regressions, Gaussian graphical models, and nonparametric
density estimation showcase the potential of the EPSODE algorithm.
Comment: 28 pages, 5 figures
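The generic problem described in this abstract can be written compactly as below. The symbols $f$ (loss), $P$ (penalty), $\rho$ (tuning parameter), and $\beta$ (parameters) are notation assumed here for illustration and are not taken from the abstract.

```latex
\hat{\beta}(\rho) \;=\; \arg\min_{\beta} \; f(\beta) + \rho\, P(\beta), \qquad \rho \ge 0,
\qquad\text{e.g.}\quad
P_{\text{lasso}}(\beta) = \sum_{j} |\beta_j|,
\quad
P_{\text{fusion}}(\beta) = \sum_{j} |\beta_{j+1} - \beta_j| .
```

The solution path is the map $\rho \mapsto \hat{\beta}(\rho)$. The fused lasso typically combines the fusion term with the lasso term.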
Sparsity with sign-coherent groups of variables via the cooperative-Lasso
We consider the problems of estimation and selection of parameters endowed
with a known group structure, when the groups are assumed to be sign-coherent,
that is, gathering either nonnegative, nonpositive or null parameters. To
tackle this problem, we propose the cooperative-Lasso penalty. We derive the
optimality conditions defining the cooperative-Lasso estimate for generalized
linear models, and propose an efficient active set algorithm suited to
high-dimensional problems. We study the asymptotic consistency of the estimator
in the linear regression setup and derive its irrepresentable conditions, which
are milder than the ones of the group-Lasso regarding the matching of groups
with the sparsity pattern of the true parameters. We also address the problem
of model selection in linear regression by deriving an approximation of the
degrees of freedom of the cooperative-Lasso estimator. Simulations comparing
the proposed estimator to the group and sparse group-Lasso agree with our
theoretical results, showing consistent improvements in support recovery for
sign-coherent groups. We finally propose two examples illustrating the wide
applicability of the cooperative-Lasso: first to the processing of ordinal
variables, where the penalty acts as a monotonicity prior; second to the
processing of genomic data, where the set of differentially expressed probes is
enriched by incorporating all the probes of the microarray that are related to
the corresponding genes.
Comment: Published at http://dx.doi.org/10.1214/11-AOAS520 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
An Ordered Lasso and Sparse Time-Lagged Regression
We consider regression scenarios where it is natural to impose an order
constraint on the coefficients. We propose an order-constrained version of
L1-regularized regression for this problem, and show how to solve it
efficiently using the well-known Pool Adjacent Violators Algorithm as its
proximal operator. The main application of this idea is time-lagged regression,
where we predict an outcome at time t from features at the previous K time
points. In this setting it is natural to assume that the coefficients decay as
we move farther away from t, and hence the order constraint is reasonable.
Potential applications include financial time series and prediction of dynamic
patient outcomes based on clinical measurements. We illustrate this idea on
real and simulated data.
Comment: 15 pages, 6 figures
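The Pool Adjacent Violators Algorithm named in this abstract can be sketched as follows. This is a minimal pure-Python illustration of the least-squares monotone fit only, not the paper's full ordered-lasso solver; the function names `pava` and `pava_nonincreasing` are hypothetical.

```python
def pava(y, weights=None):
    """Least-squares nondecreasing fit to y via Pool Adjacent Violators."""
    n = len(y)
    w = [1.0] * n if weights is None else list(weights)
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge backwards while the last two blocks violate the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    # Expand each pooled block back to one fitted value per input point.
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


def pava_nonincreasing(y, weights=None):
    """Nonincreasing fit: reverse the data, fit nondecreasing, reverse back."""
    rev_w = None if weights is None else list(reversed(weights))
    return list(reversed(pava(list(reversed(y)), rev_w)))


if __name__ == "__main__":
    print(pava([1, 3, 2, 4]))                # [1.0, 2.5, 2.5, 4.0]
    print(pava_nonincreasing([3, 1, 2, 0]))  # [3.0, 1.5, 1.5, 0.0]
```

The nonincreasing variant matches the time-lagged setting in the abstract, where coefficients are assumed to decay with distance from t.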
A Path Algorithm for Constrained Estimation
Many least squares problems involve affine equality and inequality
constraints. Although there are a variety of methods for solving such problems,
most statisticians find constrained estimation challenging. The current paper
proposes a new path following algorithm for quadratic programming based on
exact penalization. Similar penalties arise in regularization in model
selection. Classical penalty methods solve a sequence of unconstrained problems
that put greater and greater stress on meeting the constraints. In the limit as
the penalty constant tends to $\infty$, one recovers the constrained solution.
In the exact penalty method, squared penalties are replaced by absolute value
penalties, and the solution is recovered for a finite value of the penalty
constant. The exact path following method starts at the unconstrained solution
and follows the solution path as the penalty constant increases. In the
process, the solution path hits, slides along, and exits from the various
constraints. Path following in lasso penalized regression, in contrast, starts
with a large value of the penalty constant and works its way downward. In both
settings, inspection of the entire solution path is revealing. Just as with the
lasso and generalized lasso, it is possible to plot the effective degrees of
freedom along the solution path. For a strictly convex quadratic program, the
exact penalty algorithm can be framed entirely in terms of the sweep operator
of regression analysis. A few well-chosen examples illustrate the mechanics and
potential of path following.
Comment: 26 pages, 5 figures
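The contrast between squared and absolute value penalties can be seen on a one-dimensional toy problem, chosen here purely for illustration and not taken from the paper: minimize 0.5*(x - 2)^2 subject to x <= 1. Both penalized minimizers have closed forms; the function names below are hypothetical.

```python
def quadratic_penalty_solution(rho):
    """Minimizer of 0.5*(x - 2)**2 + 0.5*rho*max(0, x - 1)**2 for rho >= 0."""
    # The minimizer always lies in x > 1, where the stationarity condition
    # (x - 2) + rho*(x - 1) = 0 gives the closed form below.
    return (2.0 + rho) / (1.0 + rho)


def exact_penalty_solution(rho):
    """Minimizer of 0.5*(x - 2)**2 + rho*max(0, x - 1) for rho >= 0."""
    # For rho < 1 the minimizer is 2 - rho (still infeasible, x > 1).
    # For rho >= 1 the kink at x = 1 is optimal, so the constraint is met exactly.
    return max(1.0, 2.0 - rho)


if __name__ == "__main__":
    # Follow both paths from the unconstrained solution (rho = 0) upward.
    for rho in [0.0, 0.5, 1.0, 2.0, 10.0, 1000.0]:
        print(rho, quadratic_penalty_solution(rho), exact_penalty_solution(rho))
```

In this toy case the exact penalty path starts at the unconstrained solution x = 2, hits the constraint at rho = 1, and stays there, while the squared penalty path only approaches x = 1 as rho grows.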