    An Active Set Algorithm to Estimate Parameters in Generalized Linear Models with Ordered Predictors

    In biomedical studies, researchers are often interested in assessing the association between one or more ordinal explanatory variables and an outcome variable, while adjusting for covariates of any type. The outcome variable may be continuous, binary, or represent censored survival times. In the absence of precise knowledge of the response function, imposing monotonicity constraints on the ordinal variables improves efficiency in estimating parameters, especially when sample sizes are small. An active set algorithm that can efficiently compute such estimators is proposed, and a characterization of the solution is provided. Having an efficient algorithm at hand is especially relevant when applying likelihood ratio tests in restricted generalized linear models, where one needs the value of the likelihood at the restricted maximizer. The algorithm is illustrated on a real-life data set from oncology.
    Comment: 24 pages, 1 figure, 3 tables
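
    The abstract does not reproduce the paper's active set algorithm, but the monotone least-squares subproblem such methods build on is standard. Below is a minimal sketch of the pool adjacent violators algorithm (PAVA) for a nondecreasing fit, offered only as an illustration of the constrained building block, not as the authors' method; all names are hypothetical.

```python
import numpy as np

def pava_nondecreasing(y, w=None):
    """Nondecreasing least-squares fit to y via pool adjacent violators.

    Illustrative building block for monotonicity-constrained estimation;
    not the paper's active set algorithm for GLMs.
    """
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    blocks = []  # each block is [weighted mean, total weight, length]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge adjacent blocks while the fitted means decrease.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(w1 * m1 + w2 * m2) / wt, wt, n1 + n2])
    # Expand block means back to a fitted vector of the original length.
    return np.concatenate([[m] * n for m, _, n in blocks])

print(pava_nondecreasing([1.0, 3.0, 2.0, 4.0, 3.5]))
# -> [1.   2.5  2.5  3.75 3.75]
```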

    A Generic Path Algorithm for Regularized Statistical Estimation

    Regularization is widely used in statistics and machine learning to prevent overfitting and to steer solutions toward prior information. In general, a regularized estimation problem minimizes the sum of a loss function and a penalty term. The penalty term is usually weighted by a tuning parameter and encourages certain constraints on the parameters to be estimated. Particular choices of constraints lead to the popular lasso, fused lasso, and other generalized $l_1$ penalized regression methods. Although there has been a lot of research in this area, developing efficient optimization methods for many nonseparable penalties remains a challenge. In this article we propose an exact path solver based on ordinary differential equations (EPSODE) that works for any convex loss function and can deal with generalized $l_1$ penalties as well as more complicated regularization such as the inequality constraints encountered in shape-restricted regression and nonparametric density estimation. In the path following process, the solution path hits, exits, and slides along the various constraints, vividly illustrating the trade-off between goodness of fit and model parsimony. In practice, EPSODE can be coupled with AIC, BIC, $C_p$, or cross-validation to select an optimal tuning parameter. Our applications to generalized $l_1$ regularized generalized linear models, shape-restricted regressions, Gaussian graphical models, and nonparametric density estimation showcase the potential of the EPSODE algorithm.
    Comment: 28 pages, 5 figures
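
    EPSODE's defining idea is to integrate an ordinary differential equation that the solution path satisfies between constraint-activation events. The toy below shows only that core idea on ridge regression, where the path is smooth everywhere and has a closed form to check against; handling nonsmooth $l_1$-type penalties, as EPSODE does, additionally requires tracking when the path hits or exits constraints. The setup and names are illustrative assumptions, not the paper's code.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy data; names and sizes are arbitrary.
rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.standard_normal((n, p))
y = X @ np.array([2.0, -1.0, 0.5, 0.0, 0.0]) + 0.1 * rng.standard_normal(n)
XtX, Xty = X.T @ X, X.T @ y

# For ridge regression, beta(lam) = (X'X + lam*I)^{-1} X'y satisfies the
# smooth ODE  d(beta)/d(lam) = -(X'X + lam*I)^{-1} beta, so integrating
# from the unpenalized solution traces the whole path in one sweep.
def rhs(lam, beta):
    return -np.linalg.solve(XtX + lam * np.eye(p), beta)

beta0 = np.linalg.solve(XtX, Xty)  # path start at lam = 0
sol = solve_ivp(rhs, (0.0, 100.0), beta0, dense_output=True,
                rtol=1e-8, atol=1e-10)

# Sanity check against the closed-form ridge solution at lam = 10.
lam = 10.0
print(sol.sol(lam))
print(np.linalg.solve(XtX + lam * np.eye(p), Xty))
```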

    Sparsity with sign-coherent groups of variables via the cooperative-Lasso

    We consider the problems of estimation and selection of parameters endowed with a known group structure, when the groups are assumed to be sign-coherent, that is, gathering either nonnegative, nonpositive, or null parameters. To tackle this problem, we propose the cooperative-Lasso penalty. We derive the optimality conditions defining the cooperative-Lasso estimate for generalized linear models and propose an efficient active set algorithm suited to high-dimensional problems. We study the asymptotic consistency of the estimator in the linear regression setup and derive its irrepresentable conditions, which are milder than those of the group-Lasso regarding the matching of groups with the sparsity pattern of the true parameters. We also address the problem of model selection in linear regression by deriving an approximation of the degrees of freedom of the cooperative-Lasso estimator. Simulations comparing the proposed estimator to the group-Lasso and sparse group-Lasso agree with our theoretical results, showing consistent improvements in support recovery for sign-coherent groups. We finally present two examples illustrating the wide applicability of the cooperative-Lasso: first to the processing of ordinal variables, where the penalty acts as a monotonicity prior; second to the processing of genomic data, where the set of differentially expressed probes is enriched by incorporating all the probes of the microarray that are related to the corresponding genes.
    Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org), http://dx.doi.org/10.1214/11-AOAS520
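
    As a rough sketch of how the penalty acts, the cooperative-Lasso can be read as charging each group for the norms of its positive and negative parts separately, so a group with mixed signs pays twice while a sign-coherent group pays the usual group-Lasso price. The exact functional form below is an assumption based on that reading, not a verbatim transcription of the paper.

```python
import numpy as np

def coop_lasso_penalty(beta, groups):
    """Sum over groups of the l2 norms of the positive and negative
    parts (assumed form of the cooperative-Lasso penalty)."""
    beta = np.asarray(beta, dtype=float)
    total = 0.0
    for g in groups:
        total += np.linalg.norm(np.maximum(beta[g], 0.0))   # positive part
        total += np.linalg.norm(np.maximum(-beta[g], 0.0))  # negative part
    return total

def group_lasso_penalty(beta, groups):
    beta = np.asarray(beta, dtype=float)
    return sum(np.linalg.norm(beta[g]) for g in groups)

beta = np.array([1.0, 2.0, -0.5, 0.0, -1.0, -2.0])
groups = [[0, 1, 2], [3, 4, 5]]
# Group 1 mixes signs and is charged for both parts; group 2 is
# sign-coherent, so both penalties agree on it.
print(coop_lasso_penalty(beta, groups))   # ~2.74 + ~2.24
print(group_lasso_penalty(beta, groups))  # ~2.29 + ~2.24
```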

    An Ordered Lasso and Sparse Time-Lagged Regression

    We consider regression scenarios where it is natural to impose an order constraint on the coefficients. We propose an order-constrained version of $l_1$-regularized regression for this problem and show how to solve it efficiently using the well-known pool adjacent violators algorithm as its proximal operator. The main application of this idea is time-lagged regression, where we predict an outcome at time t from features at the previous K time points. In this setting it is natural to assume that the coefficients decay as we move farther away from t, and hence the order constraint is reasonable. Potential applications include financial time series and the prediction of dynamic patient outcomes based on clinical measurements. We illustrate this idea on real and simulated data.
    Comment: 15 pages, 6 figures
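
    The abstract's key computational point, that the proximal operator reduces to a pool adjacent violators fit, suggests a simple proximal gradient solver. The sketch below assumes the positive ordered lasso (nonincreasing, nonnegative coefficients) and that the prox decomposes as a shift by the penalty, a monotone projection via PAVA, then clipping at zero; it is an illustration under those assumptions, not the authors' implementation.

```python
import numpy as np

def pava_nonincreasing(v):
    """Nonincreasing isotonic fit via pool adjacent violators."""
    blocks = []  # each block is [mean, length]
    for x in v:
        blocks.append([x, 1])
        # Merge adjacent blocks that violate the nonincreasing order.
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            m2, n2 = blocks.pop()
            m1, n1 = blocks.pop()
            n = n1 + n2
            blocks.append([(n1 * m1 + n2 * m2) / n, n])
    return np.concatenate([[m] * n for m, n in blocks])

def prox_ordered(v, step_lam):
    """Assumed prox of lam*sum(beta) + indicator{b_1 >= ... >= b_K >= 0}:
    shift by the penalty, monotone-project with PAVA, clip at zero."""
    return np.maximum(pava_nonincreasing(v - step_lam), 0.0)

def ordered_lasso(X, y, lam, n_iter=500):
    """Proximal gradient for the positive ordered lasso (illustrative)."""
    p = X.shape[1]
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)
        beta = prox_ordered(beta - step * grad, step * lam)
    return beta

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 6))
y = X @ np.array([3.0, 2.0, 1.0, 0.5, 0.0, 0.0]) \
    + 0.1 * rng.standard_normal(100)
print(ordered_lasso(X, y, lam=1.0))  # coefficients decay toward zero
```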

    A Path Algorithm for Constrained Estimation

    Many least squares problems involve affine equality and inequality constraints. Although there is a variety of methods for solving such problems, most statisticians find constrained estimation challenging. The current paper proposes a new path following algorithm for quadratic programming based on exact penalization. Similar penalties arise in $l_1$ regularization in model selection. Classical penalty methods solve a sequence of unconstrained problems that put greater and greater stress on meeting the constraints; in the limit as the penalty constant tends to $\infty$, one recovers the constrained solution. In the exact penalty method, squared penalties are replaced by absolute value penalties, and the solution is recovered for a finite value of the penalty constant. The exact path following method starts at the unconstrained solution and follows the solution path as the penalty constant increases. In the process, the solution path hits, slides along, and exits from the various constraints. Path following in lasso penalized regression, in contrast, starts with a large value of the penalty constant and works its way downward. In both settings, inspection of the entire solution path is revealing. Just as with the lasso and generalized lasso, it is possible to plot the effective degrees of freedom along the solution path. For a strictly convex quadratic program, the exact penalty algorithm can be framed entirely in terms of the sweep operator of regression analysis. A few well-chosen examples illustrate the mechanics and potential of path following.
    Comment: 26 pages, 5 figures
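
    The contrast between classical (squared) and exact (absolute value) penalization can be seen on a toy problem with a closed-form answer: projecting a point onto the nonnegative orthant. The sketch below, with hypothetical names, shows that the absolute value penalty recovers the constrained solution at a finite penalty constant while the squared penalty only approaches it; the paper's algorithm goes further and follows the whole path in the penalty constant.

```python
import numpy as np

# Toy problem: minimize 0.5*||beta - y||^2 subject to beta >= 0.
# The constrained solution is beta* = max(y, 0).
y = np.array([2.0, -1.0, 0.5, -3.0])

def squared_penalty_solution(y, rho):
    # argmin 0.5*(b - y)^2 + 0.5*rho*max(-b, 0)^2, coordinatewise:
    # b = y if y >= 0, else y / (1 + rho)  -> exact only as rho -> inf.
    return np.where(y >= 0, y, y / (1.0 + rho))

def exact_penalty_solution(y, rho):
    # argmin 0.5*(b - y)^2 + rho*max(-b, 0), coordinatewise:
    # b = y if y >= 0, else min(y + rho, 0) -> exact once rho >= |y_j|.
    return np.where(y >= 0, y, np.minimum(y + rho, 0.0))

for rho in [1.0, 3.0, 10.0]:
    print(rho, squared_penalty_solution(y, rho),
          exact_penalty_solution(y, rho))
# For rho >= 3 the exact penalty returns max(y, 0) = [2, 0, 0.5, 0];
# the squared penalty only approaches it as rho grows.
```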