
    GAP Safe screening rules for sparse multi-task and multi-class models

    High dimensional regression benefits from sparsity promoting regularizations. Screening rules leverage the known sparsity of the solution by ignoring some variables in the optimization, hence speeding up solvers. When the procedure is proven never to discard features wrongly, the rules are said to be "safe". In this paper we derive new safe rules for generalized linear models regularized with $\ell_1$ and $\ell_1/\ell_2$ norms. The rules are based on duality gap computations and spherical safe regions whose diameters converge to zero. This allows more variables to be discarded safely, in particular for low regularization parameters. The GAP Safe rule can cope with any iterative solver, and we illustrate its performance on coordinate descent for multi-task Lasso, binary and multinomial logistic regression, demonstrating significant speed-ups on all tested datasets with respect to previous safe rules. Comment: in Proceedings of the 29th Conference on Neural Information Processing Systems (NIPS), 201
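
    As a rough illustration of the duality-gap-based sphere test described in the abstract, the sketch below applies it to the single-task Lasso only (the paper also covers multi-task and multinomial models). The function name and the particular primal/dual scaling are assumptions made for this sketch, not the authors' reference implementation.

```python
import numpy as np

def gap_safe_screen_lasso(X, y, beta, lam):
    """Minimal sketch of a GAP Safe sphere test for the Lasso
    P(beta) = 0.5 * ||y - X beta||^2 + lam * ||beta||_1.
    Returns a boolean mask of features that may be safely discarded."""
    residual = y - X @ beta
    # Dual-feasible point: rescale the residual so that ||X^T theta||_inf <= 1
    theta = residual / max(lam, np.max(np.abs(X.T @ residual)))
    # Primal and dual objectives, hence the duality gap
    primal = 0.5 * residual @ residual + lam * np.sum(np.abs(beta))
    dual = 0.5 * (y @ y) - 0.5 * lam**2 * np.sum((theta - y / lam) ** 2)
    gap = max(primal - dual, 0.0)
    # Radius of the spherical safe region centred at the dual point;
    # it shrinks to zero as the solver drives the gap to zero.
    radius = np.sqrt(2.0 * gap) / lam
    # Feature j is provably inactive if |x_j^T theta| + radius * ||x_j|| < 1
    scores = np.abs(X.T @ theta) + radius * np.linalg.norm(X, axis=0)
    return scores < 1.0
```

    Because the test only needs the current iterate and the residual, it can be re-run periodically inside any iterative solver, which is the property the abstract highlights.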

    Predictive modeling using sparse logistic regression with applications

    In this thesis, sparse logistic regression models are applied in a set of real world machine learning applications. The studied cases include supervised image segmentation, cancer diagnosis, and MEG data classification. Image segmentation is applied both in component detection in inkjet printed electronics manufacturing and in cell detection from microscope images. The results indicate that a simple linear classification method such as logistic regression often outperforms more sophisticated methods. Further, it is shown that the interpretability of the linear model offers a great advantage in many applications. Model validation and automatic feature selection by means of L1 regularized parameter estimation play a significant role in this thesis. It is shown that the combination of a careful model assessment scheme and automatic feature selection by means of an L1-regularized logistic regression model creates a powerful, yet simple and practical, tool chain for applications of supervised learning and classification.
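
    As a sketch of the kind of tool chain described, the snippet below combines L1-penalized logistic regression with nested cross-validation in scikit-learn on synthetic data. The pipeline, parameter grid, and data are illustrative assumptions, not the thesis's actual experimental setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for one of the binary classification tasks
X, y = make_classification(n_samples=300, n_features=200,
                           n_informative=10, random_state=0)

# L1-penalised logistic regression: the penalty strength is tuned by inner
# cross-validation, and the induced sparsity performs the feature selection.
model = make_pipeline(
    StandardScaler(),
    LogisticRegressionCV(penalty="l1", solver="liblinear",
                         Cs=10, cv=5, max_iter=5000),
)

# Outer cross-validation provides the model assessment
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=outer_cv)
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Refit on all data to inspect which features survived the L1 penalty
model.fit(X, y)
selected = np.flatnonzero(model[-1].coef_.ravel())
print(f"{selected.size} non-zero coefficients:", selected)
```

    The non-zero coefficients are directly interpretable as the retained features, which is the interpretability advantage the abstract emphasizes.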

    Predicting Alzheimer Disease Status Using High-Dimensional MRI Data Based on LASSO Constrained Generalized Linear Models

    Introduction: Alzheimer’s disease is an irreversible brain disorder characterized by distortion of memory and other mental functions. Although several psychometric tests are available for the diagnosis of Alzheimer’s, there is great concern about the validity of these tests at recognizing the early onset of the disease. Currently, brain magnetic resonance imaging (MRI) is not commonly utilized in the diagnosis of Alzheimer’s, because researchers are still puzzled by the association of brain regions with the disease status and its progression. Moreover, MRI data tend to be high dimensional in nature, requiring advanced statistical methods to analyze them accurately. In the past decade, the application of the Least Absolute Shrinkage and Selection Operator (LASSO) has become increasingly popular in the analysis of high dimensional data. With LASSO, only a small number of the regression coefficients are believed to have a non-zero value and are therefore allowed to enter the model, while the others are shrunk to zero. Aim: Determine the non-zero regression coefficients in models predicting patients’ classification (normal, mild cognitive impairment (MCI), or Alzheimer’s) using both non-ordinal and ordinal LASSO. Methods: Pre-processed high dimensional MRI data from the Alzheimer’s Disease Neuroimaging Initiative were analyzed. Predictors were identified for the following models: Alzheimer’s vs. normal, Alzheimer’s vs. normal and MCI, and Alzheimer’s and MCI vs. normal. Cross-validation followed by ordinal LASSO was executed on these same sets of models. Results: Results were inconclusive. Two brain regions, the frontal lobe and the putamen, appeared more frequently in the models than any other region. Non-ordinal multinomial models performed better than ordinal multinomial models, with higher accuracy, sensitivity, and specificity rates. It was determined that the majority of the models were better suited to predicting MCI status than the other two statuses. Discussion: In future research, the other stages of the disease, different statistical analysis methods such as the elastic net, and larger sample sizes should be explored when using brain MRI for Alzheimer’s disease classification.
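
    For the non-ordinal case, a cross-validated multinomial LASSO of the kind described can be sketched as below. The data here are a random stand-in for the pre-processed ADNI features and the solver settings are illustrative assumptions; an ordinal LASSO would require a dedicated package and is not shown.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the MRI feature matrix: rows are subjects,
# columns are regional measures; labels encode Normal = 0, MCI = 1, AD = 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 500))
y = rng.integers(0, 3, size=150)

# Non-ordinal multinomial LASSO: an L1 penalty on the multinomial logistic
# model, with the penalty strength chosen by 5-fold cross-validation.
# The saga solver fits the multinomial model for multi-class targets.
model = make_pipeline(
    StandardScaler(),
    LogisticRegressionCV(penalty="l1", solver="saga",
                         Cs=10, cv=5, max_iter=5000),
)
model.fit(X, y)

# The non-zero coefficients identify the brain regions retained per class.
coef = model[-1].coef_            # shape (3 classes, 500 features)
kept = np.flatnonzero(np.any(coef != 0, axis=0))
print(f"{kept.size} regions enter at least one class-specific model")
```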

    Regularized Models for Fitting Zero-Inflated and Zero-Truncated Count Data: A Comparative Analysis

    Generalized Linear Models (GLMs) are widely recognized for their efficacy in fitting count data, superior to the Ordinary Least Squares (OLS) approach. The inability of OLS to suitably handle count data can be attributed to its tendency to overfit. This study proposes the use of regularized models, specifically Ridge Regression and the Least Absolute Shrinkage and Selection Operator (LASSO), for fitting count data. These models are compared to frequentist and Bayesian models commonly used for count data fitting, such as the Dirichlet prior mixture of generalized linear mixed models and the discrete Weibull model. The findings reveal Ridge Regression's superiority over all other models based on the Akaike Information Criterion (AIC). However, its performance diminishes when evaluated using the Bayesian Information Criterion (BIC), even though it still outperforms LASSO. The study thereby suggests the use of regularized regression models for fitting zero-inflated count data, as demonstrated with simulated data. Further, the appropriateness of regularized models for zero-truncated count data is exemplified using life data.
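
    As a hedged sketch of the ridge-for-counts idea, the snippet below fits an L2-penalized Poisson GLM to simulated zero-inflated counts using scikit-learn's PoissonRegressor. The simulation and tuning grid are illustrative assumptions rather than the study's actual models; a LASSO counterpart could be fit with, for example, a penalized GLM in statsmodels.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor
from sklearn.model_selection import GridSearchCV

# Simulated zero-inflated counts (a hypothetical stand-in for the study's data):
# a Poisson response whose zeros are further inflated by a Bernoulli mask.
rng = np.random.default_rng(42)
n, p = 1000, 20
X = rng.normal(size=(n, p))
mu = np.exp(X[:, :3] @ np.array([0.4, -0.3, 0.2]))
counts = rng.poisson(mu)
counts[rng.random(n) < 0.3] = 0          # extra structural zeros

# Ridge-penalised Poisson GLM: PoissonRegressor applies an L2 penalty, so
# tuning `alpha` by cross-validation plays the role of ridge regression
# for count data in this sketch.
grid = GridSearchCV(
    PoissonRegressor(max_iter=1000),
    param_grid={"alpha": np.logspace(-3, 1, 9)},
    cv=5,
)
grid.fit(X, counts)
print("best alpha:", grid.best_params_["alpha"])
print("cross-validated D^2 (fraction of deviance explained):", grid.best_score_)
```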