11,723 research outputs found
GAP Safe screening rules for sparse multi-task and multi-class models
High dimensional regression benefits from sparsity promoting regularizations.
Screening rules leverage the known sparsity of the solution by ignoring some
variables in the optimization, hence speeding up solvers. When the procedure is
proven not to discard features wrongly the rules are said to be \emph{safe}. In
this paper we derive new safe rules for generalized linear models regularized
with and norms. The rules are based on duality gap
computations and spherical safe regions whose diameters converge to zero. This
allows to discard safely more variables, in particular for low regularization
parameters. The GAP Safe rule can cope with any iterative solver and we
illustrate its performance on coordinate descent for multi-task Lasso, binary
and multinomial logistic regression, demonstrating significant speed ups on all
tested datasets with respect to previous safe rules.Comment: in Proceedings of the 29-th Conference on Neural Information
Processing Systems (NIPS), 201
Screening Rules for Convex Problems
We propose a new framework for deriving screening rules for convex
optimization problems. Our approach covers a large class of constrained and
penalized optimization formulations, and works in two steps. First, given any
approximate point, the structure of the objective function and the duality gap
is used to gather information on the optimal solution. In the second step, this
information is used to produce screening rules, i.e. safely identifying
unimportant weight variables of the optimal solution. Our general framework
leads to a large variety of useful existing as well as new screening rules for
many applications. For example, we provide new screening rules for general
simplex and -constrained problems, Elastic Net, squared-loss Support
Vector Machines, minimum enclosing ball, as well as structured norm regularized
problems, such as group lasso
Efficient Smoothed Concomitant Lasso Estimation for High Dimensional Regression
In high dimensional settings, sparse structures are crucial for efficiency,
both in term of memory, computation and performance. It is customary to
consider penalty to enforce sparsity in such scenarios. Sparsity
enforcing methods, the Lasso being a canonical example, are popular candidates
to address high dimension. For efficiency, they rely on tuning a parameter
trading data fitting versus sparsity. For the Lasso theory to hold this tuning
parameter should be proportional to the noise level, yet the latter is often
unknown in practice. A possible remedy is to jointly optimize over the
regression parameter as well as over the noise level. This has been considered
under several names in the literature: Scaled-Lasso, Square-root Lasso,
Concomitant Lasso estimation for instance, and could be of interest for
confidence sets or uncertainty quantification. In this work, after illustrating
numerical difficulties for the Smoothed Concomitant Lasso formulation, we
propose a modification we coined Smoothed Concomitant Lasso, aimed at
increasing numerical stability. We propose an efficient and accurate solver
leading to a computational cost no more expansive than the one for the Lasso.
We leverage on standard ingredients behind the success of fast Lasso solvers: a
coordinate descent algorithm, combined with safe screening rules to achieve
speed efficiency, by eliminating early irrelevant features
Structured Sparse Methods for Imaging Genetics
abstract: Imaging genetics is an emerging and promising technique that investigates how genetic variations affect brain development, structure, and function. By exploiting disorder-related neuroimaging phenotypes, this class of studies provides a novel direction to reveal and understand the complex genetic mechanisms. Oftentimes, imaging genetics studies are challenging due to the relatively small number of subjects but extremely high-dimensionality of both imaging data and genomic data. In this dissertation, I carry on my research on imaging genetics with particular focuses on two tasks---building predictive models between neuroimaging data and genomic data, and identifying disorder-related genetic risk factors through image-based biomarkers. To this end, I consider a suite of structured sparse methods---that can produce interpretable models and are robust to overfitting---for imaging genetics. With carefully-designed sparse-inducing regularizers, different biological priors are incorporated into learning models. More specifically, in the Allen brain image--gene expression study, I adopt an advanced sparse coding approach for image feature extraction and employ a multi-task learning approach for multi-class annotation. Moreover, I propose a label structured-based two-stage learning framework, which utilizes the hierarchical structure among labels, for multi-label annotation. In the Alzheimer's disease neuroimaging initiative (ADNI) imaging genetics study, I employ Lasso together with EDPP (enhanced dual polytope projections) screening rules to fast identify Alzheimer's disease risk SNPs. I also adopt the tree-structured group Lasso with MLFre (multi-layer feature reduction) screening rules to incorporate linkage disequilibrium information into modeling. Moreover, I propose a novel absolute fused Lasso model for ADNI imaging genetics. This method utilizes SNP spatial structure and is robust to the choice of reference alleles of genotype coding. In addition, I propose a two-level structured sparse model that incorporates gene-level networks through a graph penalty into SNP-level model construction. Lastly, I explore a convolutional neural network approach for accurate predicting Alzheimer's disease related imaging phenotypes. Experimental results on real-world imaging genetics applications demonstrate the efficiency and effectiveness of the proposed structured sparse methods.Dissertation/ThesisDoctoral Dissertation Computer Science 201
- …