Smoothing Proximal Gradient Method for General Structured Sparse Learning
We study the problem of learning high dimensional regression models
regularized by a structured-sparsity-inducing penalty that encodes prior
structural information on either input or output sides. We consider two widely
adopted types of such penalties as our motivating examples: 1) overlapping
group lasso penalty, based on the l1/l2 mixed-norm penalty, and 2) graph-guided
fusion penalty. For both types of penalties, due to their non-separability,
developing an efficient optimization method has remained a challenging problem.
In this paper, we propose a general optimization approach, called smoothing
proximal gradient method, which can solve the structured sparse regression
problems with a smooth convex loss and a wide spectrum of
structured-sparsity-inducing penalties. Our approach is based on a general
smoothing technique of Nesterov. It achieves a convergence rate faster than that of
the standard first-order approach, the subgradient method, and is far more scalable
than the widely used interior-point method. Numerical results are reported to
demonstrate the efficiency and scalability of the proposed method.
Comment: arXiv admin note: substantial text overlap with arXiv:1005.471
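As a concrete illustration of the smoothing idea, here is a minimal numpy sketch for the overlapping group lasso case with a squared loss; the group encoding, weights, and the conservative step-size bound are our own simplifications, not the paper's implementation. Each group norm is replaced by its Nesterov-smoothed surrogate, whose gradient is a scaled projection onto the unit l2 ball, and an accelerated gradient loop is run on the smoothed objective.

```python
import numpy as np

def smoothed_penalty_grad(beta, groups, weights, mu):
    """Gradient of the Nesterov-smoothed overlapping group lasso penalty.

    For each group g, max_{||a||<=1} w_g * a^T beta_g - (mu/2)||a||^2 is
    attained by projecting w_g * beta_g / mu onto the unit l2 ball.
    """
    grad = np.zeros_like(beta)
    for g, w in zip(groups, weights):
        a = w * beta[g] / mu
        n = np.linalg.norm(a)
        if n > 1.0:
            a /= n
        grad[g] += w * a          # groups may overlap, so contributions accumulate
    return grad

def spg_sketch(X, y, groups, weights, lam=0.1, mu=1e-3, n_iter=500):
    """Accelerated gradient on 0.5*||y - X beta||^2 + lam * Omega_mu(beta)."""
    n, p = X.shape
    # Conservative Lipschitz bound: spectral norm of X squared plus the
    # smoothed-penalty curvature (an overestimate is safe, just slower).
    L = np.linalg.norm(X, 2) ** 2 + lam * sum(w ** 2 for w in weights) / mu
    beta, z, t = np.zeros(p), np.zeros(p), 1.0
    for _ in range(n_iter):
        g = X.T @ (X @ z - y) + lam * smoothed_penalty_grad(z, groups, weights, mu)
        beta_next = z - g / L
        t_next = 0.5 * (1 + np.sqrt(1 + 4 * t * t))
        z = beta_next + ((t - 1) / t_next) * (beta_next - beta)
        beta, t = beta_next, t_next
    return beta
```

The same template covers the graph-guided fusion penalty: only the dual projection inside smoothed_penalty_grad changes (clipping to a box instead of a ball).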
Increasing stability and interpretability of gene expression signatures
Motivation: Molecular signatures for diagnosis or prognosis estimated from
large-scale gene expression data often lack robustness and stability, rendering
their biological interpretation challenging. Increasing the signature's
interpretability and stability across perturbations of a given dataset and, if
possible, across datasets, is urgently needed to ease the discovery of
important biological processes and, eventually, new drug targets. Results: We
propose a new method to construct signatures with increased stability and
easier interpretability. The method uses a gene network as side information
and enforces a large connectivity among the genes in the signature, leading to
signatures typically made of genes clustered in a few subnetworks. It combines
the recently proposed graph Lasso procedure with a stability selection
procedure. We evaluate its relevance for the estimation of a prognostic
signature in breast cancer, and highlight in particular the increase in
interpretability and stability of the signature.
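A minimal sketch of the stability selection half of this pipeline, using scikit-learn's plain Lasso as a stand-in for the authors' graph Lasso (which is not reimplemented here); the subsample count and frequency threshold are illustrative. The idea: refit the sparse model on many random half-subsamples and keep only the genes selected in a large fraction of the fits.

```python
import numpy as np
from sklearn.linear_model import Lasso

def stability_selection(X, y, alpha=0.05, n_subsamples=100, threshold=0.6,
                        rng=np.random.default_rng(0)):
    """Selection frequency of each feature across random half-subsamples."""
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)
        model = Lasso(alpha=alpha).fit(X[idx], y[idx])
        counts[model.coef_ != 0] += 1
    freq = counts / n_subsamples
    return np.flatnonzero(freq >= threshold), freq
```

The graph Lasso component would replace the plain Lasso inside the loop; the subsampling and thresholding logic stays the same.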
Theoretical Properties of the Overlapping Groups Lasso
We present two sets of theoretical results on the grouped lasso with overlap
of Jacob, Obozinski and Vert (2009) in the linear regression setting. This
method allows for joint selection of predictors in sparse regression, allowing
for complex structured sparsity over the predictors encoded as a set of groups.
This flexible framework suggests that arbitrarily complex structures can be
encoded with an intricate set of groups. Our results show that this strategy
has unexpected theoretical consequences for the procedure. In
particular, we give two sets of results: (1) finite sample bounds on prediction
and estimation, and (2) asymptotic distribution and selection. Both sets of
results give insight into the consequences of choosing an increasingly complex
set of groups for the procedure, as well as what happens when the set of groups
cannot recover the true sparsity pattern. Additionally, these results
demonstrate the differences and similarities between the grouped lasso
procedure with and without overlapping groups. Our analysis shows that the set
of groups must be chosen with caution: an overly complex set of groups will
damage the analysis.
Comment: 20 pages, submitted to the Annals of Statistics
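For reference, the overlapping grouped lasso of Jacob, Obozinski and Vert studied here is usually written via a latent decomposition of the coefficient vector; a standard statement of the penalty (notation ours) is:

```latex
% Latent overlapping group lasso: groups g in a collection G with weights w_g.
% Each latent vector v^{(g)} is supported only on the coordinates of group g.
\Omega(\beta) \;=\; \min\Big\{ \sum_{g \in \mathcal{G}} w_g \,\lVert v^{(g)} \rVert_2
  \;:\; \operatorname{supp}\big(v^{(g)}\big) \subseteq g \ \ \forall g \in \mathcal{G},\;
  \sum_{g \in \mathcal{G}} v^{(g)} = \beta \Big\}
```

The minimum over decompositions is what makes the supports of selected variables unions of groups, and it is the source of the subtleties the results below address when the groups cannot recover the true sparsity pattern.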
Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping
We consider the problem of estimating a sparse multi-response regression
function, with an application to expression quantitative trait locus (eQTL)
mapping, where the goal is to discover genetic variations that influence
gene-expression levels. In particular, we investigate a shrinkage technique
capable of capturing a given hierarchical structure over the responses, such as
a hierarchical clustering tree with leaf nodes for responses and internal nodes
for clusters of related responses at multiple granularities, and we seek to
leverage this structure to recover covariates relevant to each
hierarchically-defined cluster of responses. We propose a tree-guided group
lasso, or tree lasso, for estimating such structured sparsity under
multi-response regression by employing a novel penalty function constructed
from the tree. We describe a systematic weighting scheme for the overlapping
groups in the tree-penalty such that each regression coefficient is penalized
in a balanced manner despite the inhomogeneous multiplicity of group
memberships of the regression coefficients due to overlaps among groups. For
efficient optimization, we employ a smoothing proximal gradient method that was
originally developed for a general class of structured-sparsity-inducing
penalties. Using simulated and yeast data sets, we demonstrate that our method
shows superior performance in terms of both prediction error and recovery of
true sparsity patterns, compared to other methods for learning a
multivariate-response regression.
Comment: Published at http://dx.doi.org/10.1214/12-AOAS549 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
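A minimal sketch of evaluating a tree-structured group penalty of this kind, assuming each tree node contributes a weighted l2 norm over the responses (leaves) beneath it; the node weights are user-supplied placeholders here, whereas the paper derives a specific balanced weighting scheme from the tree.

```python
import numpy as np

def tree_penalty(B, tree_groups, node_weights):
    """Sum of weighted l2 norms over tree-defined response groups.

    B           : (p, k) coefficient matrix, one column per response.
    tree_groups : list of index arrays, one per tree node, listing the
                  responses under that node (leaves give singleton groups).
    node_weights: one nonnegative weight per node; the paper's scheme chooses
                  these so each coefficient is penalized in a balanced way
                  despite overlapping group memberships.
    """
    return sum(w * np.linalg.norm(B[j, g])
               for j in range(B.shape[0])
               for g, w in zip(tree_groups, node_weights))
```

Because the groups overlap only along root-to-leaf paths, this penalty fits the smoothing proximal gradient template sketched earlier, which is how the paper optimizes it.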
A Novel Joint Brain Network Analysis Using Longitudinal Alzheimer's Disease Data.
There is well-documented evidence of brain network differences between individuals with Alzheimer's disease (AD) and healthy controls (HC). To date, imaging studies investigating brain networks in these populations have typically been cross-sectional, and the reproducibility of such findings is somewhat unclear. In a novel study, we use the longitudinal ADNI data on the whole brain to jointly compute the brain network at baseline and at one year, using a state-of-the-art approach that pools information across both time points to yield distinct visit-specific networks for the AD and HC cohorts, resulting in more accurate inferences. We perform a multiscale comparison of the AD and HC networks in terms of global network metrics as well as at the more granular level of resting-state networks defined under a whole-brain parcellation. Our analysis illustrates a decrease in small-worldedness in the AD group at both time points and also identifies more local network features and hub nodes that are disrupted by the progression of AD. We also obtain high reproducibility of the HC network across visits. In contrast, a separate estimation of the networks at each visit using standard graphical approaches reveals fewer meaningful differences and lower reproducibility.
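A minimal sketch of the kind of global metric compared above, assuming networkx and a thresholded connectivity matrix; the threshold and rewiring counts are illustrative. The small-world coefficient sigma compares clustering and path length against degree-matched random graphs.

```python
import networkx as nx
import numpy as np

def small_worldness(adj, threshold=0.3, niter=5, nrand=5, seed=0):
    """Small-world coefficient sigma = (C/C_rand) / (L/L_rand) of a
    thresholded, binarized connectivity matrix (assumed connected)."""
    G = nx.from_numpy_array((np.abs(adj) > threshold).astype(int))
    G.remove_edges_from(nx.selfloop_edges(G))  # drop diagonal self-loops
    return nx.sigma(G, niter=niter, nrand=nrand, seed=seed)
```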
Efficient First Order Methods for Linear Composite Regularizers
A wide class of regularization problems in machine learning and statistics
employ a regularization term which is obtained by composing a simple convex
function \omega with a linear transformation. This setting includes Group Lasso
methods, the Fused Lasso and other total variation methods, multi-task learning
methods and many more. In this paper, we present a general approach for
computing the proximity operator of this class of regularizers, under the
assumption that the proximity operator of the function \omega is known in
advance. Our approach builds on a recent line of research on optimal first
order optimization methods and uses fixed point iterations for numerically
computing the proximity operator. It is more general than current approaches
and, as we show with numerical simulations, computationally more efficient than
available first order methods which do not achieve the optimal rate. In
particular, our method outperforms state-of-the-art O(1/T) methods for the
overlapping Group Lasso and matches optimal O(1/T^2) methods for the Fused
Lasso and the tree-structured Group Lasso.
Comment: 19 pages, 8 figures
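A minimal sketch in the spirit of this approach for the total-variation case, with \omega = lam*||.||_1 composed with a difference matrix B: the proximity operator of \omega∘B is computed by a fixed-point iteration (projected gradient on the dual) that uses only elementary operations with B. This is our own simplified instance, not the paper's exact fixed-point scheme.

```python
import numpy as np

def prox_tv_dual(x, lam, n_iter=200):
    """prox of lam * sum_i |x_{i+1} - x_i| via its dual problem:
    min_v 0.5*||x - B^T v||^2  s.t.  ||v||_inf <= lam,
    solved by projected gradient; the prox is then x - B^T v*."""
    n = len(x)
    B = np.diff(np.eye(n), axis=0)   # (n-1, n) first-difference matrix
    v = np.zeros(n - 1)
    step = 0.25                      # 1/||B||^2, since ||B||^2 <= 4 here
    for _ in range(n_iter):
        # gradient ascent on the dual, then clip onto the box ||v||_inf <= lam
        v = np.clip(v + step * (B @ (x - B.T @ v)), -lam, lam)
    return x - B.T @ v
```

The same dual template applies whenever the projection onto the dual set of \omega (equivalently, the prox of \omega) is available, which is exactly the assumption the paper works under.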
Stable Feature Selection from Brain sMRI
Neuroimage analysis usually involves learning thousands or even millions of
variables using only a limited number of samples. In this regard, sparse
models, e.g. the lasso, are applied to select the optimal features and achieve
high diagnosis accuracy. The lasso, however, usually results in independent,
unstable features. Stability, a manifestation of the reproducibility of
statistical results under reasonable perturbations of the data and the model,
is an important focus in statistics, especially in the analysis of high dimensional
data. In this paper, we explore a nonnegative generalized fused lasso model for
stable feature selection in the diagnosis of Alzheimer's disease. In addition
to sparsity, our model incorporates two important pathological priors: the
spatial cohesion of lesion voxels and the positive correlation between the
features and the disease labels. To optimize the model, we propose an efficient
algorithm by proving a novel link between total variation and fast network flow
algorithms via conic duality. Experiments show that the proposed nonnegative
model performs much better in exploring the intrinsic structure of the data by
selecting stable features, compared with other state-of-the-art methods.
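A minimal sketch of the model being optimized, assuming a squared loss and a voxel adjacency list; the projected subgradient step shown here is only an illustrative solver, not the network-flow algorithm the paper derives via conic duality.

```python
import numpy as np

def nn_fused_lasso_step(beta, X, y, edges, lam1, lam2, lr=1e-3):
    """One projected subgradient step on
    0.5*||y - X b||^2 + lam1*||b||_1 + lam2*sum_{(i,j) in edges} |b_i - b_j|
    subject to b >= 0 (nonnegativity encodes the positive-correlation prior;
    the edge terms encode spatial cohesion of neighboring voxels)."""
    g = X.T @ (X @ beta - y) + lam1 * np.ones_like(beta)  # |b| = b when b >= 0
    for i, j in edges:
        s = np.sign(beta[i] - beta[j])
        g[i] += lam2 * s
        g[j] -= lam2 * s
    return np.maximum(beta - lr * g, 0.0)                 # project onto b >= 0
```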