Theoretical Properties of the Overlapping Groups Lasso
We present two sets of theoretical results on the grouped lasso with overlap
of Jacob, Obozinski and Vert (2009) in the linear regression setting. This
method allows for joint selection of predictors in sparse regression, with
complex structured sparsity over the predictors encoded as a set of groups.
This flexible framework suggests that arbitrarily complex structures can be
encoded with an intricate set of groups. Our results show that this strategy
has unexpected theoretical consequences for the procedure. In
particular, we give two sets of results: (1) finite sample bounds on prediction
and estimation, and (2) asymptotic distribution and selection. Both sets of
results give insight into the consequences of choosing an increasingly complex
set of groups for the procedure, as well as what happens when the set of groups
cannot recover the true sparsity pattern. Additionally, these results
demonstrate the differences and similarities between the grouped lasso
procedure with and without overlapping groups. Our analysis shows that the set
of groups must be chosen with caution: an overly complex set of groups will
damage the analysis.
Comment: 20 pages, submitted to the Annals of Statistics
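To make the kind of penalty at issue concrete, here is a minimal sketch of a group-lasso penalty with overlapping index sets in Python. The function name and toy groups are ours, and this is the direct "shared coefficient" form; Jacob, Obozinski and Vert's formulation instead decomposes the coefficient vector into latent group-supported parts, which this sketch does not implement.

```python
import numpy as np

def overlap_group_penalty(beta, groups, weights=None):
    """Group-lasso penalty sum_g w_g * ||beta_g||_2 where the index
    sets in `groups` may overlap (a coefficient can appear in several
    groups and is then penalized through each of them)."""
    if weights is None:
        weights = [1.0] * len(groups)
    return sum(w * np.linalg.norm(beta[list(g)])
               for g, w in zip(groups, weights))

beta = np.array([1.0, -2.0, 0.0, 3.0])
groups = [[0, 1], [1, 2, 3]]   # index 1 belongs to both groups
pen = overlap_group_penalty(beta, groups)   # sqrt(5) + sqrt(13)
```

Increasing the number and intricacy of the groups adds more norm terms of this form, which is one way to see why an overly complex set of groups changes the behavior of the procedure.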
Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping
We consider the problem of estimating a sparse multi-response regression
function, with an application to expression quantitative trait locus (eQTL)
mapping, where the goal is to discover genetic variations that influence
gene-expression levels. In particular, we investigate a shrinkage technique
capable of capturing a given hierarchical structure over the responses, such as
a hierarchical clustering tree with leaf nodes for responses and internal nodes
for clusters of related responses at multiple granularity, and we seek to
leverage this structure to recover covariates relevant to each
hierarchically-defined cluster of responses. We propose a tree-guided group
lasso, or tree lasso, for estimating such structured sparsity under
multi-response regression by employing a novel penalty function constructed
from the tree. We describe a systematic weighting scheme for the overlapping
groups in the tree-penalty such that each regression coefficient is penalized
in a balanced manner despite the inhomogeneous multiplicity of group
memberships of the regression coefficients due to overlaps among groups. For
efficient optimization, we employ a smoothing proximal gradient method that was
originally developed for a general class of structured-sparsity-inducing
penalties. Using simulated and yeast data sets, we demonstrate that our method
shows a superior performance in terms of both prediction errors and recovery of
true sparsity patterns, compared to other methods for learning a
multivariate-response regression.
Comment: Published at http://dx.doi.org/10.1214/12-AOAS549 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
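The tree-structured groups and the balancing issue can be sketched on a toy example (a simplification of our own, not the paper's weighting scheme, which propagates weights down the tree): each node of a tree over the responses contributes one group containing the responses under it, and for a balanced tree every response falls in the same number of groups, so uniform node weights already penalize each coefficient in a balanced way.

```python
import numpy as np

# Toy balanced tree over 4 responses: one group per node
# (4 leaves, 2 internal nodes, 1 root); every response lies in 3 groups.
tree_groups = [[0], [1], [2], [3], [0, 1], [2, 3], [0, 1, 2, 3]]
weights = [1.0 / 3.0] * len(tree_groups)  # each response's total weight is 1

def tree_penalty(B, groups, weights):
    """Tree-guided group-lasso penalty for a coefficient matrix B
    (covariates x responses): sum over covariates j and tree nodes v of
    w_v * ||B[j, G_v]||_2, with G_v the responses under node v."""
    return sum(w * np.linalg.norm(B[j, list(g)])
               for j in range(B.shape[0])
               for g, w in zip(groups, weights))

B = np.array([[1.0, 0.0, 0.0, 0.0],    # covariate 0: affects response 0 only
              [0.0, 2.0, 2.0, 0.0]])   # covariate 1: affects responses 1, 2
pen = tree_penalty(B, tree_groups, weights)
```

For an unbalanced tree the membership counts differ across responses, which is exactly the inhomogeneous-multiplicity problem the paper's systematic weighting scheme addresses.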
A Constrained Matching Pursuit Approach to Audio Declipping
© 2011 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Distributing the Kalman Filter for Large-Scale Systems
This paper derives a \emph{distributed} Kalman filter to estimate a sparsely
connected, large-scale, n-dimensional, dynamical system monitored by a
network of N sensors. Local Kalman filters are implemented on the smaller
(n_l-dimensional, where n_l << n) sub-systems that are obtained after
spatially decomposing the large-scale system. The resulting sub-systems
overlap, which along with an assimilation procedure on the local Kalman
filters, preserves an Lth order Gauss-Markovian structure of the centralized
error processes. The information loss due to the Lth order Gauss-Markovian
approximation is controllable as it can be characterized by a divergence that
decreases as L increases. The order of the approximation, L, leads to a lower
bound on the dimension of the sub-systems, hence providing a criterion for
sub-system selection. The assimilation procedure is carried out on the local
error covariances with a distributed iterate collapse inversion (DICI)
algorithm that we introduce. The DICI algorithm computes the (approximated)
centralized Riccati and Lyapunov equations iteratively with only local
communication and low-order computation. We fuse the observations that are
common among the local Kalman filters using bipartite fusion graphs and
consensus averaging algorithms. The proposed algorithm achieves full
distribution of the Kalman filter that is coherent with the centralized Kalman
filter with an Lth order Gauss-Markovian structure on the centralized
error processes. Nowhere are storage, communication, or computation of
n-dimensional vectors and matrices needed; only n_l-dimensional
vectors and matrices are communicated or used in the computation at the
sensors
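The overlap-plus-fusion idea can be sketched on a toy problem. This is a drastic simplification of our own, not the paper's DICI algorithm: identity dynamics, every state observed locally, and the single shared state fused by plain averaging rather than consensus iterations over a fusion graph.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 3-state random walk x_{k+1} = x_k + w_k, split into two
# overlapping sub-systems: S1 = states {0, 1}, S2 = states {1, 2}.
subs = [[0, 1], [1, 2]]
Q, R = 0.01, 0.1          # process and measurement noise variances
x = np.zeros(3)           # true state

# Local filter state: estimate and error covariance per sub-system.
est = [np.zeros(2), np.zeros(2)]
P = [np.eye(2), np.eye(2)]

for k in range(50):
    x = x + rng.normal(0.0, np.sqrt(Q), 3)   # true dynamics
    z = x + rng.normal(0.0, np.sqrt(R), 3)   # each state measured locally
    for i, idx in enumerate(subs):
        # Local predict (identity dynamics) + update on local measurements.
        P[i] = P[i] + Q * np.eye(2)
        K = P[i] @ np.linalg.inv(P[i] + R * np.eye(2))
        est[i] = est[i] + K @ (z[idx] - est[i])
        P[i] = (np.eye(2) - K) @ P[i]
    # Fuse the state shared by both sub-systems (global index 1).
    shared = 0.5 * (est[0][1] + est[1][0])
    est[0][1] = est[1][0] = shared

# Assemble the global estimate from the local ones.
x_hat = np.array([est[0][0], est[0][1], est[1][1]])
```

Only 2-dimensional vectors and matrices are stored or communicated here; the paper's contribution is doing the covariance assimilation (the DICI iterations) so that this kind of decomposition stays coherent with the centralized filter.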
High Dimensional Classification with Combined Adaptive Sparse PLS and Logistic Regression
Motivation: The high dimensionality of genomic data calls for the development
of specific classification methodologies, especially to prevent over-optimistic
predictions. This challenge can be tackled by compression and variable
selection, which combined constitute a powerful framework for classification,
as well as data visualization and interpretation. However, currently proposed
combinations lead to unstable and non-convergent methods due to inappropriate
computational frameworks. We hereby propose a stable and convergent approach
for classification in high dimension, based on sparse Partial Least Squares
(sparse PLS). Results: We start by proposing a new solution for the sparse PLS
problem that is based on proximal operators for the case of univariate
responses. Then we develop an adaptive version of the sparse PLS for
classification, which combines iterative optimization of logistic regression
and sparse PLS to ensure convergence and stability. Our results are confirmed
on synthetic and experimental data. In particular we show how crucial
convergence and stability can be when cross-validation is involved for
calibration purposes. Using gene expression data we explore the prediction of
breast cancer relapse. We also propose a multicategorical version of our method
for the prediction of cell types based on single-cell expression data.
Availability: Our approach is implemented in the plsgenomics R package.
Comment: 9 pages, 3 figures, 4 tables + Supplementary Materials: 8 pages, 3
figures, 10 tables
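The proximal-operator view of a sparse PLS step can be sketched as follows. This is a simplified illustration under our own assumptions, not the authors' exact algorithm: the first weight vector for a univariate response is obtained by soft thresholding the covariance X^T y (soft thresholding being the proximal operator of the scaled l1 norm), then renormalizing; the threshold choice below is a hypothetical heuristic.

```python
import numpy as np

def soft_threshold(v, lam):
    """Proximal operator of lam * ||.||_1 (componentwise soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def sparse_pls_direction(X, y, lam):
    """First sparse PLS weight vector for a univariate response y:
    soft-threshold the covariance X^T y, then renormalize."""
    w = soft_threshold(X.T @ y, lam)
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=30)  # only feature 0 is relevant
lam = 0.5 * np.max(np.abs(X.T @ y))                 # heuristic threshold
w = sparse_pls_direction(X, y, lam)
```

In the full method this sparse direction extraction would be alternated with iteratively reweighted logistic regression steps, which is where the convergence and stability questions the abstract emphasizes arise.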