Conditioning of Random Block Subdictionaries with Applications to Block-Sparse Recovery and Regression
The linear model, in which a set of observations is assumed to be given by a
linear combination of columns of a matrix, has long been the mainstay of the
statistics and signal processing literature. One particular challenge for
inference under linear models is understanding the conditions on the dictionary
under which reliable inference is possible. This challenge has attracted
renewed attention in recent years since many modern inference problems deal
with the "underdetermined" setting, in which the number of observations is much
smaller than the number of columns in the dictionary. This paper makes several
contributions for this setting when the set of observations is given by a
linear combination of a small number of groups of columns of the dictionary,
termed the "block-sparse" case. First, it specifies conditions on the
dictionary under which most block subdictionaries are well conditioned. This
result is fundamentally different from prior work on block-sparse inference
because (i) it provides conditions that can be explicitly computed in
polynomial time, (ii) the given conditions translate into near-optimal scaling
of the number of columns of the block subdictionaries as a function of the
number of observations for a large class of dictionaries, and (iii) it suggests
that the spectral norm and the quadratic-mean block coherence of the dictionary
(rather than the worst-case coherences) fundamentally limit the scaling of
dimensions of the well-conditioned block subdictionaries. Second, this paper
investigates the problems of block-sparse recovery and block-sparse regression
in underdetermined settings. Near-optimal block-sparse recovery and regression
are possible for certain dictionaries as long as the dictionary satisfies
easily computable conditions and the coefficients describing the linear
combination of groups of columns can be modeled through a mild statistical
prior.
Comment: 39 pages, 3 figures. A revised and expanded version of the paper
published in IEEE Transactions on Information Theory (DOI:
10.1109/TIT.2015.2429632); this revision includes corrections in the proofs
of some of the results.
Multi-task additive models with shared transfer functions based on dictionary learning
Additive models form a widely popular class of regression models which
represent the relation between covariates and response variables as the sum of
low-dimensional transfer functions. Besides flexibility and accuracy, a key
benefit of these models is their interpretability: the transfer functions
provide visual means for inspecting the models and identifying domain-specific
relations between inputs and outputs. However, in large-scale problems
involving the prediction of many related tasks, learning independently additive
models results in a loss of model interpretability, and can cause overfitting
when training data is scarce. We introduce a novel multi-task learning approach
which provides a corpus of accurate and interpretable additive models for a
large number of related forecasting tasks. Our key idea is to share transfer
functions across models in order to reduce the model complexity and ease the
exploration of the corpus. We establish a connection with sparse dictionary
learning and propose a new efficient fitting algorithm which alternates between
sparse coding and transfer function updates. The former step is solved via an
extension of Orthogonal Matching Pursuit, whose properties are analyzed using a
novel recovery condition which extends existing results in the literature. The
latter step is addressed using a traditional dictionary update rule.
Experiments on real-world data demonstrate that our approach compares favorably
to baseline methods while yielding an interpretable corpus of models, revealing
structure among the individual tasks and being more robust when training data
is scarce. Our framework therefore extends the well-known benefits of additive
models to common regression settings possibly involving thousands of tasks.
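The sparse coding step mentioned in the abstract above builds on Orthogonal Matching Pursuit (OMP). As a rough illustration only, here is a minimal sketch of plain OMP, not the multi-task extension the paper proposes. The function names `omp` and `solve_least_squares`, the list-of-atoms data layout, and the stopping tolerance are assumptions made for this example; atoms are assumed to be unit-norm and linearly independent.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve_least_squares(atoms, y):
    """Least-squares coefficients of y on the given atoms.

    Solves the normal equations by Gaussian elimination with partial
    pivoting. Assumes the atoms are linearly independent.
    """
    k = len(atoms)
    # Augmented matrix [Gram matrix | correlations with y].
    aug = [[dot(atoms[i], atoms[j]) for j in range(k)] + [dot(atoms[i], y)]
           for i in range(k)]
    for p in range(k):
        pivot = max(range(p, k), key=lambda r: abs(aug[r][p]))
        aug[p], aug[pivot] = aug[pivot], aug[p]
        for r in range(p + 1, k):
            factor = aug[r][p] / aug[p][p]
            for c in range(p, k + 1):
                aug[r][c] -= factor * aug[p][c]
    coeffs = [0.0] * k
    for p in reversed(range(k)):
        tail = sum(aug[p][c] * coeffs[c] for c in range(p + 1, k))
        coeffs[p] = (aug[p][k] - tail) / aug[p][p]
    return coeffs


def omp(atoms, y, sparsity, tol=1e-12):
    """Plain OMP: greedily select up to `sparsity` atoms to approximate y.

    Returns a dict mapping selected atom index -> coefficient.
    """
    residual = list(y)
    support, coeffs = [], []
    for _ in range(sparsity):
        # Score each unselected atom by its correlation with the residual.
        scores = [-1.0 if i in support else abs(dot(atom, residual))
                  for i, atom in enumerate(atoms)]
        best = max(range(len(atoms)), key=scores.__getitem__)
        if scores[best] <= tol:
            break  # residual is (numerically) orthogonal to remaining atoms
        support.append(best)
        selected = [atoms[i] for i in support]
        # Refit all coefficients on the current support (the "orthogonal" step).
        coeffs = solve_least_squares(selected, y)
        approx = [sum(c * atom[r] for c, atom in zip(coeffs, selected))
                  for r in range(len(y))]
        residual = [yi - ai for yi, ai in zip(y, approx)]
    return dict(zip(support, coeffs))


# Toy dictionary in R^3: three standard basis vectors plus one diagonal atom.
s = 2 ** -0.5
toy_atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [s, s, 0.0]]
toy_signal = [2.0, 0.0, -3.0]  # equals 2 * atom 0 - 3 * atom 2
toy_code = omp(toy_atoms, toy_signal, sparsity=2)
```

In this toy case the signal is an exact combination of atoms 0 and 2, so the sketch selects those two atoms. Solving the normal equations directly keeps the example dependency-free; a practical implementation would more likely use a QR or Cholesky update.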
Computational Methods for Sparse Solution of Linear Inverse Problems
The goal of the sparse approximation problem is to approximate a target signal using a linear combination of a few elementary signals drawn from a fixed collection. This paper surveys the major practical algorithms for sparse approximation. Specific attention is paid to computational issues, to the circumstances in which individual methods tend to perform well, and to the theoretical guarantees available. Many fundamental questions in electrical engineering, statistics, and applied mathematics can be posed as sparse approximation problems, making these algorithms versatile and relevant to a plethora of applications.
A Multiple Hypothesis Testing Approach to Low-Complexity Subspace Unmixing
Subspace-based signal processing traditionally focuses on problems involving
a few subspaces. Recently, a number of problems in different application areas
have emerged that involve a significantly larger number of subspaces relative
to the ambient dimension. It becomes imperative in such settings to first
identify a smaller set of active subspaces that contribute to the observation
before further processing can be carried out. This problem of identification of
a small set of active subspaces among a huge collection of subspaces from a
single (noisy) observation in the ambient space is termed subspace unmixing.
This paper formally poses the subspace unmixing problem under the parsimonious
subspace-sum (PS3) model, discusses connections of the PS3 model to problems in
wireless communications, hyperspectral imaging, high-dimensional statistics and
compressed sensing, and proposes a low-complexity algorithm, termed marginal
subspace detection (MSD), for subspace unmixing. The MSD algorithm turns the
subspace unmixing problem for the PS3 model into a multiple hypothesis testing
(MHT) problem and its analysis in the paper helps control the family-wise error
rate of this MHT problem at any level under two random
signal generation models. Some other highlights of the analysis of the MSD
algorithm include: (i) it is applicable to an arbitrary collection of subspaces
on the Grassmann manifold; (ii) it relies on properties of the collection of
subspaces that are computable in polynomial time; and (iii) it allows for
linear scaling of the number of active subspaces as a function of the ambient
dimension. Finally, numerical results are presented in the paper to better
understand the performance of the MSD algorithm.
Comment: Submitted for journal publication; 33 pages, 14 figures.
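To make the idea of treating subspace unmixing as a set of per-subspace tests more concrete, the sketch below scores each candidate subspace by the energy of the observation's projection onto it and declares the subspace active when that energy exceeds a threshold. This is an illustrative reading of the abstract, not the paper's exact MSD procedure: the function names, the orthonormal-basis input format, and especially the fixed `threshold` value are assumptions, and the abstract does not say how thresholds are chosen to control the family-wise error rate.

```python
def projection_energy(basis, y):
    """Squared norm of the projection of y onto span(basis).

    `basis` is assumed to be a list of orthonormal vectors, so the energy
    is the sum of squared inner products with the basis vectors.
    """
    return sum(sum(b_i * y_i for b_i, y_i in zip(b, y)) ** 2 for b in basis)


def detect_active_subspaces(bases, y, threshold):
    """Return indices of subspaces whose projection energy exceeds threshold.

    Each subspace is tested on its own (marginally), independent of the
    others. `threshold` is a hypothetical tuning parameter in this sketch.
    """
    return [k for k, basis in enumerate(bases)
            if projection_energy(basis, y) > threshold]


# Toy collection of three subspaces in R^4, each given by an orthonormal basis.
toy_bases = [
    [[1.0, 0.0, 0.0, 0.0]],                         # span{e1}
    [[0.0, 1.0, 0.0, 0.0]],                         # span{e2}
    [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]],   # span{e3, e4}
]
toy_observation = [3.0, 0.1, 0.0, 2.0]  # mostly in subspaces 0 and 2
toy_active = detect_active_subspaces(toy_bases, toy_observation, threshold=1.0)
```

Because each test uses only one subspace at a time, the cost grows linearly with the number of candidate subspaces, which is consistent with the low-complexity emphasis in the abstract.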
Frame Coherence and Sparse Signal Processing
The sparse signal processing literature often uses random sensing matrices to
obtain performance guarantees. Unfortunately, in the real world, sensing
matrices do not always come from random processes. It is therefore desirable to
evaluate whether an arbitrary matrix, or frame, is suitable for sensing sparse
signals. To this end, the present paper investigates two parameters that
measure the coherence of a frame: worst-case and average coherence. We first
provide several examples of frames that have small spectral norm, worst-case
coherence, and average coherence. Next, we present a new lower bound on
worst-case coherence and compare it to the Welch bound. Later, we propose an
algorithm that decreases the average coherence of a frame without changing its
spectral norm or worst-case coherence. Finally, we use worst-case and average
coherence, as opposed to the Restricted Isometry Property, to garner
near-optimal probabilistic guarantees on both sparse signal detection and
reconstruction in the presence of noise. This contrasts with recent results
that only guarantee noiseless signal recovery from arbitrary frames, and which
further assume independence across the nonzero entries of the signal; in a
sense, requiring small average coherence replaces the need for such an
assumption.
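The two coherence parameters discussed in the abstract above can be computed directly from the Gram matrix of a frame. The sketch below assumes the commonly used definitions for a frame with N unit-norm columns: worst-case coherence is the largest absolute inner product between two distinct columns, and average coherence is the largest absolute value, over columns i, of the sum of inner products with all other columns, divided by N - 1. The function names and the list-of-rows matrix layout are choices made for this example.

```python
import math


def normalize_columns(frame):
    """Return a copy of `frame` (a list of rows) with unit-norm columns."""
    rows, cols = len(frame), len(frame[0])
    norms = [math.sqrt(sum(frame[r][c] ** 2 for r in range(rows)))
             for c in range(cols)]
    return [[frame[r][c] / norms[c] for c in range(cols)] for r in range(rows)]


def gram_matrix(frame):
    """Matrix of inner products between all pairs of columns."""
    rows, cols = len(frame), len(frame[0])
    return [[sum(frame[r][i] * frame[r][j] for r in range(rows))
             for j in range(cols)] for i in range(cols)]


def worst_case_coherence(frame):
    """Largest absolute inner product between two distinct unit-norm columns."""
    g = gram_matrix(normalize_columns(frame))
    n = len(g)
    return max(abs(g[i][j]) for i in range(n) for j in range(n) if i != j)


def average_coherence(frame):
    """max_i |sum_{j != i} <phi_i, phi_j>| / (N - 1) for unit-norm columns."""
    g = gram_matrix(normalize_columns(frame))
    n = len(g)
    row_sums = [abs(sum(g[i][j] for j in range(n) if j != i)) for i in range(n)]
    return max(row_sums) / (n - 1)


# Three unit vectors in R^2 spaced 120 degrees apart (columns of the matrix).
h = math.sqrt(3) / 2
three_vector_frame = [
    [0.0, -h, h],
    [1.0, -0.5, -0.5],
]
mu = worst_case_coherence(three_vector_frame)
nu = average_coherence(three_vector_frame)
```

For this three-vector frame every pair of distinct columns has inner product -1/2, so both quantities come out to approximately 0.5. Both functions take time polynomial in the frame dimensions, in contrast to verifying the Restricted Isometry Property.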