250 research outputs found
Low Complexity Regularization of Linear Inverse Problems
Inverse problems and regularization theory are a central theme in contemporary
signal processing, where the goal is to reconstruct an unknown signal from
partial, indirect, and possibly noisy measurements of it. A now standard method
for recovering the unknown signal is to solve a convex optimization problem
that enforces some prior knowledge about its structure. This has proved
efficient in many problems routinely encountered in imaging sciences,
statistics and machine learning. This chapter delivers a review of recent
advances in the field where the regularization prior promotes solutions
conforming to some notion of simplicity/low-complexity. These priors encompass,
as popular examples, sparsity and group sparsity (to capture the compressibility
of natural signals and images), total variation and analysis sparsity (to
promote piecewise regularity), and low-rank (as a natural extension of sparsity
to matrix-valued data). Our aim is to provide a unified treatment of all these
regularizations under a single umbrella, namely the theory of partial
smoothness. This framework is very general and accommodates all low-complexity
regularizers just mentioned, as well as many others. Partial smoothness turns
out to be the canonical way to encode low-dimensional models that can be linear
spaces or more general smooth manifolds. This review is intended to serve as a
one-stop shop for understanding the theoretical properties of the
so-regularized solutions. It covers a large spectrum including: (i) recovery
guarantees and stability to noise, both in terms of $\ell_2$-stability and
model (manifold) identification; (ii) sensitivity analysis to perturbations of
the parameters involved (in particular the observations), with applications to
unbiased risk estimation; (iii) convergence properties of the forward-backward
proximal splitting scheme, which is particularly well suited to solving the
corresponding large-scale regularized optimization problem.
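As an illustration of the forward-backward proximal splitting scheme mentioned above, the following minimal sketch applies it to the $\ell_1$-regularized least-squares problem $\min_x \tfrac12\|y - Ax\|_2^2 + \lambda\|x\|_1$; the step size, iteration count, and synthetic data are illustrative assumptions, not the chapter's own implementation.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def forward_backward_l1(A, y, lam, n_iter=500):
    """Forward-backward (ISTA) iterations for min_x 0.5*||y - A x||^2 + lam*||x||_1."""
    x = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L, with L the Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)                          # forward (explicit gradient) step
        x = soft_threshold(x - step * grad, step * lam)   # backward (proximal) step
    return x

# Illustrative usage on synthetic data
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 200))
x_true = np.zeros(200); x_true[:5] = 1.0
y = A @ x_true + 0.01 * rng.standard_normal(50)
x_hat = forward_backward_l1(A, y, lam=0.1)
```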
On the Power of Preconditioning in Sparse Linear Regression
Sparse linear regression is a fundamental problem in high-dimensional
statistics, but strikingly little is known about how to efficiently solve it
without restrictive conditions on the design matrix. We consider the
(correlated) random design setting, where the covariates are independently
drawn from a multivariate Gaussian $N(0,\Sigma)$ with $\Sigma : n \times n$,
and seek estimators $\hat{w}$ minimizing $(\hat{w}-w^*)^\top \Sigma (\hat{w}-w^*)$,
where $w^*$ is the $k$-sparse ground truth. Information theoretically, one can
achieve strong error bounds with $O(k \log n)$ samples for arbitrary $\Sigma$
and $w^*$; however, no efficient algorithms are known to match these guarantees
even with $o(n)$ samples, without further assumptions on $\Sigma$ or $w^*$. As
far as hardness, computational lower bounds are only known with worst-case
design matrices. Random-design instances are known which are hard for the
Lasso, but these instances can generally be solved by Lasso after a simple
change-of-basis (i.e. preconditioning).
In this work, we give upper and lower bounds clarifying the power of
preconditioning in sparse linear regression. First, we show that the
preconditioned Lasso can solve a large class of sparse linear regression
problems nearly optimally: it succeeds whenever the dependency structure of the
covariates, in the sense of the Markov property, has low treewidth -- even if
$\Sigma$ is highly ill-conditioned. Second, we construct (for the first time)
random-design instances which are provably hard for an optimally preconditioned
Lasso. In fact, we complete our treewidth classification by proving that for
any treewidth-$t$ graph, there exists a Gaussian Markov Random Field on this
graph such that the preconditioned Lasso, with any choice of preconditioner,
requires $t^{\Omega(1)}$ samples to recover sparse signals when
covariates are drawn from this model.
Comment: 73 pages, 5 figures
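The following sketch illustrates the generic idea of a preconditioned Lasso (a change of basis followed by an ordinary Lasso fit); the preconditioner S here is a placeholder argument, and the paper's treewidth-based construction of the preconditioner is not reproduced.

```python
import numpy as np
from sklearn.linear_model import Lasso

def preconditioned_lasso(X, y, S, alpha=0.1):
    """Illustrative preconditioned Lasso: regress y on X @ S instead of X.

    S is a hypothetical preconditioner, e.g. chosen so that the design X @ S is
    better conditioned or the transformed ground truth is sparse; the paper's
    treewidth-based choice of S is not reproduced here.
    """
    lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
    lasso.fit(X @ S, y)          # Lasso in the preconditioned coordinates
    return S @ lasso.coef_       # map the estimate back to the original basis

# Example: the identity preconditioner recovers the ordinary Lasso
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 50))
w_star = np.zeros(50); w_star[:3] = 1.0
y = X @ w_star + 0.01 * rng.standard_normal(100)
w_hat = preconditioned_lasso(X, y, S=np.eye(50), alpha=0.05)
```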
Sparse Modeling for Image and Vision Processing
In recent years, a large amount of multi-disciplinary research has been
conducted on sparse models and their applications. In statistics and machine
learning, the sparsity principle is used to perform model selection---that is,
automatically selecting a simple model among a large collection of them. In
signal processing, sparse coding consists of representing data with linear
combinations of a few dictionary elements. Subsequently, the corresponding
tools have been widely adopted by several scientific communities such as
neuroscience, bioinformatics, or computer vision. The goal of this monograph is
to offer a self-contained view of sparse modeling for visual recognition and
image processing. More specifically, we focus on applications where the
dictionary is learned and adapted to data, yielding a compact representation
that has been successful in various contexts.
Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics and Vision
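As a hedged illustration of sparse coding with a learned dictionary, the following sketch fits an overcomplete dictionary to synthetic patch vectors using scikit-learn; the patch size, number of atoms, and sparsity penalty are illustrative choices, not those of the monograph.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

# Illustrative sketch only: learn a dictionary D and sparse codes for synthetic
# "patches"; a real image pipeline would extract and normalize patches first.
rng = np.random.default_rng(0)
patches = rng.standard_normal((1000, 64))             # e.g. 8x8 patches, flattened

dico = MiniBatchDictionaryLearning(n_components=100,  # overcomplete dictionary
                                   alpha=1.0,         # sparsity penalty
                                   random_state=0)
codes = dico.fit(patches).transform(patches)          # sparse codes (mostly zeros)
D = dico.components_                                  # learned dictionary atoms
reconstruction = codes @ D                            # each patch ~ combination of a few atoms
```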
Learning with Structured Sparsity
This paper investigates a new learning formulation called structured
sparsity, which is a natural extension of the standard sparsity concept in
statistical learning and compressive sensing. By allowing arbitrary structures
on the feature set, this concept generalizes the group sparsity idea that has
become popular in recent years. A general theory is developed for learning with
structured sparsity, based on the notion of coding complexity associated with
the structure. It is shown that if the coding complexity of the target signal
is small, then one can achieve improved performance by using coding complexity
regularization methods, which generalize the standard sparse regularization.
Moreover, a structured greedy algorithm is proposed to efficiently solve the
structured sparsity problem. It is shown that the greedy algorithm
approximately solves the coding complexity optimization problem under
appropriate conditions. Experiments are included to demonstrate the advantage
of structured sparsity over standard sparsity on some real applications.
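To make the greedy idea concrete, here is a minimal sketch of greedy selection over predefined feature groups; it illustrates the general structured-greedy principle only, not the coding-complexity-weighted algorithm analyzed in the paper, and the group layout and data are hypothetical.

```python
import numpy as np

def structured_greedy(X, y, groups, n_steps=3):
    """Greedy group selection: at each step add the group of features whose
    inclusion gives the best least-squares fit. A sketch of the general idea,
    not the coding-complexity-based algorithm of the paper."""
    selected = []
    for _ in range(n_steps):
        best_err, best_g = np.inf, None
        for g in groups:
            if g in selected:
                continue
            cols = [j for grp in selected + [g] for j in grp]
            beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            err = np.linalg.norm(y - X[:, cols] @ beta)
            if err < best_err:
                best_err, best_g = err, g
        selected.append(best_g)
    return selected

# Illustrative usage with hypothetical contiguous groups of 4 features each
rng = np.random.default_rng(0)
X = rng.standard_normal((80, 40))
w = np.zeros(40); w[0:4] = 1.0; w[8:12] = -1.0
y = X @ w + 0.01 * rng.standard_normal(80)
groups = [tuple(range(i, i + 4)) for i in range(0, 40, 4)]
print(structured_greedy(X, y, groups, n_steps=2))
```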
High dimensional inference: structured sparse models and non-linear measurement channels
Thesis (Ph.D.)--Boston University
High dimensional inference is motivated by many real-life problems such as medical diagnosis, security, and marketing. In statistical inference problems, n data samples are collected, where each sample contains p attributes. High dimensional inference deals with problems in which the number of parameters, p, is larger than the sample size, n.
To hope for any consistent result within the high dimensional framework, the data are assumed to lie on a low dimensional manifold. This implies that only k ≪ p parameters are required to characterize the p feature variables. One way to impose such a low dimensional structure is a regularization-based approach. In this approach, the statistical inference problem is mapped to an optimization problem in which a regularizer term penalizes the deviation of the model from a specific structure. The choice of appropriate penalizing functions is often challenging. We explore three major problems that arise in the context of this approach.
First, we probe the reconstruction problem under sparse Poisson models. We are motivated by applications in explosive identification and online marketing, where the observations are the counts of a recurring event. We study the amplitude effect, which distinguishes our problem from a conventional linear regression least-squares problem. Motivated by applications in decentralized sensor networks and distributed multi-task learning, we study the effect of decentralization on high dimensional inference. Finally, we provide a general framework to study the impact of multiple structured models on the performance of regularization-based reconstruction methods. For each of the aforementioned scenarios, we propose an equivalent optimization problem and specify the conditions under which the optimization problem can be solved. Moreover, we mathematically analyze the performance of such recovery methods in terms of reconstruction error, prediction error, probability of successful recovery, and sample complexity.
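As a hedged illustration of the sparse Poisson setting mentioned above, the following sketch minimizes an $\ell_1$-penalized Poisson negative log-likelihood with a log link by proximal gradient descent; the model, step size, and data are generic assumptions rather than the thesis's specific formulation.

```python
import numpy as np

def sparse_poisson_regression(X, y, lam=0.1, step=1e-3, n_iter=2000):
    """Proximal-gradient sketch for l1-penalized Poisson regression with a log
    link: minimize sum_i exp(x_i.w) - y_i*(x_i.w) + lam*||w||_1.
    A generic illustration, not the specific model analyzed in the thesis."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (np.exp(X @ w) - y)                           # gradient of the Poisson NLL
        z = w - step * grad                                        # forward (gradient) step
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)   # soft-thresholding step
    return w

# Synthetic counts from a sparse rate model (illustrative)
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 30)) * 0.3
w_true = np.zeros(30); w_true[:3] = 1.0
y = rng.poisson(np.exp(X @ w_true))
w_hat = sparse_poisson_regression(X, y)
```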
Grouping Strategies and Thresholding for High Dimensional Linear Models
The estimation problem in a high-dimensional regression model with structured sparsity is
investigated. An algorithm using a two-step block thresholding procedure
called GR-LOL is provided. Convergence rates are produced: they depend on
simple coherence-type indices of the Gram matrix, easily checkable on the data,
as well as sparsity assumptions on the model parameters measured by a
combination of within-block and between-block norms. The
simplicity of the coherence indicator suggests ways to optimize the rates of
convergence when the group structure is not naturally given by the problem and
is unknown. In such a case, an auto-driven procedure is provided to determine
the regressor groups (number and contents). An intensive practical study
compares our grouping methods with the standard LOL algorithm. We prove that
the grouping rarely deteriorates the results but can improve them very
significantly. GR-LOL is also compared with group-Lasso procedures and exhibits
a very encouraging behavior. The results are quite impressive, especially when
the GR-LOL algorithm is combined with a grouping pre-processing step.
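The following sketch conveys the flavor of a two-step block thresholding procedure in the spirit of LOL/GR-LOL; the thresholds, block aggregation, and selection rule are illustrative assumptions and do not reproduce the exact GR-LOL algorithm.

```python
import numpy as np

def two_step_block_threshold(X, y, blocks, t1=0.2, t2=0.1):
    """Hedged sketch of a two-step block thresholding scheme: (1) keep "leader"
    blocks whose aggregated correlation with y exceeds a first threshold,
    (2) regress y on the leaders and block-threshold the fitted coefficients."""
    n = X.shape[0]
    corr = X.T @ y / n                                        # marginal correlations
    leaders = [b for b in blocks
               if np.linalg.norm(corr[list(b)]) / np.sqrt(len(b)) > t1]
    cols = [j for b in leaders for j in b]
    w = np.zeros(X.shape[1])
    if cols:
        beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        w[cols] = beta
        for b in leaders:                                     # second, block-wise threshold
            if np.linalg.norm(w[list(b)]) / np.sqrt(len(b)) <= t2:
                w[list(b)] = 0.0
    return w

# Illustrative usage with hypothetical blocks of 5 consecutive regressors
rng = np.random.default_rng(0)
X = rng.standard_normal((120, 40))
w_true = np.zeros(40); w_true[5:10] = 1.0
y = X @ w_true + 0.1 * rng.standard_normal(120)
blocks = [tuple(range(i, i + 5)) for i in range(0, 40, 5)]
w_hat = two_step_block_threshold(X, y, blocks)
```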
Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives
Part 2 of this monograph builds on the introduction to tensor networks and
their operations presented in Part 1. It focuses on tensor network models for
super-compressed higher-order representation of data/parameters and related
cost functions, while providing an outline of their applications in machine
learning and data analytics. A particular emphasis is on the tensor train (TT)
and Hierarchical Tucker (HT) decompositions, and their physically meaningful
interpretations which reflect the scalability of the tensor network approach.
Through a graphical approach, we also elucidate how, by virtue of the
underlying low-rank tensor approximations and sophisticated contractions of
core tensors, tensor networks have the ability to perform distributed
computations on otherwise prohibitively large volumes of data/parameters,
thereby alleviating or even eliminating the curse of dimensionality. The
usefulness of this concept is illustrated over a number of applied areas,
including generalized regression and classification (support tensor machines,
canonical correlation analysis, higher order partial least squares),
generalized eigenvalue decomposition, Riemannian optimization, and in the
optimization of deep neural networks. Part 1 and Part 2 of this work can be
used either as stand-alone separate texts, or indeed as a conjoint
comprehensive review of the exciting field of low-rank tensor networks and
tensor decompositions.
Comment: 232 pages
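As a concrete illustration of the tensor train format discussed above, the following sketch implements a minimal TT-SVD that factorizes a dense tensor into a chain of 3-way cores by sequential truncated SVDs; the fixed rank cap is an illustrative simplification of the error-controlled truncation used in practice.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Minimal TT-SVD sketch: decompose a dense tensor into a tensor train
    (a list of 3-way cores) by sequential truncated SVDs. The rank at every
    bond is simply capped at max_rank; error-based truncation is omitted."""
    shape = tensor.shape
    cores, r_prev = [], 1
    C = tensor
    for k in range(len(shape) - 1):
        C = C.reshape(r_prev * shape[k], -1)                  # unfold the remaining tensor
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        r = min(max_rank, len(s))
        cores.append(U[:, :r].reshape(r_prev, shape[k], r))   # k-th TT core
        C = s[:r, None] * Vt[:r]                              # carry the remainder forward
        r_prev = r
    cores.append(C.reshape(r_prev, shape[-1], 1))             # last core
    return cores

# Illustrative usage: compress a random 8 x 8 x 8 x 8 tensor with TT rank <= 4
rng = np.random.default_rng(0)
T = rng.standard_normal((8, 8, 8, 8))
cores = tt_svd(T, max_rank=4)
print([c.shape for c in cores])   # (1, 8, 4), (4, 8, 4), (4, 8, 4), (4, 8, 1)
```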