Low Complexity Regularization of Linear Inverse Problems
Inverse problems and regularization theory are a central theme in contemporary
signal processing, where the goal is to reconstruct an unknown signal from
partial, indirect, and possibly noisy measurements of it. A now standard method
for recovering the unknown signal is to solve a convex optimization problem
that enforces some prior knowledge about its structure. This has proved
efficient in many problems routinely encountered in imaging sciences,
statistics and machine learning. This chapter delivers a review of recent
advances in the field where the regularization prior promotes solutions
conforming to some notion of simplicity/low-complexity. These priors encompass,
as popular examples, sparsity and group sparsity (to capture the compressibility
of natural signals and images), total variation and analysis sparsity (to
promote piecewise regularity), and low-rank (as a natural extension of sparsity
to matrix-valued data). Our aim is to provide a unified treatment of all these
regularizations under a single umbrella, namely the theory of partial
smoothness. This framework is very general and accommodates all low-complexity
regularizers just mentioned, as well as many others. Partial smoothness turns
out to be the canonical way to encode low-dimensional models that can be linear
spaces or more general smooth manifolds. This review is intended to serve as a
one-stop shop for understanding the theoretical properties of the
so-regularized solutions. It covers a large spectrum including: (i) recovery
guarantees and stability to noise, both in terms of $\ell_2$-stability and
model (manifold) identification; (ii) sensitivity analysis to perturbations of
the parameters involved (in particular the observations), with applications to
unbiased risk estimation; (iii) convergence properties of the forward-backward
proximal splitting scheme, which is particularly well suited to solving the
corresponding large-scale regularized optimization problem.
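To make the forward-backward scheme concrete, here is a minimal sketch for the simplest instance covered by the chapter, $\ell_1$-regularized least squares (sparse recovery); the synthetic operator, noise level, and regularization parameter are illustrative assumptions rather than settings taken from the text.

import numpy as np

def soft_threshold(x, tau):
    # Proximal operator of tau * ||.||_1 (entrywise soft-thresholding).
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def forward_backward_l1(A, y, lam, n_iter=500):
    # Forward-backward splitting for min_x 0.5 * ||A x - y||^2 + lam * ||x||_1.
    step = 1.0 / np.linalg.norm(A, 2) ** 2               # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)                         # forward (explicit gradient) step
        x = soft_threshold(x - step * grad, step * lam)  # backward (proximal) step
    return x

# Synthetic sparse-recovery example: sparse x0, Gaussian measurements, small noise.
rng = np.random.default_rng(0)
n, p, s = 100, 400, 10
A = rng.standard_normal((n, p)) / np.sqrt(n)
x0 = np.zeros(p)
x0[rng.choice(p, s, replace=False)] = rng.standard_normal(s)
y = A @ x0 + 0.01 * rng.standard_normal(n)
x_hat = forward_backward_l1(A, y, lam=0.02)
print("relative error:", np.linalg.norm(x_hat - x0) / np.linalg.norm(x0))

The same iteration applies to the other low-complexity regularizers mentioned above (group sparsity, total variation, nuclear norm) by swapping in the corresponding proximal operator.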
Consistency of random forests
Random forests are a learning algorithm proposed by Breiman [Mach. Learn. 45
(2001) 5--32] that combines several randomized decision trees and aggregates
their predictions by averaging. Despite its wide usage and outstanding
practical performance, little is known about the mathematical properties of the
procedure. This disparity between theory and practice originates in the
difficulty of simultaneously analyzing both the randomization process and the
highly data-dependent tree structure. In the present paper, we take a step
forward in forest exploration by proving a consistency result for Breiman's
[Mach. Learn. 45 (2001) 5--32] original algorithm in the context of additive
regression models. Our analysis also sheds an interesting light on how random
forests can nicely adapt to sparsity.

1. Introduction. Random forests are an
ensemble learning method for classification and regression that constructs a
number of randomized decision trees during the training phase and predicts by
averaging the results. Since its publication in the seminal paper of Breiman
(2001), the procedure has become a major data analysis tool that performs well
in practice in comparison with many standard methods. What has greatly
contributed to the popularity of forests is the fact that they can be applied
to a wide range of prediction problems and have few parameters to tune. Aside
from being simple to use, the method is generally recognized for its accuracy
and its ability to deal with small sample sizes, high-dimensional feature
spaces and complex data structures. The random forest methodology has been
successfully involved in many practical problems, including air quality
prediction (winning code of the EMC data science global hackathon in 2012, see
http://www.kaggle.com/c/dsg-hackathon), chemoinformatics [Svetnik et al.
(2003)], and ecology [Prasad, Iverson and Liaw (2006), Cutler et al. (2007)].
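As a small usage illustration of the procedure discussed above (not the paper's theoretical construction), the following sketch fits a Breiman-style random forest to a synthetic additive regression model with scikit-learn; the data-generating functions and hyperparameters are assumptions chosen for demonstration.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Additive regression model y = m1(x1) + m2(x2) + m3(x3) + noise, the setting
# in which the consistency result is stated; only the first three features matter.
rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.uniform(0.0, 1.0, size=(n, d))
signal = lambda Z: np.sin(2 * np.pi * Z[:, 0]) + Z[:, 1] ** 2 + 0.5 * Z[:, 2]
y = signal(X) + 0.1 * rng.standard_normal(n)

# Breiman's algorithm: bootstrap resampling plus random feature selection at each split.
forest = RandomForestRegressor(n_estimators=200, max_features="sqrt", random_state=0)
forest.fit(X, y)

X_test = rng.uniform(0.0, 1.0, size=(500, d))
print("test MSE:", np.mean((forest.predict(X_test) - signal(X_test)) ** 2))

The sparsity adaptation mentioned above refers to the forest concentrating its splits on the informative variables (here the first three of the d features).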
Fast global convergence of gradient methods for high-dimensional statistical recovery
Many statistical $M$-estimators are based on convex optimization problems
formed by the combination of a data-dependent loss function with a norm-based
regularizer. We analyze the convergence rates of projected gradient and
composite gradient methods for solving such problems, working within a
high-dimensional framework that allows the data dimension $p$ to grow with
(and possibly exceed) the sample size $n$. This high-dimensional
structure precludes the usual global assumptions---namely, strong convexity and
smoothness conditions---that underlie much of classical optimization analysis.
We define appropriately restricted versions of these conditions, and show that
they are satisfied with high probability for various statistical models. Under
these conditions, our theory guarantees that projected gradient descent has a
globally geometric rate of convergence up to the \emph{statistical precision}
of the model, meaning the typical distance between the true unknown parameter
$\theta^*$ and an optimal solution $\widehat{\theta}$. This result is substantially
sharper than previous convergence results, which yielded sublinear convergence,
or linear convergence only up to the noise level. Our analysis applies to a
wide range of $M$-estimators and statistical models, including sparse linear
regression using the Lasso ($\ell_1$-regularized regression); group Lasso for block
sparsity; log-linear models with $\ell_1$-regularization; low-rank matrix recovery using
nuclear norm regularization; and matrix decomposition. Overall, our analysis
reveals interesting connections between statistical precision and computational
efficiency in high-dimensional estimation.
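To make the projected gradient method concrete, here is a minimal sketch for constrained sparse linear regression (the Lasso in its $\ell_1$-ball constrained form); the sort-based $\ell_1$-ball projection, synthetic data, and step size are illustrative assumptions rather than the construction analyzed in the paper.

import numpy as np

def project_l1_ball(v, radius):
    # Euclidean projection of v onto the l1 ball {x : ||x||_1 <= radius}.
    if np.abs(v).sum() <= radius:
        return v
    u = np.sort(np.abs(v))[::-1]                    # sorted magnitudes, descending
    css = np.cumsum(u)
    rho = np.nonzero(u - (css - radius) / np.arange(1, len(u) + 1) > 0)[0][-1]
    theta = (css[rho] - radius) / (rho + 1)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def projected_gradient(A, y, radius, n_iter=300):
    # Projected gradient descent for min_x 0.5 * ||A x - y||^2 s.t. ||x||_1 <= radius.
    step = 1.0 / np.linalg.norm(A, 2) ** 2          # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = project_l1_ball(x - step * (A.T @ (A @ x - y)), radius)
    return x

rng = np.random.default_rng(1)
n, p, s = 200, 500, 10
A = rng.standard_normal((n, p)) / np.sqrt(n)
theta_star = np.zeros(p)
theta_star[:s] = 1.0
y = A @ theta_star + 0.05 * rng.standard_normal(n)
theta_hat = projected_gradient(A, y, radius=np.abs(theta_star).sum())
print("estimation error:", np.linalg.norm(theta_hat - theta_star))

Under restricted strong convexity and smoothness of the quadratic loss, iterates of this kind contract geometrically until the optimization error falls below the statistical precision, after which further iterations bring no statistical benefit.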
Approximate Bayesian Deep Learning for Resource-Constrained Environments
Deep learning models have shown promising results in areas including computer vision, natural language processing, speech recognition, and more. However, existing point estimation-based training methods for these models may result in predictive uncertainties that are not well calibrated, including the occurrence of confident errors. Approximate Bayesian inference methods can help address these issues in a principled way by accounting for uncertainty in model parameters. However, these methods are computationally expensive both when computing approximations to the parameter posterior and when using an approximate parameter posterior to make predictions. They can also require significantly more storage than point-estimated models.
In this thesis, we address a range of questions related to trade-offs between the quality of inference and prediction and the computational scalability of Bayesian deep learning methods. We begin by developing a framework for comprehensive evaluation of Bayesian neural network models and applying this framework to a range of existing models and inference methods. Second, we address the problem of providing flexible trade-offs between prediction quality, run time, and storage by developing and evaluating a general framework for distilling expectations with respect to the Bayesian posterior distribution of a deep neural network classifier. Third, we investigate the trade-offs between model sparsity and inference performance for deep neural network models using several approaches to deriving sparse model structures. Fourth, we present a framework for correcting approximate posterior predictive distributions, encouraging them to prefer high-utility decisions. Finally, we investigate the use of approximate Bayesian deep learning in object detection and present an evaluation of approaches for quantifying different facets of uncertainty related to object classes and locations.
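As a generic illustration of the posterior-predictive averaging underlying these methods (a sketch, not the thesis's specific evaluation or distillation framework), the following assumes class logits have already been produced by several networks whose weights were drawn from an approximate posterior, and combines them into a predictive distribution with entropy-based uncertainty estimates.

import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def posterior_predictive(sample_logits):
    # sample_logits: shape (n_posterior_samples, n_inputs, n_classes), e.g. logits
    # from forward passes through networks sampled from an approximate posterior.
    probs = softmax(sample_logits)                       # per-sample class probabilities
    predictive = probs.mean(axis=0)                      # Monte Carlo posterior predictive
    eps = 1e-12
    total = -(predictive * np.log(predictive + eps)).sum(axis=-1)         # predictive entropy
    aleatoric = -(probs * np.log(probs + eps)).sum(axis=-1).mean(axis=0)  # expected entropy
    epistemic = total - aleatoric                        # mutual information (model uncertainty)
    return predictive, total, epistemic

# Toy check with fabricated logits: 5 posterior samples, 2 inputs, 4 classes.
rng = np.random.default_rng(0)
pred, total, epistemic = posterior_predictive(rng.standard_normal((5, 2, 4)))
print(pred.argmax(axis=-1), total, epistemic)

The distillation framework mentioned above targets expectations of this kind, replacing repeated posterior sampling at prediction time with a cheaper approximation.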