Deep Learning Meets Sparse Regularization: A Signal Processing Perspective
Deep learning has been wildly successful in practice and most
state-of-the-art machine learning methods are based on neural networks.
Lacking, however, is a rigorous mathematical theory that adequately explains
the amazing performance of deep neural networks. In this article, we present a
relatively new mathematical framework that provides the beginning of a deeper
understanding of deep learning. This framework precisely characterizes the
functional properties of neural networks that are trained to fit to data. The
key mathematical tools which support this framework include transform-domain
sparse regularization, the Radon transform of computed tomography, and
approximation theory, which are all techniques deeply rooted in signal
processing. This framework explains the effect of weight decay regularization
in neural network training, the use of skip connections and low-rank weight
matrices in network architectures, the role of sparsity in neural networks, and
why neural networks can perform well in high-dimensional problems.
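As a concrete illustration of the weight-decay regularization discussed in this abstract, here is a minimal sketch (a toy construction of ours, not the authors' code) that trains a one-hidden-layer ReLU network by gradient descent on a squared-error loss plus a squared ℓ2 penalty on the weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression data
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(200)

# One-hidden-layer ReLU network
m = 50  # hidden width
W1 = rng.standard_normal((1, m)) / np.sqrt(m)
b1 = np.zeros(m)
w2 = rng.standard_normal(m) / np.sqrt(m)

lam = 1e-3   # weight-decay strength
lr = 1e-1    # learning rate

for step in range(2000):
    H = np.maximum(X @ W1 + b1, 0.0)            # hidden activations
    resid = H @ w2 - y
    # Gradients of 0.5*MSE + 0.5*lam*(||W1||^2 + ||w2||^2)
    g2 = H.T @ resid / len(y) + lam * w2
    gH = np.outer(resid, w2) * (H > 0)
    gW1 = X.T @ gH / len(y) + lam * W1
    gb1 = gH.mean(axis=0)
    w2 -= lr * g2
    W1 -= lr * gW1
    b1 -= lr * gb1
```

With `lam = 0` the network tends to interpolate the noise; increasing `lam` trades data fit for smaller weights, which is the mechanism the framework above connects to sparse regularization in function space.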
On the Uniqueness of Inverse Problems with Fourier-domain Measurements and Generalized TV Regularization
We study the super-resolution problem of recovering a periodic
continuous-domain function from its low-frequency information. This means that
we only have access to possibly corrupted versions of its Fourier samples up to
a maximum cut-off frequency. The reconstruction task is specified as an
optimization problem with generalized total-variation regularization involving
a pseudo-differential operator. Our special emphasis is on the uniqueness of
solutions. We show that, for elliptic regularization operators (e.g., the
derivatives of any order), uniqueness is always guaranteed. To achieve this
goal, we provide a new analysis of constrained optimization problems over Radon
measures. We demonstrate that either the solutions are always made of Radon
measures of constant sign, or the solution is unique. Doing so, we identify a
general sufficient condition for the uniqueness of the solution of a
constrained optimization problem with TV-regularization, expressed in terms of
the Fourier samples.
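A standard way to experiment with this super-resolution setup is to discretize the periodic domain, after which the total-variation-of-measures problem becomes an ℓ1-regularized least-squares problem over grid weights. The sketch below (a discretization of our own choosing, with the identity as the regularization operator) recovers a few spikes from noisy low-frequency Fourier samples via ISTA:

```python
import numpy as np

rng = np.random.default_rng(1)

n, fc = 256, 10                       # grid size, cut-off frequency
t = np.arange(n) / n                  # periodic grid on [0, 1)
freqs = np.arange(-fc, fc + 1)

# Ground truth: a few spikes (a discrete Radon measure)
x_true = np.zeros(n)
x_true[[40, 90, 200]] = [1.0, -0.7, 0.5]

# Low-frequency Fourier measurement operator and noisy samples
F = np.exp(-2j * np.pi * np.outer(freqs, t))       # (2fc+1, n)
y = F @ x_true + 0.01 * (rng.standard_normal(len(freqs))
                         + 1j * rng.standard_normal(len(freqs)))

# ISTA for min_x 0.5*||F x - y||^2 + lam*||x||_1
lam = 0.05
L = np.linalg.norm(F, 2) ** 2                      # Lipschitz constant
x = np.zeros(n)
for _ in range(2000):
    g = (F.conj().T @ (F @ x - y)).real            # gradient step
    z = x - g / L
    x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
```

When the continuous-domain solution is unique in the sense studied above, the recovered spikes stabilize as the grid is refined; non-uniqueness typically shows up as mass spread over neighbouring grid points.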
On the Prediction Performance of the Lasso
Although the Lasso has been extensively studied, the relationship between its
prediction performance and the correlations of the covariates is not fully
understood. In this paper, we give new insights into this relationship in the
context of multiple linear regression. We show, in particular, that the
incorporation of a simple correlation measure into the tuning parameter can
lead to a nearly optimal prediction performance of the Lasso even for highly
correlated covariates. However, we also reveal that for moderately correlated
covariates, the prediction performance of the Lasso can be mediocre
irrespective of the choice of the tuning parameter. We finally show that our
results also lead to near-optimal rates for the least-squares estimator with
total variation penalty.
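The phenomenon is easy to probe numerically. The sketch below is a hypothetical experiment: the shrinkage rule `(1 - rho) ** 0.5` is our stand-in for the correlation measure studied in the paper, not the paper's formula. It fits the Lasso on equicorrelated designs at several correlation levels:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n, p, s = 100, 200, 5                  # samples, covariates, sparsity

for rho in (0.0, 0.5, 0.9):            # equicorrelated designs
    Sigma = (1 - rho) * np.eye(p) + rho * np.ones((p, p))
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    beta = np.zeros(p)
    beta[:s] = 1.0
    y = X @ beta + rng.standard_normal(n)

    # Correlation-aware tuning: shrink the universal choice
    # sqrt(2 log(p) / n) by a simple function of rho (our illustrative
    # stand-in for the paper's correlation measure).
    lam = np.sqrt(2 * np.log(p) / n) * (1 - rho) ** 0.5

    fit = Lasso(alpha=lam, max_iter=50_000).fit(X, y)
    pred_err = np.mean((X @ (fit.coef_ - beta)) ** 2)
    print(f"rho={rho:.1f}  lambda={lam:.3f}  prediction error={pred_err:.3f}")
```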
Structured sparsity with convex penalty functions
We study the problem of learning a sparse linear regression vector under additional conditions on the structure of its sparsity pattern. This problem is relevant in Machine Learning, Statistics and Signal Processing. It is well known that a linear regression can benefit from knowledge that the underlying regression vector is sparse. The combinatorial problem of selecting the nonzero components of this vector can be “relaxed” by regularising the squared error with a convex penalty function like the ℓ1 norm. However, in many applications, additional conditions on the structure of the regression vector and its sparsity pattern are available. Incorporating this information into the learning method may lead to a significant decrease in the estimation error. In this thesis, we present a family of convex penalty functions which encode prior knowledge on the structure of the vector formed by the absolute values of the regression coefficients. This family subsumes the ℓ1 norm and is flexible enough to include different models of sparsity patterns of practical and theoretical importance. We establish several properties of these penalty functions and discuss some examples where they can be computed explicitly. Moreover, for solving the regularised least-squares problem with these penalty functions, we present a convergent optimisation algorithm and a proximal method; both are numerical techniques tailored to different kinds of penalties. Extensive numerical simulations highlight the benefit of structured sparsity and the advantage offered by our approach over the Lasso and related methods, such as other convex penalties or greedy methods.
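For intuition on how such structured penalties are used algorithmically, here is a minimal proximal-gradient (ISTA-type) sketch for one member of the family, the group ℓ1/ℓ2 penalty, which encodes a block sparsity pattern; the data, groups, and step size are illustrative choices of ours, not the thesis's:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, g = 80, 60, 10                   # samples, features, group size
groups = [np.arange(i, i + g) for i in range(0, p, g)]

X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[groups[0]] = rng.standard_normal(g)   # only one active group
y = X @ beta + 0.1 * rng.standard_normal(n)

def prox_group(v, tau):
    """Block soft-thresholding: the prox of tau * sum_g ||v_g||_2."""
    out = v.copy()
    for idx in groups:
        nrm = np.linalg.norm(v[idx])
        out[idx] = 0.0 if nrm <= tau else (1 - tau / nrm) * v[idx]
    return out

L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
lam, w = 0.5, np.zeros(p)
for _ in range(1000):
    grad = X.T @ (X @ w - y)
    w = prox_group(w - grad / L, lam / L)
```

The same loop works for any penalty in the family whose proximal operator can be evaluated; only `prox_group` changes.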
Mathematical Challenges in Electron Microscopy
Electron microscopes were first developed nearly 100 years ago and are now a mature imaging modality with many applications and vast potential for the future. The principal feature of electron microscopes is their resolution: they can be up to 1000 times more powerful than a visible-light microscope and resolve even the smallest atoms. Furthermore, electron microscopes are sensitive to many material properties owing to the very rich interactions between electrons and other matter. Because of these capabilities, electron microscopy is used in applications as diverse as drug discovery, computer chip manufacture, and the development of solar cells.
In parallel, the mathematical field of inverse problems has evolved dramatically. Many new methods have been introduced to improve the recovery of unknown structures from indirect data, typically an ill-posed problem. In particular, sparsity-promoting functionals such as the total variation and its extensions have proven very powerful for recovering accurate physical quantities from very little and/or poor-quality data. While sparsity-promoting reconstruction methods are powerful, they can also be slow, especially in a big-data setting. This trade-off drives a continual cycle in which new numerical tools are found and more powerful models are developed.
The work presented in this thesis aims to marry the tools of inverse problems with the problems of electron microscopy: bringing state-of-the-art image processing techniques to bear on challenges specific to electron microscopy, developing new optimisation methods for these problems, and modelling new inverse problems to extend the capabilities of existing microscopes. One focus is the application of a directional total variation to overcome the limited-angle problem in electron tomography; another is the proposal of a new inverse problem for the reconstruction of 3D strain tensor fields from electron microscopy diffraction data. The remaining contributions target numerical aspects of inverse problems, from new algorithms for non-convex problems to convex optimisation with adaptive meshes.
Cantab Capital Institute for Mathematics of Information
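To give a flavour of the directional total variation mentioned above, the following sketch (a simplified stand-in of our own, not the thesis's formulation) denoises an image by gradient descent on a smoothed TV energy in which the two gradient directions are weighted unequally:

```python
import numpy as np

rng = np.random.default_rng(4)

# Noisy piecewise-constant test image
img = np.zeros((64, 64))
img[16:48, 16:48] = 1.0
noisy = img + 0.2 * rng.standard_normal(img.shape)

def smoothed_dtv_grad(u, a=1.0, b=0.3, eps=1e-2):
    """Gradient of sum sqrt(a*ux^2 + b*uy^2 + eps): weighting the two
    directions differently penalizes edges anisotropically."""
    ux = np.roll(u, -1, axis=1) - u                 # forward differences
    uy = np.roll(u, -1, axis=0) - u
    mag = np.sqrt(a * ux**2 + b * uy**2 + eps)
    px, py = a * ux / mag, b * uy / mag
    # Negative divergence (backward differences) of the dual field
    return -(px - np.roll(px, 1, axis=1)) - (py - np.roll(py, 1, axis=0))

lam, step = 0.15, 0.05
u = noisy.copy()
for _ in range(500):
    u -= step * ((u - noisy) + lam * smoothed_dtv_grad(u))
```

In limited-angle tomography the idea is analogous: artefacts are strongly oriented, so penalizing gradients anisotropically can suppress them while preserving true edges.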
Asymptotic theory for Bayesian nonparametric inference in statistical models arising from partial differential equations
Partial differential equations (PDEs) are primary mathematical tools to model the behaviour of complex real-world systems. PDEs generally include a collection of parameters in their formulation, which are often unknown in applications and need to be estimated from the data. In the present thesis, we investigate the theoretical performance of nonparametric Bayesian procedures in such parameter identification problems in PDEs. In particular, inverse regression models for elliptic equations and stochastic diffusion models are considered.
In Chapter 2, we study the statistical inverse problem of recovering an unknown function from a linear indirect measurement corrupted by additive Gaussian white noise. We employ a nonparametric Bayesian approach with standard Gaussian priors, for which the posterior-based reconstruction corresponds to a Tikhonov regulariser with a reproducing kernel Hilbert space norm penalty. We prove a semiparametric Bernstein–von Mises theorem for a large collection of linear functionals of the unknown, implying that semiparametric posterior estimation and uncertainty quantification are valid and optimal from a frequentist point of view. The general result is applied to three concrete examples that cover both the mildly and severely ill-posed cases: specifically, elliptic inverse problems, an elliptic boundary value problem, and the recovery of the initial condition of the heat equation. For the elliptic boundary value problem, we also obtain a nonparametric version of the theorem that entails the convergence of the posterior distribution to a prior-independent infinite-dimensional Gaussian probability measure with minimal covariance. As a consequence, it follows that the Tikhonov regulariser is an efficient estimator, and we derive frequentist guarantees for certain credible balls centred around it.
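In a finite-dimensional discretization of this linear setting, the Gaussian posterior is available in closed form and its mean is exactly a Tikhonov-regularized estimate. A toy sketch (discretization, forward map, and prior kernel are our illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100                                 # discretization level

# Discretized linear forward operator (here: a smoothing/integration map)
A = np.tril(np.ones((n, n))) / n

f_true = np.sin(2 * np.pi * np.linspace(0, 1, n))
sigma = 0.01
y = A @ f_true + sigma * rng.standard_normal(n)

# Gaussian prior N(0, C); the posterior mean solves the Tikhonov problem
#   min_f ||A f - y||^2 / sigma^2 + ||f||_C^2   (RKHS-norm penalty)
C = np.array([[np.exp(-abs(i - j) / 10) for j in range(n)]
              for i in range(n)])
H = A.T @ A / sigma**2 + np.linalg.inv(C)   # posterior precision
cov = np.linalg.inv(H)                      # posterior covariance
mean = cov @ (A.T @ y) / sigma**2           # posterior mean = Tikhonov estimate

# 95% pointwise credible band around the mean
band = 1.96 * np.sqrt(np.diag(cov))
```

The Bernstein–von Mises results above concern precisely when intervals like `mean ± band` are asymptotically valid and optimal frequentist confidence sets.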
Chapter 3 is concerned with statistical nonlinear inverse problems. We focus on the prototypical example of recovering the unknown conductivity function in an elliptic PDE in divergence form from discrete noisy point evaluations of the PDE solution. We study the statistical performance of Bayesian nonparametric procedures based on a flexible class of Gaussian (or hierarchical Gaussian) process priors, whose implementation is feasible by MCMC methods. We show that, as the number of measurements increases, the resulting posterior distributions concentrate around the true parameter generating the data, and derive a convergence rate, algebraic in inverse sample size, for the estimation error of the associated posterior means.
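Posteriors of this kind are commonly sampled with dimension-robust MCMC schemes such as preconditioned Crank–Nicolson (pCN). The sketch below is generic pCN against an abstract negative log-likelihood `Phi`; the actual elliptic PDE solve is replaced by a linear placeholder, since a full solver is beyond a few lines:

```python
import numpy as np

def pcn(Phi, C_sqrt, n_dim, beta=0.2, n_iter=10_000, rng=None):
    """Preconditioned Crank-Nicolson sampler for a posterior
    proportional to exp(-Phi(u)) times an N(0, C) prior.
    C_sqrt is any square root of the prior covariance."""
    rng = rng or np.random.default_rng()
    u = np.zeros(n_dim)
    samples, phi_u = [], Phi(u)
    for _ in range(n_iter):
        xi = C_sqrt @ rng.standard_normal(n_dim)
        v = np.sqrt(1 - beta**2) * u + beta * xi   # prior-preserving proposal
        phi_v = Phi(v)
        if np.log(rng.uniform()) < phi_u - phi_v:  # accept/reject on Phi only
            u, phi_u = v, phi_v
        samples.append(u.copy())
    return np.array(samples)

# Toy stand-in for the PDE data-misfit (NOT an elliptic solve)
G = np.eye(10) * 0.5                     # placeholder forward map
y_obs = np.ones(10)
Phi = lambda u: 0.5 * np.sum((G @ u - y_obs) ** 2) / 0.1**2

chain = pcn(Phi, C_sqrt=np.eye(10), n_dim=10)
print("posterior mean estimate:", chain[5000:].mean(axis=0))
```

Because the proposal preserves the Gaussian prior, the acceptance ratio involves only the data misfit, which is what keeps the sampler stable as the discretization is refined.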
Finally, in Chapter 4 we extend the posterior consistency analysis to dynamical models based on stochastic differential equations. We study nonparametric Bayesian models for reversible multi-dimensional diffusions with periodic drift. For continuous observation paths, reversibility is exploited to prove a general posterior contraction rate theorem for the drift gradient vector field under approximation-theoretic conditions on the induced prior for the invariant measure. The general theorem is applied to Gaussian priors and p-exponential priors, which are shown to converge to the truth at the minimax optimal rate over Sobolev smoothness classes in any dimension.
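As a picture of the data-generating model in this chapter, here is a toy one-dimensional version (our own choice of drift) simulated by Euler–Maruyama; for a drift in gradient form b = B', the diffusion is reversible with invariant density proportional to exp(2B):

```python
import numpy as np

rng = np.random.default_rng(6)

# Reversible 1-D diffusion dX_t = b(X_t) dt + dW_t with periodic drift
B = lambda x: np.sin(2 * np.pi * x)
b = lambda x: 2 * np.pi * np.cos(2 * np.pi * x)   # b = B'

T, dt = 200.0, 1e-3
n = int(T / dt)
X = np.empty(n)
X[0] = 0.0
for k in range(n - 1):                  # Euler-Maruyama discretization
    X[k + 1] = X[k] + b(X[k]) * dt + np.sqrt(dt) * rng.standard_normal()

# Histogram of X mod 1 approximates the invariant density exp(2B)/Z
hist, _ = np.histogram(X % 1.0, bins=50, density=True)
```

The estimation problem studied above runs the other way: observe a path like `X` and recover the drift, or equivalently, under reversibility, the invariant measure.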
Chapter 1 is dedicated to introducing the statistical models considered in Chapters 2-4, and to providing an overview of the theoretical results derived therein. The main theorems of Chapters 2 and 3 are illustrated via simulation results, and detailed comments are provided on the implementation.
Richard Nickl’s ERC grant No. 647812; EPSRC grant EP/L016516/1 for the Cambridge Centre for Analysis