Deep Learning Meets Sparse Regularization: A Signal Processing Perspective
Deep learning has been wildly successful in practice and most
state-of-the-art machine learning methods are based on neural networks.
Lacking, however, is a rigorous mathematical theory that adequately explains
the amazing performance of deep neural networks. In this article, we present a
relatively new mathematical framework that provides the beginning of a deeper
understanding of deep learning. This framework precisely characterizes the
functional properties of neural networks that are trained to fit to data. The
key mathematical tools which support this framework include transform-domain
sparse regularization, the Radon transform of computed tomography, and
approximation theory, which are all techniques deeply rooted in signal
processing. This framework explains the effect of weight decay regularization
in neural network training, the use of skip connections and low-rank weight
matrices in network architectures, the role of sparsity in neural networks, and
why neural networks can perform well in high-dimensional problems.
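As a concrete illustration of the weight-decay regularization discussed in this abstract, here is a minimal sketch (a toy construction of ours, not the authors' code) that trains a one-hidden-layer ReLU network by gradient descent on a squared-error loss plus a squared ℓ2 penalty on the weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression data
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(200)

# One-hidden-layer ReLU network
m = 50  # hidden width
W1 = rng.standard_normal((1, m)) / np.sqrt(m)
b1 = np.zeros(m)
w2 = rng.standard_normal(m) / np.sqrt(m)

lam = 1e-3   # weight-decay strength
lr = 1e-1    # learning rate

for step in range(2000):
    H = np.maximum(X @ W1 + b1, 0.0)            # hidden activations
    resid = H @ w2 - y
    # Gradients of 0.5*MSE + 0.5*lam*(||W1||^2 + ||w2||^2)
    g2 = H.T @ resid / len(y) + lam * w2
    gH = np.outer(resid, w2) * (H > 0)
    gW1 = X.T @ gH / len(y) + lam * W1
    gb1 = gH.mean(axis=0)
    w2 -= lr * g2
    W1 -= lr * gW1
    b1 -= lr * gb1
```

With `lam = 0` the network tends to interpolate the noise; increasing `lam` trades data fit for smaller weights, which is the mechanism the framework above connects to sparse regularization in function space.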
On the Uniqueness of Inverse Problems with Fourier-domain Measurements and Generalized TV Regularization
We study the super-resolution problem of recovering a periodic
continuous-domain function from its low-frequency information. This means that
we only have access to possibly corrupted versions of its Fourier samples up to
a maximum cut-off frequency. The reconstruction task is specified as an
optimization problem with generalized total-variation regularization involving
a pseudo-differential operator. Our special emphasis is on the uniqueness of
solutions. We show that, for elliptic regularization operators (e.g., the
derivatives of any order), uniqueness is always guaranteed. To achieve this
goal, we provide a new analysis of constrained optimization problems over Radon
measures. We demonstrate that either the solutions are always made of Radon
measures of constant sign, or the solution is unique. Doing so, we identify a
general sufficient condition for the uniqueness of the solution of a
constrained optimization problem with TV-regularization, expressed in terms of
the Fourier samples.
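A standard way to experiment with this super-resolution setup is to discretize the periodic domain, after which the total-variation-of-measures problem becomes an ℓ1-regularized least-squares problem over grid weights. The sketch below (a discretization of our own choosing, with the identity as the regularization operator) recovers a few spikes from noisy low-frequency Fourier samples via ISTA:

```python
import numpy as np

rng = np.random.default_rng(1)

n, fc = 256, 10                       # grid size, cut-off frequency
t = np.arange(n) / n                  # periodic grid on [0, 1)
freqs = np.arange(-fc, fc + 1)

# Ground truth: a few spikes (a discrete Radon measure)
x_true = np.zeros(n)
x_true[[40, 90, 200]] = [1.0, -0.7, 0.5]

# Low-frequency Fourier measurement operator and noisy samples
F = np.exp(-2j * np.pi * np.outer(freqs, t))       # (2fc+1, n)
y = F @ x_true + 0.01 * (rng.standard_normal(len(freqs))
                         + 1j * rng.standard_normal(len(freqs)))

# ISTA for min_x 0.5*||F x - y||^2 + lam*||x||_1
lam = 0.05
L = np.linalg.norm(F, 2) ** 2                      # Lipschitz constant
x = np.zeros(n)
for _ in range(2000):
    g = (F.conj().T @ (F @ x - y)).real            # gradient step
    z = x - g / L
    x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
```

When the continuous-domain solution is unique in the sense studied above, the recovered spikes stabilize as the grid is refined; non-uniqueness typically shows up as mass spread over neighbouring grid points.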
On the Prediction Performance of the Lasso
Although the Lasso has been extensively studied, the relationship between its
prediction performance and the correlations of the covariates is not fully
understood. In this paper, we give new insights into this relationship in the
context of multiple linear regression. We show, in particular, that the
incorporation of a simple correlation measure into the tuning parameter can
lead to a nearly optimal prediction performance of the Lasso even for highly
correlated covariates. However, we also reveal that for moderately correlated
covariates, the prediction performance of the Lasso can be mediocre
irrespective of the choice of the tuning parameter. We finally show that our
results also lead to near-optimal rates for the least-squares estimator with
total variation penalty.
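The phenomenon is easy to probe numerically. The sketch below is a hypothetical experiment: the shrinkage rule `(1 - rho) ** 0.5` is our stand-in for the correlation measure studied in the paper, not the paper's formula. It fits the Lasso on equicorrelated designs at several correlation levels:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n, p, s = 100, 200, 5                  # samples, covariates, sparsity

for rho in (0.0, 0.5, 0.9):            # equicorrelated designs
    Sigma = (1 - rho) * np.eye(p) + rho * np.ones((p, p))
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    beta = np.zeros(p)
    beta[:s] = 1.0
    y = X @ beta + rng.standard_normal(n)

    # Correlation-aware tuning: shrink the universal choice
    # sqrt(2 log(p) / n) by a simple function of rho (our illustrative
    # stand-in for the paper's correlation measure).
    lam = np.sqrt(2 * np.log(p) / n) * (1 - rho) ** 0.5

    fit = Lasso(alpha=lam, max_iter=50_000).fit(X, y)
    pred_err = np.mean((X @ (fit.coef_ - beta)) ** 2)
    print(f"rho={rho:.1f}  lambda={lam:.3f}  prediction error={pred_err:.3f}")
```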
Structured sparsity with convex penalty functions
We study the problem of learning a sparse linear regression vector under additional conditions on the structure of its sparsity pattern. This problem is relevant in Machine Learning, Statistics and Signal Processing. It is well known that a linear regression can benefit from knowledge that the underlying regression vector is sparse. The combinatorial problem of selecting the nonzero components of this vector can be “relaxed” by regularising the squared error with a convex penalty function like the ℓ1 norm. However, in many applications, additional conditions on the structure of the regression vector and its sparsity pattern are available. Incorporating this information into the learning method may lead to a significant decrease in the estimation error. In this thesis, we present a family of convex penalty functions which encode prior knowledge on the structure of the vector formed by the absolute values of the regression coefficients. This family subsumes the ℓ1 norm and is flexible enough to include different models of sparsity patterns of practical and theoretical importance. We establish several properties of these penalty functions and discuss some examples where they can be computed explicitly. Moreover, for solving the regularised least-squares problem with these penalty functions, we present a convergent optimisation algorithm and a proximal method; both are numerical techniques tailored to different kinds of penalties. Extensive numerical simulations highlight the benefit of structured sparsity and the advantage offered by our approach over the Lasso and related methods, such as other convex penalties or greedy methods.
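For intuition on how such structured penalties are used algorithmically, here is a minimal proximal-gradient (ISTA-type) sketch for one member of the family, the group ℓ1/ℓ2 penalty, which encodes a block sparsity pattern; the data, groups, and step size are illustrative choices of ours, not the thesis's:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, g = 80, 60, 10                   # samples, features, group size
groups = [np.arange(i, i + g) for i in range(0, p, g)]

X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[groups[0]] = rng.standard_normal(g)   # only one active group
y = X @ beta + 0.1 * rng.standard_normal(n)

def prox_group(v, tau):
    """Block soft-thresholding: the prox of tau * sum_g ||v_g||_2."""
    out = v.copy()
    for idx in groups:
        nrm = np.linalg.norm(v[idx])
        out[idx] = 0.0 if nrm <= tau else (1 - tau / nrm) * v[idx]
    return out

L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
lam, w = 0.5, np.zeros(p)
for _ in range(1000):
    grad = X.T @ (X @ w - y)
    w = prox_group(w - grad / L, lam / L)
```

The same loop works for any penalty in the family whose proximal operator can be evaluated; only `prox_group` changes.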
Mathematical Challenges in Electron Microscopy
Electron microscopes were first developed nearly 100 years ago and are now a mature imaging modality with many applications and vast potential for the future. The principal feature of electron microscopes is their resolution: they can be up to 1000 times more powerful than a visible-light microscope and resolve even the smallest atoms. Furthermore, electron microscopes are sensitive to many material properties owing to the very rich interactions between electrons and other matter. Because of these capabilities, electron microscopy is used in applications as diverse as drug discovery, computer chip manufacture, and the development of solar cells.
In parallel, the mathematical field of inverse problems has evolved dramatically. Many new methods have been introduced to improve the recovery of unknown structures from indirect data, typically an ill-posed problem. In particular, sparsity-promoting functionals such as the total variation and its extensions have proven very powerful for recovering accurate physical quantities from very little and/or poor-quality data. While sparsity-promoting reconstruction methods are powerful, they can also be slow, especially in a big-data setting. This trade-off drives a continual cycle in which new numerical tools are found and more powerful models are developed.
The work presented in this thesis aims to marry the tools of inverse problems with the problems of electron microscopy: bringing state-of-the-art image processing techniques to bear on challenges specific to electron microscopy, developing new optimisation methods for these problems, and modelling new inverse problems to extend the capabilities of existing microscopes. One focus is the application of a directional total variation to overcome the limited-angle problem in electron tomography; another is the proposal of a new inverse problem for the reconstruction of 3D strain tensor fields from electron microscopy diffraction data. The remaining contributions target numerical aspects of inverse problems, from new algorithms for non-convex problems to convex optimisation with adaptive meshes.
Cantab Capital Institute for Mathematics of Information
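To give a flavour of the directional total variation mentioned above, the following sketch (a simplified stand-in of our own, not the thesis's formulation) denoises an image by gradient descent on a smoothed TV energy in which the two gradient directions are weighted unequally:

```python
import numpy as np

rng = np.random.default_rng(4)

# Noisy piecewise-constant test image
img = np.zeros((64, 64))
img[16:48, 16:48] = 1.0
noisy = img + 0.2 * rng.standard_normal(img.shape)

def smoothed_dtv_grad(u, a=1.0, b=0.3, eps=1e-2):
    """Gradient of sum sqrt(a*ux^2 + b*uy^2 + eps): weighting the two
    directions differently penalizes edges anisotropically."""
    ux = np.roll(u, -1, axis=1) - u                 # forward differences
    uy = np.roll(u, -1, axis=0) - u
    mag = np.sqrt(a * ux**2 + b * uy**2 + eps)
    px, py = a * ux / mag, b * uy / mag
    # Negative divergence (backward differences) of the dual field
    return -(px - np.roll(px, 1, axis=1)) - (py - np.roll(py, 1, axis=0))

lam, step = 0.15, 0.05
u = noisy.copy()
for _ in range(500):
    u -= step * ((u - noisy) + lam * smoothed_dtv_grad(u))
```

In limited-angle tomography the idea is analogous: artefacts are strongly oriented, so penalizing gradients anisotropically can suppress them while preserving true edges.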
Asymptotic theory for Bayesian nonparametric inference in statistical models arising from partial differential equations
Partial differential equations (PDEs) are primary mathematical tools to model the behaviour of complex real-world systems. PDEs generally include a collection of parameters in their formulation, which are often unknown in applications and need to be estimated from the data. In the present thesis, we investigate the theoretical performance of nonparametric Bayesian procedures in such parameter identification problems in PDEs. In particular, inverse regression models for elliptic equations and stochastic diffusion models are considered.
In Chapter 2, we study the statistical inverse problem of recovering an unknown function from a linear indirect measurement corrupted by additive Gaussian white noise. We employ a nonparametric Bayesian approach with standard Gaussian priors, for which the posterior-based reconstruction corresponds to a Tikhonov regulariser with a reproducing kernel Hilbert space norm penalty. We prove a semiparametric Bernstein–von Mises theorem for a large collection of linear functionals of the unknown, implying that semiparametric posterior estimation and uncertainty quantification are valid and optimal from a frequentist point of view. The general result is applied to three concrete examples that cover both the mildly and severely ill-posed cases: specifically, elliptic inverse problems, an elliptic boundary value problem, and the recovery of the initial condition of the heat equation. For the elliptic boundary value problem, we also obtain a nonparametric version of the theorem that entails the convergence of the posterior distribution to a prior-independent infinite-dimensional Gaussian probability measure with minimal covariance. As a consequence, it follows that the Tikhonov regulariser is an efficient estimator, and we derive frequentist guarantees for certain credible balls centred around it.
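In a finite-dimensional discretization of this linear setting, the Gaussian posterior is available in closed form and its mean is exactly a Tikhonov-regularized estimate. A toy sketch (discretization, forward map, and prior kernel are our illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100                                 # discretization level

# Discretized linear forward operator (here: a smoothing/integration map)
A = np.tril(np.ones((n, n))) / n

f_true = np.sin(2 * np.pi * np.linspace(0, 1, n))
sigma = 0.01
y = A @ f_true + sigma * rng.standard_normal(n)

# Gaussian prior N(0, C); the posterior mean solves the Tikhonov problem
#   min_f ||A f - y||^2 / sigma^2 + ||f||_C^2   (RKHS-norm penalty)
C = np.array([[np.exp(-abs(i - j) / 10) for j in range(n)]
              for i in range(n)])
H = A.T @ A / sigma**2 + np.linalg.inv(C)   # posterior precision
cov = np.linalg.inv(H)                      # posterior covariance
mean = cov @ (A.T @ y) / sigma**2           # posterior mean = Tikhonov estimate

# 95% pointwise credible band around the mean
band = 1.96 * np.sqrt(np.diag(cov))
```

The Bernstein–von Mises results above concern precisely when intervals like `mean ± band` are asymptotically valid and optimal frequentist confidence sets.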
Chapter 3 is concerned with statistical nonlinear inverse problems. We focus on the prototypical example of recovering the unknown conductivity function in an elliptic PDE in divergence form from discrete noisy point evaluations of the PDE solution. We study the statistical performance of Bayesian nonparametric procedures based on a flexible class of Gaussian (or hierarchical Gaussian) process priors, whose implementation is feasible by MCMC methods. We show that, as the number of measurements increases, the resulting posterior distributions concentrate around the true parameter generating the data, and derive a convergence rate, algebraic in inverse sample size, for the estimation error of the associated posterior means.
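Posteriors of this kind are commonly sampled with dimension-robust MCMC schemes such as preconditioned Crank–Nicolson (pCN). The sketch below is generic pCN against an abstract negative log-likelihood `Phi`; the actual elliptic PDE solve is replaced by a linear placeholder, since a full solver is beyond a few lines:

```python
import numpy as np

def pcn(Phi, C_sqrt, n_dim, beta=0.2, n_iter=10_000, rng=None):
    """Preconditioned Crank-Nicolson sampler for a posterior
    proportional to exp(-Phi(u)) times an N(0, C) prior.
    C_sqrt is any square root of the prior covariance."""
    rng = rng or np.random.default_rng()
    u = np.zeros(n_dim)
    samples, phi_u = [], Phi(u)
    for _ in range(n_iter):
        xi = C_sqrt @ rng.standard_normal(n_dim)
        v = np.sqrt(1 - beta**2) * u + beta * xi   # prior-preserving proposal
        phi_v = Phi(v)
        if np.log(rng.uniform()) < phi_u - phi_v:  # accept/reject on Phi only
            u, phi_u = v, phi_v
        samples.append(u.copy())
    return np.array(samples)

# Toy stand-in for the PDE data-misfit (NOT an elliptic solve)
G = np.eye(10) * 0.5                     # placeholder forward map
y_obs = np.ones(10)
Phi = lambda u: 0.5 * np.sum((G @ u - y_obs) ** 2) / 0.1**2

chain = pcn(Phi, C_sqrt=np.eye(10), n_dim=10)
print("posterior mean estimate:", chain[5000:].mean(axis=0))
```

Because the proposal preserves the Gaussian prior, the acceptance ratio involves only the data misfit, which is what keeps the sampler stable as the discretization is refined.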
Finally, in Chapter 4 we extend the posterior consistency analysis to dynamical models based on stochastic differential equations. We study nonparametric Bayesian models for reversible multi-dimensional diffusions with periodic drift. For continuous observation paths, reversibility is exploited to prove a general posterior contraction rate theorem for the drift gradient vector field under approximation-theoretic conditions on the induced prior for the invariant measure. The general theorem is applied to Gaussian priors and p-exponential priors, which are shown to converge to the truth at the minimax optimal rate over Sobolev smoothness classes in any dimension.
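As a picture of the data-generating model in this chapter, here is a toy one-dimensional version (our own choice of drift) simulated by Euler–Maruyama; for a drift in gradient form b = B', the diffusion is reversible with invariant density proportional to exp(2B):

```python
import numpy as np

rng = np.random.default_rng(6)

# Reversible 1-D diffusion dX_t = b(X_t) dt + dW_t with periodic drift
B = lambda x: np.sin(2 * np.pi * x)
b = lambda x: 2 * np.pi * np.cos(2 * np.pi * x)   # b = B'

T, dt = 200.0, 1e-3
n = int(T / dt)
X = np.empty(n)
X[0] = 0.0
for k in range(n - 1):                  # Euler-Maruyama discretization
    X[k + 1] = X[k] + b(X[k]) * dt + np.sqrt(dt) * rng.standard_normal()

# Histogram of X mod 1 approximates the invariant density exp(2B)/Z
hist, _ = np.histogram(X % 1.0, bins=50, density=True)
```

The estimation problem studied above runs the other way: observe a path like `X` and recover the drift, or equivalently, under reversibility, the invariant measure.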
Chapter 1 is dedicated to introducing the statistical models considered in Chapters 2-4, and to providing an overview of the theoretical results derived therein. The main theorems of Chapters 2 and 3 are illustrated via simulation results, and detailed comments are provided on the implementation.
Richard Nickl’s ERC grant No. 647812; EPSRC grant EP/L016516/1 for the Cambridge Centre for Analysis