Representation and statistical properties of deep neural networks on structured data
The significant success of deep learning has brought unprecedented challenges to conventional wisdom in statistics, optimization, and applied mathematics. In many high-dimensional applications, e.g., image data with hundreds of thousands of pixels, deep learning is remarkably scalable and generalizes mysteriously well. Although such appealing behavior stimulates wide application, a fundamental theoretical challenge -- the curse of data dimensionality -- naturally arises: roughly put, the sample complexity observed in practice is significantly smaller than that predicted by theory. It is a common belief that deep neural networks are good at learning various geometric structures hidden in data sets, yet little theory has been established to explain this power. This thesis aims to bridge the gap between theory and practice by studying function approximation and statistical theories of deep neural networks that exploit geometric structures in data.
-- Function Approximation Theories on Low-dimensional Manifolds using Deep Neural Networks.
We first develop an efficient universal approximation theory for functions on a low-dimensional Riemannian manifold. A feedforward network architecture is constructed for function approximation, where the size of the network grows depending on the manifold dimension. Furthermore, we prove an efficient approximation theory for convolutional residual networks approximating Besov functions. Lastly, we demonstrate the benefit of overparameterized neural networks in function approximation: large neural networks are capable of accurately approximating a target function while the network itself enjoys Lipschitz continuity.
-- Statistical Theories on Low-dimensional Data using Deep Neural Networks.
Efficient approximation theories of neural networks provide valuable guidelines for properly choosing network architectures when data exhibit geometric structures. In combination with statistical tools, we prove that neural networks can circumvent the curse of data dimensionality and enjoy fast statistical convergence in various learning problems, including nonparametric regression/classification, generative distribution estimation, and doubly-robust policy learning. (Ph.D. thesis)
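As a toy illustration of the low-dimensional structure the thesis exploits (not the thesis's own construction), the snippet below samples a 1-dimensional manifold (a circle) and embeds it in a 100-dimensional ambient space via a random orthonormal map; the singular values confirm the data occupy only a 2-dimensional linear subspace, so the intrinsic complexity is governed by the manifold, not the ambient dimension:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample n points on a circle (a 1-dimensional manifold) in R^2.
n, D = 500, 100
theta = rng.uniform(0.0, 2 * np.pi, size=n)
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # shape (n, 2)

# Embed into R^100 with a random orthonormal map: the ambient
# dimension is high, but the intrinsic geometry is unchanged.
Q, _ = np.linalg.qr(rng.standard_normal((D, 2)))
X = circle @ Q.T  # shape (n, 100)

# Singular values reveal that the centered data span only 2 directions.
s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
print(int(np.sum(s > 1e-8)))  # prints 2
```

A nonlinear embedding would spread the data over more linear directions, but the manifold dimension -- and hence, per the thesis, the statistical difficulty -- would remain 1.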
Learning Theory of Distribution Regression with Neural Networks
In this paper, we aim to establish an approximation theory and a learning theory of distribution regression via a fully connected neural network (FNN). In contrast to classical regression methods, the input variables of distribution regression are probability measures, so a second-stage sampling process is often needed to approximate the actual information of the distribution. On the other hand, the classical neural network structure requires the input variable to be a vector; when the input samples are probability distributions, the traditional deep neural network method cannot be used directly, and this is the central difficulty of distribution regression. A well-defined neural network structure for distribution inputs is therefore highly desirable, yet there has been no mathematical model or theoretical analysis of the neural network realization of distribution regression. To overcome these technical difficulties and address this issue, we establish a novel fully connected neural network framework that realizes an approximation theory for functionals defined on the space of Borel probability measures. Furthermore, based on the established functional approximation results, in the hypothesis space induced by the novel FNN structure with distribution inputs, almost optimal learning rates (up to logarithmic terms) for the proposed distribution regression model are derived via a novel two-stage error decomposition technique.
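To make the two-stage setup concrete (a minimal sketch, not the paper's FNN construction): each training input is a distribution observed only through a second-stage sample, which must be summarized as a vector before any standard regressor can consume it. Below, the featurization by empirical moments, the Gaussian input distributions, and the variance target are all hypothetical choices for illustration; a least-squares fit on the moment features recovers the target functional up to second-stage sampling error:

```python
import numpy as np

rng = np.random.default_rng(1)

# First stage: m input distributions (Gaussians with random mean and
# standard deviation). The regression target is a functional of the
# distribution, here its variance.
m, n_second = 200, 2000
means = rng.uniform(-2.0, 2.0, size=m)
stds = rng.uniform(0.5, 2.0, size=m)
y = stds ** 2

# Second stage: we never see the distribution itself, only a finite
# sample from it; summarize each sample by empirical moments.
def featurize(sample):
    return np.array([1.0, sample.mean(), (sample ** 2).mean()])

X = np.stack([
    featurize(rng.normal(means[i], stds[i], size=n_second))
    for i in range(m)
])

# Var = E[Z^2] - (E[Z])^2 is not linear in the raw moments, so add the
# squared-mean feature before the linear least-squares fit.
X = np.column_stack([X, X[:, 1] ** 2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ coef  # residuals shrink as n_second grows
```

The residual error here comes from the second-stage sample size, mirroring the two-stage error decomposition described in the abstract; a neural network would replace the linear map when the target functional is not expressible in a few fixed moments.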
Deep Learning Meets Sparse Regularization: A Signal Processing Perspective
Deep learning has been wildly successful in practice, and most state-of-the-art machine learning methods are based on neural networks. Lacking, however, is a rigorous mathematical theory that adequately explains the amazing performance of deep neural networks. In this article, we present a relatively new mathematical framework that provides the beginning of a deeper understanding of deep learning. This framework precisely characterizes the functional properties of neural networks that are trained to fit data. The key mathematical tools supporting this framework include transform-domain sparse regularization, the Radon transform of computed tomography, and approximation theory, all techniques deeply rooted in signal processing. This framework explains the effect of weight-decay regularization in neural network training, the use of skip connections and low-rank weight matrices in network architectures, the role of sparsity in neural networks, and why neural networks can perform well in high-dimensional problems.
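One concrete piece of the weight-decay story is a well-known rescaling argument (sketched here with hypothetical values, not taken from the article): a ReLU neuron x ↦ a·relu(wᵀx) is invariant under (a, w) → (a/c, c·w) for c > 0 because relu is positively homogeneous, and minimizing the weight-decay penalty (a² + ‖w‖²)/2 over this invariance yields |a|·‖w‖. Weight decay thus effectively imposes an ℓ1-like, sparsity-promoting penalty on the neurons:

```python
import numpy as np

rng = np.random.default_rng(2)

def relu(z):
    return np.maximum(z, 0.0)

# A single ReLU neuron with output weight a and input weights w.
a = 1.7
w = rng.standard_normal(5)
x = rng.standard_normal(5)

# Positive homogeneity: rescaling (a, w) -> (a/c, c*w) leaves the
# neuron's output unchanged for any c > 0.
c = 3.0
out1 = a * relu(w @ x)
out2 = (a / c) * relu((c * w) @ x)
assert np.isclose(out1, out2)

# Weight-decay penalty along the invariance, as a function of c.
cs = np.linspace(0.1, 10.0, 100_000)
penalty = ((a / cs) ** 2 + cs ** 2 * (w @ w)) / 2.0

# By AM-GM, the minimum over c is |a| * ||w||, attained at
# c = sqrt(|a| / ||w||).
print(np.isclose(penalty.min(), abs(a) * np.linalg.norm(w), rtol=1e-4))
# prints True
```

Summing this minimized penalty over the neurons of a two-layer network gives a sum of products of norms, which is the ℓ1-flavored quantity the sparse-regularization viewpoint analyzes.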