Deep Learning and Inverse Problems
Big data and deep learning are modern buzzwords which presently infiltrate all fields of science and technology. These new concepts are impressive in terms of the stunning results they achieve for a large variety of applications. However, the theoretical justification for their success is still very limited. In this snapshot, we highlight some of the very recent mathematical results that are the beginnings of a solid theoretical foundation for the subject.
A note concerning polyhyperbolic and related splines
This note concerns the finite interpolation problem with two parametrized families of splines related to polynomial spline interpolation. We address the questions of uniqueness and establish basic convergence rates for these splines between the interpolation nodes.
Comment: 13 pages, updated to include funding
A stochastic optimization approach to train non-linear neural networks with regularization of higher-order total variation
While highly expressive parametric models, including deep neural networks, have an advantage in modeling complicated concepts, training such highly non-linear models is known to carry a high risk of notorious overfitting. To address this issue, this study considers a k-th order total variation (k-TV) regularization, which is defined as the squared integral of the k-th order derivative of the parametric model to be trained; penalizing the k-TV is expected to yield a smoother function, which in turn is expected to avoid overfitting. While the k-TV terms applied to general parametric models are computationally intractable due to the integration, this study provides a stochastic optimization algorithm that can efficiently train general models with the k-TV regularization without conducting explicit numerical integration. The proposed approach can be applied to the training of even deep neural networks whose structure is arbitrary, as it can be implemented with only a simple stochastic gradient descent algorithm and automatic differentiation. Our numerical experiments demonstrate that the neural networks trained with the k-TV terms are more "resilient" than those with the conventional parameter regularization. The proposed algorithm can also be extended to the training of physics-informed neural networks (PINNs).
Comment: 13 pages, 24 figures, in preparation for submission; comments are welcome
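The core trick, as described, is to replace the intractable integral with a Monte Carlo estimate at randomly sampled inputs and to obtain the k-th derivative by nested automatic differentiation. A minimal sketch under simplifying assumptions (scalar-input network, k = 2, and illustrative helper names such as mlp and ktv_penalty, none of which come from the paper):

```python
import jax
import jax.numpy as jnp

def mlp(params, x):
    # small tanh network; params is a list of (W, b) pairs, x is a scalar input
    h = jnp.atleast_1d(x)
    for W, b in params[:-1]:
        h = jnp.tanh(W @ h + b)
    W, b = params[-1]
    return (W @ h + b)[0]

def ktv_penalty(params, key, k=2, n_samples=64, low=-1.0, high=1.0):
    # Monte Carlo estimate of the integral of (d^k f / dx^k)^2 over [low, high]
    xs = jax.random.uniform(key, (n_samples,), minval=low, maxval=high)
    dkf = mlp
    for _ in range(k):                      # k-fold automatic differentiation
        dkf = jax.grad(dkf, argnums=1)
    sq = jax.vmap(lambda x: dkf(params, x) ** 2)(xs)
    return (high - low) * jnp.mean(sq)

def loss(params, key, x_data, y_data, lam=1e-3):
    # data-fit term plus the stochastic k-TV penalty; both terms are
    # differentiable, so plain SGD on jax.grad(loss) trains the regularized model
    preds = jax.vmap(lambda x: mlp(params, x))(x_data)
    return jnp.mean((preds - y_data) ** 2) + lam * ktv_penalty(params, key)
```

Because the penalty is itself an expectation, drawing fresh sample points at every SGD step yields an unbiased stochastic gradient of the regularized objective without any explicit quadrature.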
Deep Neural Networks with Trainable Activations and Controlled Lipschitz Constant
We introduce a variational framework to learn the activation functions of
deep neural networks. Our aim is to increase the capacity of the network while
controlling an upper-bound of the actual Lipschitz constant of the input-output
relation. To that end, we first establish a global bound for the Lipschitz
constant of neural networks. Based on the obtained bound, we then formulate a
variational problem for learning activation functions. Our variational problem
is infinite-dimensional and is not computationally tractable. However, we prove
that there always exists a solution that has continuous and piecewise-linear
(linear-spline) activations. This reduces the original problem to a
finite-dimensional minimization where an l1 penalty on the parameters of the
activations favors the learning of sparse nonlinearities. We numerically
compare our scheme with the standard ReLU network and its variations, PReLU and
LeakyReLU, and we empirically demonstrate the practical aspects of our
framework.
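As a rough illustration of the finite-dimensional problem the paper arrives at, a continuous piecewise-linear (linear-spline) activation can be parameterized as a linear part plus a sum of shifted ReLUs, with an l1 penalty on the spline coefficients; the fixed knot grid and the names below are illustrative assumptions, not the authors' implementation:

```python
import jax.numpy as jnp

def spline_activation(x, a, knots, b0=0.0, b1=1.0):
    # sigma(x) = b0 + b1*x + sum_i a_i * max(x - knot_i, 0):
    # continuous and piecewise-linear, with one potential kink per knot
    return b0 + b1 * x + jnp.sum(a * jnp.maximum(x[..., None] - knots, 0.0), axis=-1)

def l1_penalty(a, lam=1e-4):
    # the l1 term drives coefficients a_i to zero, favoring activations
    # with few active linear pieces (sparse nonlinearities)
    return lam * jnp.sum(jnp.abs(a))
```

Each neuron (or layer) would carry its own coefficient vector a, and the penalty is simply added to the training loss alongside the data-fit term.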
Stable interpolation with exponential-polynomial splines and node selection via greedy algorithms
In this work we extend some ideas about greedy algorithms, which are well-established tools for, e.g., kernel bases and exponential-polynomial splines, whose main drawback consists in possible overfitting and consequent oscillations of the approximant. To partially overcome this issue, we develop some results on theoretically optimal interpolation points. Moreover, we introduce two algorithms which perform an adaptive selection of the spline interpolation points based on the minimization either of the sample residuals (f-greedy) or of an upper bound for the approximation error based on the spline Lebesgue function (λ-greedy). Both methods allow us to obtain an adaptive selection of the sampling points, i.e., the spline nodes. While the f-greedy selection is tailored to one specific target function, the λ-greedy algorithm enables us to define target-data-independent interpolation nodes.
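A hedged sketch of the f-greedy loop described above, with a plain polynomial interpolant standing in for the exponential-polynomial splines of the paper; the function names and the endpoint initialization are assumptions made for illustration:

```python
import jax.numpy as jnp

def fit_interpolant(nodes, values):
    # ordinary polynomial interpolant via a Vandermonde solve; a stand-in for
    # the exponential-polynomial spline interpolant used in the paper
    coeffs = jnp.linalg.solve(jnp.vander(nodes, increasing=True), values)
    return lambda x: jnp.vander(x, N=nodes.shape[0], increasing=True) @ coeffs

def f_greedy(candidates, f_values, n_nodes):
    # start from the two endpoints, then repeatedly add the candidate point
    # where the current interpolant has the largest residual on the samples
    idx = [0, candidates.shape[0] - 1]
    for _ in range(n_nodes - 2):
        sel = jnp.array(idx)
        s = fit_interpolant(candidates[sel], f_values[sel])
        residuals = jnp.abs(f_values - s(candidates))
        idx.append(int(jnp.argmax(residuals)))
    return jnp.sort(candidates[jnp.array(idx)])
```

The λ-greedy variant would replace the residual criterion with an upper bound built from the spline Lebesgue function, so that the selected nodes no longer depend on the sampled target values.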
Duality for Neural Networks through Reproducing Kernel Banach Spaces
Reproducing Kernel Hilbert spaces (RKHS) have been a very successful tool in
various areas of machine learning. Recently, Barron spaces have been used to
prove bounds on the generalisation error for neural networks. Unfortunately,
Barron spaces cannot be understood in terms of RKHS due to the strong nonlinear
coupling of the weights. This can be solved by using the more general
Reproducing Kernel Banach spaces (RKBS). We show that these Barron spaces
belong to a class of integral RKBS. This class can also be understood as an
infinite union of RKHSs. Furthermore, we show that the dual space of such
RKBSs is again an RKBS, where the roles of the data and the parameters are
interchanged, forming an adjoint pair of RKBSs including a reproducing kernel.
This allows us to construct the saddle point problem for neural networks, which
can be used in the whole field of primal-dual optimisation.
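For orientation, the integral representation commonly used to define Barron-type spaces (standard notation from the literature, not necessarily the paper's) and the way it connects to a union of RKHSs:

```latex
% An infinite-width network as an integral over a parameter measure \mu:
f(x) = \int a \, \sigma(\langle w, x \rangle + b) \, d\mu(a, w, b).
% Fixing a probability distribution \pi over (w, b) gives the RKHS of the
% random-feature kernel
k(x, x') = \mathbb{E}_{(w,b) \sim \pi}\bigl[\sigma(\langle w, x \rangle + b)\,
           \sigma(\langle w, x' \rangle + b)\bigr],
% while letting \pi vary over all admissible distributions produces the
% "infinite union of RKHSs" mentioned in the abstract.
```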