Contrastive Learning for Lifted Networks
In this work we address supervised learning of neural networks via lifted
network formulations. Lifted networks are interesting because they allow
training on massively parallel hardware and assign energy models to
discriminatively trained neural networks. We demonstrate that the training
methods for lifted networks proposed in the literature have significant
limitations, and we show how a contrastive loss can address those
limitations. We further show that this contrastive training approximates
back-propagation in theory and in practice, and that it is superior to the
training objective regularly used for lifted networks.
Comment: 9 pages, BMVC 201
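To make the clamped-versus-free idea concrete, here is a minimal, hypothetical sketch of a contrastive loss for a tiny lifted network; the quadratic energy, the projected-gradient relaxation, and all dimensions are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of a contrastive loss for a lifted two-layer network (illustrative only).
# Lifted energy over non-negative activations z1 (hidden) and z2 (output):
#     E(z1, z2) = ||z1 - W1 x||^2 + ||z2 - W2 z1||^2,   z1, z2 >= 0.
# The contrastive loss compares the minimal energy with the output clamped to
# the target against the minimal energy of the free (unclamped) network state.
import numpy as np

rng = np.random.default_rng(0)
W1 = 0.3 * rng.normal(size=(16, 8))
W2 = 0.3 * rng.normal(size=(4, 16))
x = rng.normal(size=8)
y = np.abs(rng.normal(size=4))            # non-negative target for the clamped phase

def relax(W1, W2, x, y_clamp=None, steps=500, lr=0.05):
    """Minimize the lifted energy over activations by projected gradient descent.
    If y_clamp is given, the output activations z2 are held fixed at the target."""
    z1 = np.maximum(0.0, W1 @ x)
    z2 = y_clamp if y_clamp is not None else np.maximum(0.0, W2 @ z1)
    for _ in range(steps):
        g1 = 2 * (z1 - W1 @ x) - 2 * W2.T @ (z2 - W2 @ z1)
        z1 = np.maximum(0.0, z1 - lr * g1)
        if y_clamp is None:
            z2 = np.maximum(0.0, z2 - lr * 2 * (z2 - W2 @ z1))
    return np.sum((z1 - W1 @ x) ** 2) + np.sum((z2 - W2 @ z1) ** 2)

E_clamped = relax(W1, W2, x, y_clamp=y)   # output clamped to the target
E_free = relax(W1, W2, x)                 # network settles to its own prediction
contrastive_loss = E_clamped - E_free     # >= 0 at exact minima
print(f"clamped {E_clamped:.4f}  free {E_free:.4f}  contrastive {contrastive_loss:.4f}")
```

Because the clamped phase only adds a constraint to the free phase, the contrastive loss is non-negative and is (roughly) zero when the relaxed network already reproduces the target, which is what makes it a sensible training objective in this setting.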
Nonparametric regression using deep neural networks with ReLU activation function
Consider the multivariate nonparametric regression model. It is shown that
estimators based on sparsely connected deep neural networks with ReLU
activation function and properly chosen network architecture achieve the
minimax rates of convergence (up to log n-factors) under a general
composition assumption on the regression function. The framework includes many
well-studied structural constraints such as (generalized) additive models.
While there is a lot of flexibility in the network architecture, the tuning
parameter is the sparsity of the network. Specifically, we consider large
networks whose number of potential network parameters exceeds the sample size.
The analysis gives some insights into why multilayer feedforward neural
networks perform well in practice. Interestingly, for the ReLU activation
function the depth (number of layers) of the neural network architectures plays
an important role, and our theory suggests that for nonparametric regression,
scaling the network depth with the sample size is natural. It is also shown
that under the composition assumption wavelet estimators can only achieve
suboptimal rates.
Comment: article, rejoinder and supplementary material
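Stated loosely (the precise smoothness and sparsity conditions are in the paper), the composition assumption and the resulting rate take the form

\[
f = g_q \circ g_{q-1} \circ \cdots \circ g_0,
\qquad
\beta_i^{*} := \beta_i \prod_{\ell=i+1}^{q} \min(\beta_\ell, 1),
\qquad
\phi_n := \max_{0 \le i \le q} n^{-\frac{2\beta_i^{*}}{2\beta_i^{*} + t_i}},
\]

where each component of $g_i$ is $\beta_i$-Hölder and effectively depends on at most $t_i$ of its arguments. Suitably sparse deep ReLU networks attain $\phi_n$ up to log n-factors, and the construction uses networks whose depth grows with the sample size, which is the sense in which scaling depth with n is natural.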
Deep Learning Meets Sparse Regularization: A Signal Processing Perspective
Deep learning has been wildly successful in practice and most
state-of-the-art machine learning methods are based on neural networks.
Lacking, however, is a rigorous mathematical theory that adequately explains
the amazing performance of deep neural networks. In this article, we present a
relatively new mathematical framework that provides the beginning of a deeper
understanding of deep learning. This framework precisely characterizes the
functional properties of neural networks that are trained to fit data. The
key mathematical tools that support this framework include transform-domain
sparse regularization, the Radon transform of computed tomography, and
approximation theory, all of which are techniques deeply rooted in signal
processing. This framework explains the effect of weight decay regularization
in neural network training, the use of skip connections and low-rank weight
matrices in network architectures, the role of sparsity in neural networks, and
why neural networks can perform well in high-dimensional problems.
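The connection between weight decay and sparsity that the article develops can be illustrated with a small, hypothetical sketch: for a two-layer ReLU network, rescaling each neuron (allowed by the positive homogeneity of ReLU) turns the squared-norm weight-decay penalty into an l1-type path norm over neurons. The network, the dimensions, and the balancing rule below are assumptions for illustration, not the article's code.

```python
# Sketch: weight decay on a two-layer ReLU network f(x) = sum_k v_k * relu(w_k . x)
# is equivalent, after optimally rescaling each neuron, to the sparsity-promoting
# path norm sum_k |v_k| * ||w_k||  (an l1-type penalty over neurons).
import numpy as np

rng = np.random.default_rng(0)
K, d = 5, 3
W = rng.normal(size=(K, d))                       # inner weights w_k
v = rng.normal(size=K)                            # outer weights v_k

weight_decay = 0.5 * (np.sum(v ** 2) + np.sum(W ** 2))
path_norm = np.sum(np.abs(v) * np.linalg.norm(W, axis=1))

# Rescale (w_k, v_k) -> (c_k * w_k, v_k / c_k): the function is unchanged because
# relu(c*t) = c*relu(t) for c > 0.  Choosing c_k = sqrt(|v_k| / ||w_k||) balances
# each neuron and makes the weight-decay penalty equal the path norm.
c = np.sqrt(np.abs(v) / np.linalg.norm(W, axis=1))
W_bal, v_bal = c[:, None] * W, v / c
balanced_decay = 0.5 * (np.sum(v_bal ** 2) + np.sum(W_bal ** 2))

x = rng.normal(size=d)
f = lambda W_, v_: v_ @ np.maximum(0.0, W_ @ x)   # two-layer ReLU network
print("function unchanged:", np.isclose(f(W, v), f(W_bal, v_bal)))
print(f"weight decay {weight_decay:.3f} >= balanced {balanced_decay:.3f} = path norm {path_norm:.3f}")
```

The l1-like structure of the balanced penalty is one way to see why trained networks tend to rely on few active neurons, which is the flavor of the sparsity results the abstract alludes to.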