1,074 research outputs found
Group Invariance, Stability to Deformations, and Complexity of Deep Convolutional Representations
The success of deep convolutional architectures is often attributed in part
to their ability to learn multiscale and invariant representations of natural
signals. However, a precise study of these properties and how they affect
learning guarantees is still missing. In this paper, we consider deep
convolutional representations of signals; we study their invariance to
translations and to more general groups of transformations, their stability to
the action of diffeomorphisms, and their ability to preserve signal
information. This analysis is carried out by introducing a multilayer kernel based
on convolutional kernel networks and by studying the geometry induced by the
kernel mapping. We then characterize the corresponding reproducing kernel
Hilbert space (RKHS), showing that it contains a large class of convolutional
neural networks with homogeneous activation functions. This analysis allows us
to separate data representation from learning, and to provide a canonical
measure of model complexity, the RKHS norm, which controls both stability and
generalization of any learned model. In addition to models in the constructed
RKHS, our stability analysis also applies to convolutional networks with
generic activations such as rectified linear units, and we discuss its
relationship with recent generalization bounds based on spectral norms.
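As a brief illustration of why an RKHS norm can control stability, the standard reproducing-property argument is sketched below. The symbols \mathcal{H} (the RKHS), \Phi (the kernel mapping), and f (a model in \mathcal{H}) are notation assumed here for illustration, not taken from the abstract.

```latex
% Assumed notation: \mathcal{H} is the RKHS, \Phi the kernel mapping, f \in \mathcal{H}.
% Reproducing property followed by the Cauchy-Schwarz inequality:
\[
  f(x) = \langle f, \Phi(x) \rangle_{\mathcal{H}}
  \quad\Longrightarrow\quad
  |f(x) - f(x')|
  = |\langle f, \Phi(x) - \Phi(x') \rangle_{\mathcal{H}}|
  \le \|f\|_{\mathcal{H}} \, \|\Phi(x) - \Phi(x')\|_{\mathcal{H}} .
\]
```

Under this reading, any bound on how far the representation \Phi moves under a deformation carries over to every model f, scaled by its norm.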
The Sample Complexity of One-Hidden-Layer Neural Networks
We study norm-based uniform convergence bounds for neural networks, aiming at
a tight understanding of how these are affected by the architecture and type of
norm constraint, for the simple class of scalar-valued one-hidden-layer
networks, and inputs bounded in Euclidean norm. We begin by proving that in
general, controlling the spectral norm of the hidden layer weight matrix is
insufficient to get uniform convergence guarantees (independent of the network
width), while a stronger Frobenius norm control is sufficient, extending and
improving on previous work. Motivated by the proof constructions, we identify
and analyze two important settings where (perhaps surprisingly) a mere spectral
norm control turns out to be sufficient: First, when the network's activation
functions are sufficiently smooth (with the result extending to deeper
networks); and second, for certain types of convolutional networks. In the
latter setting, we study how the sample complexity is additionally affected by
parameters such as the amount of overlap between patches and the overall number
of patches.
Comment: Bug fixed in proof of Theorem 2 (resulting in different log factors);
Other minor edit
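To make the difference between the two norm constraints concrete, here is a minimal sketch (not the paper's proof construction) that compares the spectral and Frobenius norms of an identity weight matrix as the width grows. The helper names identity, frobenius_norm, and spectral_norm are hypothetical, and the spectral norm is approximated by power iteration rather than computed exactly.

```python
import math
import random


def identity(n):
    """n x n identity matrix as a list of rows (illustrative weight matrix)."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def frobenius_norm(W):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(w * w for row in W for w in row))


def spectral_norm(W, iters=100, seed=0):
    """Approximate largest singular value via power iteration on W^T W."""
    rng = random.Random(seed)
    n_rows, n_cols = len(W), len(W[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
    for _ in range(iters):
        u = [sum(w * x for w, x in zip(row, v)) for row in W]  # u = W v
        v = [sum(W[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]  # v = W^T u
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in v]
    u = [sum(w * x for w, x in zip(row, v)) for row in W]  # ||W v|| with ||v|| = 1
    return math.sqrt(sum(x * x for x in u))


if __name__ == "__main__":
    for width in (4, 16, 64):
        W = identity(width)
        print(width, round(spectral_norm(W), 6), round(frobenius_norm(W), 6))
```

For the identity matrix the spectral norm stays at 1 while the Frobenius norm equals the square root of the width, so a spectral-norm bound alone does not limit the Frobenius norm independently of the width.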
A jamming transition from under- to over-parametrization affects loss landscape and generalization
We argue that in fully-connected networks a phase transition delimits the
over- and under-parametrized regimes where fitting can or cannot be achieved.
Under some general conditions, we show that this transition is sharp for the
hinge loss. In the whole over-parametrized regime, poor minima of the loss are
not encountered during training since the number of constraints to satisfy is
too small to hamper minimization. Our findings support a link between this
transition and the generalization properties of the network: as we increase the
number of parameters of a given model, starting from an under-parametrized
network, we observe that the generalization error displays three phases: (i)
initial decay, (ii) increase until the transition point --- where it displays a
cusp --- and (iii) slow decay toward a constant for the rest of the
over-parametrized regime. Thereby we identify the region where the classical
phenomenon of over-fitting takes place, and the region where the model keeps
improving, in line with previous empirical observations for modern neural
networks.
Comment: arXiv admin note: text overlap with arXiv:1809.0934
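The idea of counting constraints under the hinge loss can be sketched as follows. This is a minimal illustration, assuming labels in {-1, +1} and a margin of 1; the function name hinge_loss_and_violations is hypothetical, and the exact loss variant used in the paper is not specified in the abstract.

```python
def hinge_loss_and_violations(outputs, labels, margin=1.0):
    """Return (mean hinge loss, number of samples that violate the margin).

    Assumption: each sample is one constraint, y * f(x) >= margin, and it
    counts as unsatisfied when its hinge term is strictly positive.
    """
    terms = [max(0.0, margin - y * f) for f, y in zip(outputs, labels)]
    unsatisfied = sum(1 for t in terms if t > 0.0)
    return sum(terms) / len(terms), unsatisfied


if __name__ == "__main__":
    # Three samples, all labelled +1; only the first one meets the margin.
    loss, unsatisfied = hinge_loss_and_violations([2.0, 0.5, -1.0], [1, 1, 1])
    print(loss, unsatisfied)
```

When the count of unsatisfied constraints reaches zero, the loss is exactly zero, which is the sense in which fitting "can be achieved" in this sketch.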
The Convex Landscape of Neural Networks: Characterizing Global Optima and Stationary Points via Lasso Models
Due to the non-convex nature of training Deep Neural Network (DNN) models,
their effectiveness relies on the use of non-convex optimization heuristics.
Traditional methods for training DNNs often require costly empirical methods to
produce successful models and do not have a clear theoretical foundation. In
this study, we examine the use of convex optimization theory and sparse
recovery models to refine the training process of neural networks and provide a
better interpretation of their optimal weights. We focus on training two-layer
neural networks with piecewise linear activations and demonstrate that they can
be formulated as a finite-dimensional convex program. These programs include a
regularization term that promotes sparsity, which constitutes a variant of
group Lasso. We first utilize semi-infinite programming theory to prove strong
duality for finite width neural networks and then we express these
architectures equivalently as high dimensional convex sparse recovery models.
Remarkably, the worst-case complexity to solve the convex program is polynomial
in the number of samples and number of neurons when the rank of the data matrix
is bounded, which is the case in convolutional networks. To extend our method
to training data of arbitrary rank, we develop a novel polynomial-time
approximation scheme based on zonotope subsampling that comes with a guaranteed
approximation ratio. We also show that all the stationary points of the nonconvex
training objective can be characterized as the global optimum of a subsampled
convex program. Our convex models can be trained using standard convex solvers
without resorting to heuristics or extensive hyperparameter tuning, unlike
non-convex methods. Through extensive numerical experiments, we show that
convex models can outperform traditional non-convex methods and are not
sensitive to optimizer hyperparameters.
Comment: A preliminary version of part of this work was published at ICML 2020
with the title "Neural Networks are Convex Regularizers: Exact
Polynomial-time Convex Optimization Formulations for Two-layer Networks
- …
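For readers unfamiliar with the group-Lasso regularization mentioned in the last abstract above, the generic form of such a program is sketched below. The symbols A_i (feature blocks), w_i (weight groups), y (targets), P (number of groups), and \beta (regularization strength) are assumed notation; the paper's program is described as a variant and may carry additional constraints not shown here.

```latex
% Generic group Lasso (assumed notation, not the paper's exact program):
% the unsquared \ell_2 penalty on each block w_i drives whole groups to zero.
\[
  \min_{w_1, \dots, w_P} \;
  \frac{1}{2} \Bigl\| \sum_{i=1}^{P} A_i w_i - y \Bigr\|_2^2
  \; + \; \beta \sum_{i=1}^{P} \| w_i \|_2 ,
  \qquad \beta > 0 .
\]
```

The objective is convex in (w_1, \dots, w_P), which is why standard convex solvers apply once a problem has been cast in this form.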