Proxy Convexity: A Unified Framework for the Analysis of Neural Networks Trained by Gradient Descent
Although the optimization objectives for learning neural networks are highly
non-convex, gradient-based methods have been wildly successful at learning
neural networks in practice. This juxtaposition has led to a number of recent
studies on provable guarantees for neural networks trained by gradient descent.
Unfortunately, the techniques in these works are often highly specific to the
problem studied in each setting, relying on different assumptions on the
distribution, optimization parameters, and network architectures, making it
difficult to generalize across different settings. In this work, we propose a
unified non-convex optimization framework for the analysis of neural network
training. We introduce the notions of proxy convexity and proxy
Polyak-Lojasiewicz (PL) inequalities, which are satisfied if the original
objective function induces a proxy objective function that is implicitly
minimized when using gradient methods. We show that stochastic gradient descent
(SGD) on objectives satisfying proxy convexity or the proxy PL inequality leads
to efficient guarantees for proxy objective functions. We further show that
many existing guarantees for neural networks trained by gradient descent can be
unified through proxy convexity and proxy PL inequalities. Comment: 15 pages
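For orientation, the standard PL inequality for a differentiable objective $f$ with infimum $f^*$ is shown first below. The second line is only an assumed illustration of how a proxy objective could take the place of $f$; the symbols $g$, $\xi$, and $\mu$ are introduced here and are not taken from the abstract, so the paper's exact definition may differ.

```latex
% Standard PL inequality with constant \mu > 0:
\[
  \|\nabla f(w)\|^{2} \;\ge\; 2\mu \,\bigl(f(w) - f^{*}\bigr)
\]
% Assumed proxy form: the gradient of f controls a proxy objective g
% relative to a target level \xi (hypothetical notation):
\[
  \|\nabla f(w)\|^{2} \;\ge\; 2\mu \,\bigl(g(w) - \xi\bigr)
\]
```

Under the second form, a small gradient of $f$ forces $g(w)$ close to $\xi$, which matches the abstract's description of a proxy objective that is implicitly minimized by gradient methods.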
Open- and Closed-Loop Neural Network Verification using Polynomial Zonotopes
We present a novel approach to efficiently compute tight non-convex
enclosures of the image through neural networks with ReLU, sigmoid, or
hyperbolic tangent activation functions. In particular, we abstract the
input-output relation of each neuron by a polynomial approximation, which is
evaluated in a set-based manner using polynomial zonotopes. While our approach
can also be beneficial for open-loop neural network verification, our main
application is reachability analysis of neural network controlled systems,
where polynomial zonotopes are able to capture the non-convexity caused by the
neural network as well as the system dynamics. This results in superior
performance compared to other methods, as we demonstrate on various benchmarks.
Visualising Basins of Attraction for the Cross-Entropy and the Squared Error Neural Network Loss Functions
Quantification of the stationary points and the associated basins of
attraction of neural network loss surfaces is an important step towards a
better understanding of neural network loss surfaces at large. This work
proposes a novel method to visualise basins of attraction together with the
associated stationary points via gradient-based random sampling. The proposed
technique is used to perform an empirical study of the loss surfaces generated
by two different error metrics: quadratic loss and entropic loss. The empirical
observations confirm the theoretical hypothesis regarding the nature of neural
network attraction basins. Entropic loss is shown to exhibit stronger gradients
and fewer stationary points than quadratic loss, indicating that entropic loss
has a more searchable landscape. Quadratic loss is shown to be more resilient
to overfitting than entropic loss. Both losses are shown to exhibit local
minima, but the number of local minima is shown to decrease with an increase in
dimensionality. Thus, the proposed visualisation technique successfully
captures the local minima properties exhibited by the neural network loss
surfaces, and can be used for the purpose of fitness landscape analysis of
neural networks. Comment: Preprint submitted to the Neural Networks journal
DANTE: Deep AlterNations for Training nEural networks
We present DANTE, a novel method for training neural networks using the
alternating minimization principle. DANTE provides an alternate perspective to
traditional gradient-based backpropagation techniques commonly used to train
deep networks. It utilizes an adaptation of quasi-convexity to cast training a
neural network as a bi-quasi-convex optimization problem. We show that for
neural network configurations with both differentiable (e.g. sigmoid) and
non-differentiable (e.g. ReLU) activation functions, we can perform the
alternations effectively in this formulation. DANTE can also be extended to
networks with multiple hidden layers. In experiments on standard datasets,
neural networks trained using the proposed method were found to be promising
and competitive with traditional backpropagation techniques, in terms of both
solution quality and training speed. Comment: 19 pages
Robust federated learning with noisy communication
Federated learning is a communication-efficient training process that alternates between local training at the edge devices and averaging of the updated local models at the central server. In wireless communication, however, noise makes it impractical to acquire the local models perfectly, which seriously degrades federated learning. To tackle this challenge, we propose a robust design for federated learning that reduces the effect of noise. Considering the noise in the two aforementioned steps, we first formulate the training problem as a parallel optimization for each node under an expectation-based model and a worst-case model. Because the problem is non-convex, a regularizer approximation method is proposed to make it tractable. For the worst-case model, we use the sampling-based successive convex approximation algorithm to develop a feasible training scheme that handles both the unavailable maximum or minimum noise condition and the non-convexity of the objective function. Furthermore, the convergence rates of both new designs are analyzed theoretically. Finally, simulations demonstrate that the proposed designs improve prediction accuracy and reduce the loss function value.
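The alternation described above (local training, then server-side averaging over a noisy link) can be sketched as follows. This is a minimal illustration of plain federated averaging with additive Gaussian upload noise, not the robust design proposed in the paper; the function names, the scalar linear model, the noise model, and the toy data are all assumptions made for the example.

```python
import random


def local_step(w, data, lr):
    """One gradient-descent step on mean squared error for the model y = w * x."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad


def noisy_fedavg(clients, rounds=50, lr=0.1, noise_std=0.01, seed=0):
    """Alternate local training and server averaging over a noisy uplink."""
    rng = random.Random(seed)
    w_global = 0.0
    for _ in range(rounds):
        received = []
        for data in clients:
            # Each client starts from the current global model.
            w_local = local_step(w_global, data, lr)
            # Assumed channel model: additive Gaussian noise on the upload.
            received.append(w_local + rng.gauss(0.0, noise_std))
        # The server averages whatever it received, noise included.
        w_global = sum(received) / len(received)
    return w_global


# Hypothetical toy data: two clients whose samples follow y = 3x.
clients = [[(1.0, 3.0), (2.0, 6.0)], [(0.5, 1.5), (1.5, 4.5)]]
w_est = noisy_fedavg(clients)
w_clean = noisy_fedavg(clients, noise_std=0.0)
```

With `noise_std=0.0` the estimate converges to the true slope; with noise it only settles near it, which is the kind of degradation a robust design would aim to limit.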