Search CORE

18,266 research outputs found

Proxy Convexity: A Unified Framework for the Analysis of Neural Networks Trained by Gradient Descent

Author: Frei Spencer
Gu Quanquan
Publication venue
Publication date: 20/07/2021
Field of study

Although the optimization objectives for learning neural networks are highly non-convex, gradient-based methods have been wildly successful at learning neural networks in practice. This juxtaposition has led to a number of recent studies on provable guarantees for neural networks trained by gradient descent. Unfortunately, the techniques in these works are often highly specific to the problem studied in each setting, relying on different assumptions on the distribution, optimization parameters, and network architectures, making it difficult to generalize across different settings. In this work, we propose a unified non-convex optimization framework for the analysis of neural network training. We introduce the notions of proxy convexity and proxy Polyak-Lojasiewicz (PL) inequalities, which are satisfied if the original objective function induces a proxy objective function that is implicitly minimized when using gradient methods. We show that stochastic gradient descent (SGD) on objectives satisfying proxy convexity or the proxy PL inequality leads to efficient guarantees for proxy objective functions. We further show that many existing guarantees for neural networks trained by gradient descent can be unified through proxy convexity and proxy PL inequalities.Comment: 15 page

arXiv.org e-Print Archive

Open- and Closed-Loop Neural Network Verification using Polynomial Zonotopes

Author: Althoff Matthias
Bak Stanley
Kochdumper Niklas
Schilling Christian
Publication venue
Publication date: 17/04/2023
Field of study

We present a novel approach to efficiently compute tight non-convex enclosures of the image through neural networks with ReLU, sigmoid, or hyperbolic tangent activation functions. In particular, we abstract the input-output relation of each neuron by a polynomial approximation, which is evaluated in a set-based manner using polynomial zonotopes. While our approach can also can be beneficial for open-loop neural network verification, our main application is reachability analysis of neural network controlled systems, where polynomial zonotopes are able to capture the non-convexity caused by the neural network as well as the system dynamics. This results in a superior performance compared to other methods, as we demonstrate on various benchmarks

arXiv.org e-Print Archive

Visualising Basins of Attraction for the Cross-Entropy and the Squared Error Neural Network Loss Functions

Author: Bosman Anna Sergeevna
Engelbrecht Andries
Helbig Mardé
Publication venue
Publication date: 09/01/2019
Field of study

Quantification of the stationary points and the associated basins of attraction of neural network loss surfaces is an important step towards a better understanding of neural network loss surfaces at large. This work proposes a novel method to visualise basins of attraction together with the associated stationary points via gradient-based random sampling. The proposed technique is used to perform an empirical study of the loss surfaces generated by two different error metrics: quadratic loss and entropic loss. The empirical observations confirm the theoretical hypothesis regarding the nature of neural network attraction basins. Entropic loss is shown to exhibit stronger gradients and fewer stationary points than quadratic loss, indicating that entropic loss has a more searchable landscape. Quadratic loss is shown to be more resilient to overfitting than entropic loss. Both losses are shown to exhibit local minima, but the number of local minima is shown to decrease with an increase in dimensionality. Thus, the proposed visualisation technique successfully captures the local minima properties exhibited by the neural network loss surfaces, and can be used for the purpose of fitness landscape analysis of neural networks.Comment: Preprint submitted to the Neural Networks journa

arXiv.org e-Print Archive

UPSpace at the University of Pretoria

DANTE: Deep AlterNations for Training nEural networks

Author: Balasubramanian Vineeth N
Chavali Surya Teja
Kar Purushottam
Kudugunta Sneha
Sankar Adepu Ravi
Sinha Vaibhav B
Publication venue: 'Elsevier BV'
Publication date: 01/01/2020
Field of study

We present DANTE, a novel method for training neural networks using the alternating minimization principle. DANTE provides an alternate perspective to traditional gradient-based backpropagation techniques commonly used to train deep networks. It utilizes an adaptation of quasi-convexity to cast training a neural network as a bi-quasi-convex optimization problem. We show that for neural network configurations with both differentiable (e.g. sigmoid) and non-differentiable (e.g. ReLU) activation functions, we can perform the alternations effectively in this formulation. DANTE can also be extended to networks with multiple hidden layers. In experiments on standard datasets, neural networks trained using the proposed method were found to be promising and competitive to traditional backpropagation techniques, both in terms of quality of the solution, as well as training speed.Comment: 19 page

arXiv.org e-Print Archive

Research Archive of Indian Institute of Technology Hyderabad

Robust federated learning with noisy communication

Author: Ang Fan
Chen Li
Chen Yunfei
Wang Weidong
Yu F. Richard
Zhao Nan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/11/2019
Field of study

Federated learning is a communication-efficient training process that alternate between local training at the edge devices and averaging of the updated local model at the center server. Nevertheless, it is impractical to achieve perfect acquisition of the local models in wireless communication due to the noise, which also brings serious effect on federated learning. To tackle this challenge in this paper, we propose a robust design for federated learning to decline the effect of noise. Considering the noise in two aforementioned steps, we first formulate the training problem as a parallel optimization for each node under the expectation-based model and worst-case model. Due to the non-convexity of the problem, regularizer approximation method is proposed to make it tractable. Regarding the worst-case model, we utilize the sampling-based successive convex approximation algorithm to develop a feasible training scheme to tackle the unavailable maxima or minima noise condition and the non-convex issue of the objective function. Furthermore, the convergence rates of both new designs are analyzed from a theoretical point of view. Finally, the improvement of prediction accuracy and the reduction of loss function value are demonstrated via simulation for the proposed designs

arXiv.org e-Print Archive

Warwick Research Archives Portal Repository