Global Convergence of Frank Wolfe on One Hidden Layer Networks
We derive global convergence bounds for the Frank Wolfe algorithm when
training one hidden layer neural networks. When using the ReLU activation
function, and under tractable preconditioning assumptions on the sample data
set, the linear minimization oracle used to incrementally form the solution can
be solved explicitly as a second order cone program. The classical Frank Wolfe
algorithm then converges with rate O(1/T), where T is both the number of
neurons and the number of calls to the oracle.
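A minimal, generic Frank-Wolfe loop in Python (NumPy) illustrating the structure described above: the only problem-specific component is the linear minimization oracle called once per iteration. The quadratic objective and the l1-ball oracle below are illustrative stand-ins, not the paper's second-order cone program; `lmo_l1_ball`, `A`, and `b` are assumptions made for the sketch.

```python
import numpy as np

def lmo_l1_ball(grad, radius=1.0):
    # Linear minimization oracle over the l1 ball:
    # argmin_{||s||_1 <= radius} <grad, s> is attained at a signed vertex.
    i = np.argmax(np.abs(grad))
    s = np.zeros_like(grad)
    s[i] = -radius * np.sign(grad[i])
    return s

def frank_wolfe(grad_f, lmo, x0, T=200):
    # Classical Frank-Wolfe with step size 2/(t+2), which yields the O(1/T) rate.
    x = x0.copy()
    for t in range(T):
        s = lmo(grad_f(x))                 # one oracle call per iteration
        gamma = 2.0 / (t + 2.0)
        x = (1 - gamma) * x + gamma * s    # convex combination keeps x feasible
    return x

# Toy least-squares objective f(x) = 0.5 * ||A x - b||^2 (illustrative only).
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
grad_f = lambda x: A.T @ (A @ x - b)
x_hat = frank_wolfe(grad_f, lmo_l1_ball, x0=np.zeros(5))
```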
Path Regularization: A Convexity and Sparsity Inducing Regularization for Parallel ReLU Networks
Understanding the fundamental principles behind the success of deep neural
networks is one of the most important open questions in the current literature.
To this end, we study the training problem of deep neural networks and
introduce an analytic approach to unveil hidden convexity in the optimization
landscape. We consider a deep parallel ReLU network architecture, which also
includes standard deep networks and ResNets as its special cases. We then show
that pathwise regularized training problems can be represented as an exact
convex optimization problem. We further prove that the equivalent convex
problem is regularized via a group sparsity inducing norm. Thus, a path
regularized parallel ReLU network can be viewed as a parsimonious convex model
in high dimensions. More importantly, we show that the computational complexity
required to globally optimize the equivalent convex problem is fully
polynomial-time in feature dimension and number of samples. Therefore, we prove
polynomial-time trainability of path regularized ReLU networks with global
optimality guarantees. We also provide several numerical experiments
corroborating our theory.
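The group-sparsity-inducing norm mentioned above is, in the usual convex-ML sense, a sum of per-group Euclidean norms, whose proximal operator is block soft-thresholding. The sketch below shows that operator in NumPy as a generic illustration of how such a regularizer zeroes out entire groups; it is not the paper's exact convex reformulation, and the group structure and values are assumptions.

```python
import numpy as np

def group_soft_threshold(v, groups, lam):
    # Proximal operator of lam * sum_g ||v_g||_2 (a group-sparsity norm):
    # each group is shrunk toward zero and set exactly to zero when its
    # Euclidean norm falls below lam, which is what induces group sparsity.
    out = np.zeros_like(v)
    for g in groups:
        norm_g = np.linalg.norm(v[g])
        if norm_g > lam:
            out[g] = (1.0 - lam / norm_g) * v[g]
    return out

# Illustrative usage: three groups of weights, one of which is driven to zero.
v = np.array([0.9, -1.2, 0.05, 0.02, 2.0, 1.5])
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]
print(group_soft_threshold(v, groups, lam=0.3))
```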
Theoretical Deep Learning
Deep learning has long been criticised as a black-box model for lacking sound theoretical explanation. During my PhD, I explore and establish theoretical foundations for deep learning. In this thesis, I present my contributions, positioned upon the existing literature: (1) analysing the generalizability of neural networks with residual connections via complexity- and capacity-based hypothesis complexity measures; (2) modeling stochastic gradient descent (SGD) by stochastic differential equations (SDEs) and their dynamics, and further characterizing the generalizability of deep learning; (3) understanding the geometrical structures of the loss landscape that drive the trajectories of these dynamical systems, which sheds light on reconciling the over-representation and excellent generalizability of deep learning; and (4) discovering the interplay between generalization, privacy preservation, and adversarial robustness, which are of rising concern in deep learning deployment.
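Item (2) refers to the common practice of approximating SGD by an SDE of the form dθ = -∇L(θ) dt + sqrt(η) Σ(θ)^{1/2} dW. A minimal Euler-Maruyama simulation of dynamics of that kind is sketched below as a generic illustration only; the quadratic loss, the constant isotropic noise, and all parameter names are assumptions, not the thesis's specific model.

```python
import numpy as np

def simulate_sgd_sde(grad_L, theta0, eta=0.1, sigma=0.05, steps=1000, dt=0.01, seed=0):
    # Euler-Maruyama discretization of d(theta) = -grad_L(theta) dt + sqrt(eta)*sigma dW,
    # a common continuous-time surrogate for SGD with learning rate eta and an
    # (here, isotropic and constant) gradient-noise scale sigma.
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    path = [theta.copy()]
    for _ in range(steps):
        noise = rng.standard_normal(theta.shape)
        theta = theta - grad_L(theta) * dt + np.sqrt(eta * dt) * sigma * noise
        path.append(theta.copy())
    return np.array(path)

# Illustrative quadratic loss L(theta) = 0.5 * ||theta||^2, so grad_L(theta) = theta.
path = simulate_sgd_sde(grad_L=lambda th: th, theta0=[2.0, -1.0])
print(path[-1])  # the iterate fluctuates around the minimizer at the origin
```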