The Global Landscape of Neural Networks: An Overview
One of the major concerns in neural network training is that the
non-convexity of the associated loss functions may cause a bad landscape. The
recent success of neural networks suggests that their loss landscape is not too
bad, but what specific results do we know about the landscape? In this article,
we review recent findings and results on the global landscape of neural
networks. First, we point out that wide neural nets may have sub-optimal local
minima under certain assumptions. Second, we discuss a few rigorous results on
the geometric properties of wide networks such as "no bad basin", and some
modifications that eliminate sub-optimal local minima and/or decreasing paths
to infinity. Third, we discuss visualization and empirical explorations of the
landscape for practical neural nets. Finally, we briefly discuss some
convergence results and their relation to landscape results.
Comment: 16 pages, 8 figures.
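To make the third point concrete: one common empirical tool in the visualization studies the article surveys is plotting the loss along a one-dimensional slice of parameter space, e.g. the line segment between two parameter vectors. The sketch below is a minimal, self-contained illustration of that idea; the tiny tanh network, synthetic data, and random endpoints are illustrative assumptions, not the article's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(128, 4))            # synthetic inputs
y = np.sin(X @ rng.normal(size=4))       # synthetic regression targets

def unpack(theta, d_in=4, d_h=16):
    """Split a flat parameter vector into the two weight matrices."""
    W1 = theta[:d_in * d_h].reshape(d_in, d_h)
    W2 = theta[d_in * d_h:].reshape(d_h)
    return W1, W2

def loss(theta):
    """Mean squared error of a one-hidden-layer tanh network."""
    W1, W2 = unpack(theta)
    pred = np.tanh(X @ W1) @ W2
    return np.mean((pred - y) ** 2)

dim = 4 * 16 + 16
theta_a = rng.normal(size=dim) * 0.5     # stand-ins for two solutions
theta_b = rng.normal(size=dim) * 0.5     # (here: random parameter vectors)

# Evaluate the loss along theta(alpha) = (1 - alpha)*theta_a + alpha*theta_b;
# plotting these values gives a 1-D slice of the landscape.
for alpha in np.linspace(0.0, 1.0, 11):
    theta = (1 - alpha) * theta_a + alpha * theta_b
    print(f"alpha={alpha:.1f}  loss={loss(theta):.4f}")
```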
Why Do Local Methods Solve Nonconvex Problems?
Non-convex optimization is ubiquitous in modern machine learning. Researchers
devise non-convex objective functions and optimize them using off-the-shelf
optimizers such as stochastic gradient descent and its variants, which leverage
the local geometry and update iteratively. Even though minimizing non-convex
functions is NP-hard in the worst case, optimization quality in practice is
rarely an issue -- optimizers are widely believed to find approximate
global minima. Researchers hypothesize a unified explanation for this
intriguing phenomenon: most of the local minima of the practically-used
objectives are approximately global minima. We rigorously formalize this
hypothesis for concrete instances of machine learning problems.
Comment: Chapter 21 of the book "Beyond the Worst-Case Analysis of Algorithms".
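One concrete instance of this kind is low-rank matrix factorization: under suitable rank and genericity conditions, every local minimum of f(U) = 1/4 ||U U^T - M||_F^2 is global, so plain gradient descent from random initialization typically recovers M. The sketch below, with hypothetical problem sizes and step size, illustrates this numerically; it is a minimal demonstration, not the chapter's formal argument.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 20, 3                                   # illustrative problem size
U_star = rng.normal(size=(n, r)) / np.sqrt(n)  # planted low-rank factor
M = U_star @ U_star.T                          # ground-truth rank-r matrix

def grad(U):
    """Gradient of f(U) = 0.25 * ||U U^T - M||_F^2 for symmetric M."""
    return (U @ U.T - M) @ U

U = rng.normal(size=(n, r)) * 0.1              # small random initialization
lr = 0.05                                      # illustrative step size
for step in range(3000):
    U = U - lr * grad(U)

# With no spurious local minima, the residual is typically near zero.
residual = np.linalg.norm(U @ U.T - M) / np.linalg.norm(M)
print(f"relative residual: {residual:.2e}")
```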
- …