The Global Landscape of Neural Networks: An Overview
One of the major concerns in neural network training is that the
non-convexity of the associated loss functions may cause a bad landscape. The
recent success of neural networks suggests that their loss landscape is not too
bad, but what specific results do we know about the landscape? In this article,
we review recent findings and results on the global landscape of neural
networks. First, we point out that wide neural nets may have sub-optimal local
minima under certain assumptions. Second, we discuss a few rigorous results on
the geometric properties of wide networks such as "no bad basin", and some
modifications that eliminate sub-optimal local minima and/or decreasing paths
to infinity. Third, we discuss visualization and empirical explorations of the
landscape for practical neural nets. Finally, we briefly discuss some
convergence results and their relation to landscape results.
Comment: 16 pages, 8 figures.
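To make the third point concrete: one common empirical tool in the visualization studies the article surveys is plotting the loss along a one-dimensional slice of parameter space, e.g. the line segment between two parameter vectors. The sketch below is a minimal, self-contained illustration of that idea; the tiny tanh network, synthetic data, and random endpoints are illustrative assumptions, not the article's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(128, 4))            # synthetic inputs
y = np.sin(X @ rng.normal(size=4))       # synthetic regression targets

def unpack(theta, d_in=4, d_h=16):
    """Split a flat parameter vector into the two weight matrices."""
    W1 = theta[:d_in * d_h].reshape(d_in, d_h)
    W2 = theta[d_in * d_h:].reshape(d_h)
    return W1, W2

def loss(theta):
    """Mean squared error of a one-hidden-layer tanh network."""
    W1, W2 = unpack(theta)
    pred = np.tanh(X @ W1) @ W2
    return np.mean((pred - y) ** 2)

dim = 4 * 16 + 16
theta_a = rng.normal(size=dim) * 0.5     # stand-ins for two solutions
theta_b = rng.normal(size=dim) * 0.5     # (here: random parameter vectors)

# Evaluate the loss along theta(alpha) = (1 - alpha)*theta_a + alpha*theta_b;
# plotting these values gives a 1-D slice of the landscape.
for alpha in np.linspace(0.0, 1.0, 11):
    theta = (1 - alpha) * theta_a + alpha * theta_b
    print(f"alpha={alpha:.1f}  loss={loss(theta):.4f}")
```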
Why Do Local Methods Solve Nonconvex Problems?
Non-convex optimization is ubiquitous in modern machine learning. Researchers
devise non-convex objective functions and optimize them using off-the-shelf
optimizers such as stochastic gradient descent and its variants, which leverage
the local geometry and update iteratively. Even though minimizing non-convex
functions is NP-hard in the worst case, optimization quality in practice is
rarely an issue -- optimizers are widely believed to find approximate
global minima. Researchers hypothesize a unified explanation for this
intriguing phenomenon: most of the local minima of the practically-used
objectives are approximately global minima. We rigorously formalize this
hypothesis for concrete instances of machine learning problems.
Comment: Chapter 21 of the book "Beyond the Worst-Case Analysis of Algorithms".
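One concrete instance of this kind is low-rank matrix factorization: under suitable rank and genericity conditions, every local minimum of f(U) = 1/4 ||U U^T - M||_F^2 is global, so plain gradient descent from random initialization typically recovers M. The sketch below, with hypothetical problem sizes and step size, illustrates this numerically; it is a minimal demonstration, not the chapter's formal argument.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 20, 3                                   # illustrative problem size
U_star = rng.normal(size=(n, r)) / np.sqrt(n)  # planted low-rank factor
M = U_star @ U_star.T                          # ground-truth rank-r matrix

def grad(U):
    """Gradient of f(U) = 0.25 * ||U U^T - M||_F^2 for symmetric M."""
    return (U @ U.T - M) @ U

U = rng.normal(size=(n, r)) * 0.1              # small random initialization
lr = 0.05                                      # illustrative step size
for step in range(3000):
    U = U - lr * grad(U)

# With no spurious local minima, the residual is typically near zero.
residual = np.linalg.norm(U @ U.T - M) / np.linalg.norm(M)
print(f"relative residual: {residual:.2e}")
```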
- …