Neural network training is usually accomplished by solving a non-convex
optimization problem using stochastic gradient descent. Although one optimizes
over the network's parameters, the main loss function generally depends only on
the realization of the neural network, i.e., the function it computes. Studying
the optimization problem over the space of realizations opens up new ways to
understand neural network training. In particular, common loss functions such as
mean squared error and categorical cross-entropy are convex on spaces of neural
network realizations, which themselves are non-convex. Approximation
capabilities of neural networks can be used to deal with the latter
non-convexity, which allows us to establish that, for sufficiently large
networks, local minima of a regularized optimization problem on the realization
space are almost optimal. Note, however, that each realization has many
different, possibly degenerate, parametrizations. In particular, a local
minimum in the parametrization space need not correspond to a local minimum in
the realization space. To establish such a connection, inverse stability of the
realization map is required, meaning that proximity of realizations must imply
proximity of corresponding parametrizations. We present pathologies which
prevent inverse stability in general, and, for shallow networks, proceed to
establish a restricted space of parametrizations on which we have inverse
stability w.r.t. a Sobolev norm. Furthermore, we show that by optimizing
over such restricted sets, it is still possible to learn any function which can
be learned by optimization over unrestricted sets.
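
As a rough sketch of the inverse-stability property mentioned above (using notation not fixed in this abstract: a realization map $\mathcal{R}$ sending a parametrization $\theta$ to the function it computes, together with assumed constants $s, C > 0$), one possible formalization reads
\[
\forall\, \theta \ \ \forall\, \Phi \in \operatorname{ran}(\mathcal{R}) \ \ \exists\, \theta^\ast \ \text{with}\ \mathcal{R}(\theta^\ast) = \Phi:
\qquad \|\theta - \theta^\ast\| \,\le\, C\, \big\|\mathcal{R}(\theta) - \Phi\big\|_{W^{1,\infty}}^{\,s},
\]
i.e., whenever the realization of $\theta$ is close to a target realization $\Phi$ in a Sobolev norm, some parametrization $\theta^\ast$ of $\Phi$ is correspondingly close to $\theta$. The particular norm $W^{1,\infty}$ and the Hölder-type exponent $s$ are illustrative assumptions, not claims about the precise statement in the paper.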