We consider the algorithmic problem of finding the optimal weights and biases
for a two-layer fully connected neural network to fit a given set of data
points. This problem is known as empirical risk minimization in the machine
learning community. We show that the problem is βR-complete.
This complexity class can be defined as the set of algorithmic problems that
are polynomial-time equivalent to finding real roots of a polynomial with
integer coefficients. Our results hold even if the following restrictions are
all added simultaneously.
β There are exactly two output neurons.
β There are exactly two input neurons.
β The data has only 13 different labels.
β The number of hidden neurons is a constant fraction of the number
of data points.
β The target training error is zero.
β The ReLU activation function is used.
This shows that even very simple networks are difficult to train. The result
offers an explanation (though far from a complete understanding) on why only
gradient descent is widely successful in training neural networks in practice.
We generalize a recent result by Abrahamsen, Kleist and Miltzow [NeurIPS 2021].
This result falls into a recent line of research that tries to unveil that a
series of central algorithmic problems from widely different areas of computer
science and mathematics are βR-complete: This includes the art
gallery problem [JACM/STOC 2018], geometric packing [FOCS 2020], covering
polygons with convex polygons [FOCS 2021], and continuous constraint
satisfaction problems [FOCS 2021].Comment: 38 pages, 18 figure