Training Fully Connected Neural Networks is βˆƒR\exists\mathbb{R}-Complete

Abstract

We consider the algorithmic problem of finding the optimal weights and biases for a two-layer fully connected neural network to fit a given set of data points. This problem is known as empirical risk minimization in the machine learning community. We show that the problem is βˆƒR\exists\mathbb{R}-complete. This complexity class can be defined as the set of algorithmic problems that are polynomial-time equivalent to finding real roots of a polynomial with integer coefficients. Our results hold even if the following restrictions are all added simultaneously. βˆ™\bullet There are exactly two output neurons. βˆ™\bullet There are exactly two input neurons. βˆ™\bullet The data has only 13 different labels. βˆ™\bullet The number of hidden neurons is a constant fraction of the number of data points. βˆ™\bullet The target training error is zero. βˆ™\bullet The ReLU activation function is used. This shows that even very simple networks are difficult to train. The result offers an explanation (though far from a complete understanding) on why only gradient descent is widely successful in training neural networks in practice. We generalize a recent result by Abrahamsen, Kleist and Miltzow [NeurIPS 2021]. This result falls into a recent line of research that tries to unveil that a series of central algorithmic problems from widely different areas of computer science and mathematics are βˆƒR\exists\mathbb{R}-complete: This includes the art gallery problem [JACM/STOC 2018], geometric packing [FOCS 2020], covering polygons with convex polygons [FOCS 2021], and continuous constraint satisfaction problems [FOCS 2021].Comment: 38 pages, 18 figure

    Similar works

    Full text

    thumbnail-image

    Available Versions