Training Fully Connected Neural Networks is $\exists\mathbb{R}$-Complete

Bertschinger, Daniel; Hertrich, Christoph; Jungeblut, Paul; Miltzow, Tillmann; Weber, Simon

Training Fully Connected Neural Networks is $\exists\mathbb{R}$ -Complete

Authors: Daniel Bertschinger
Christoph Hertrich
Paul Jungeblut
Tillmann Miltzow
Simon Weber
Publication date: 4 April 2022
Publisher

Abstract

We consider the algorithmic problem of finding the optimal weights and biases for a two-layer fully connected neural network to fit a given set of data points. This problem is known as empirical risk minimization in the machine learning community. We show that the problem is

\exists\mathbb{R}

-complete. This complexity class can be defined as the set of algorithmic problems that are polynomial-time equivalent to finding real roots of a polynomial with integer coefficients. Our results hold even if the following restrictions are all added simultaneously.

\bullet

There are exactly two output neurons.

\bullet

There are exactly two input neurons.

\bullet

The data has only 13 different labels.

\bullet

The number of hidden neurons is a constant fraction of the number of data points.

\bullet

The target training error is zero.

\bullet

The ReLU activation function is used. This shows that even very simple networks are difficult to train. The result offers an explanation (though far from a complete understanding) on why only gradient descent is widely successful in training neural networks in practice. We generalize a recent result by Abrahamsen, Kleist and Miltzow [NeurIPS 2021]. This result falls into a recent line of research that tries to unveil that a series of central algorithmic problems from widely different areas of computer science and mathematics are

\exists\mathbb{R}

-complete: This includes the art gallery problem [JACM/STOC 2018], geometric packing [FOCS 2020], covering polygons with convex polygons [FOCS 2021], and continuous constraint satisfaction problems [FOCS 2021].Comment: 38 pages, 18 figure

Similar works

Full text

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2204.01368

Last time updated on 26/04/2022