Provably Good Solutions to the Knapsack Problem via Neural Networks of Bounded Size
The development of a satisfying and rigorous mathematical understanding of
the performance of neural networks is a major challenge in artificial
intelligence. Against this background, we study the expressive power of neural
networks through the example of the classical NP-hard Knapsack Problem. Our
main contribution is a class of recurrent neural networks (RNNs) with rectified
linear units that are iteratively applied to each item of a Knapsack instance
and thereby compute optimal or provably good solution values. We show that an
RNN of depth four and width depending quadratically on the profit of an optimum
Knapsack solution is sufficient to find optimum Knapsack solutions. We also
prove the following tradeoff between the size of an RNN and the quality of the
computed Knapsack solution: for Knapsack instances consisting of n items, an
RNN of depth five and width w computes a solution of value at least
(1 − O(n²/√w)) times the optimum solution value. Our results
build upon a classical dynamic programming formulation of the Knapsack Problem
as well as a careful rounding of profit values that are also at the core of the
well-known fully polynomial-time approximation scheme for the Knapsack Problem.
A carefully conducted computational study qualitatively supports our
theoretical size bounds. Finally, we point out that our results can be
generalized to many other combinatorial optimization problems that admit
dynamic programming solution methods, such as various Shortest Path Problems,
the Longest Common Subsequence Problem, and the Traveling Salesperson Problem.
Comment: A short version of this paper appears in the proceedings of AAAI 2021.
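To make the dynamic-programming and profit-rounding ideas mentioned in this abstract concrete, here is a minimal Python sketch of a profit-indexed Knapsack DP together with FPTAS-style rounding. The function names and the toy instance are illustrative; this is not the paper's RNN construction.

```python
# Sketch of the profit-indexed Knapsack DP and the FPTAS-style profit rounding
# that underlie the construction described above (illustrative, not the RNN itself).

def knapsack_by_profit(profits, weights, capacity):
    """dp[p] = minimum weight needed to reach total profit exactly p."""
    total = sum(profits)
    INF = float("inf")
    dp = [0.0] + [INF] * total
    for profit, weight in zip(profits, weights):
        for p in range(total, profit - 1, -1):
            if dp[p - profit] + weight < dp[p]:
                dp[p] = dp[p - profit] + weight
    # Largest achievable profit within the capacity.
    return max(p for p in range(total + 1) if dp[p] <= capacity)

def knapsack_fptas(profits, weights, capacity, eps):
    """(1 - eps)-approximation via rounding profits down to multiples of mu."""
    n = len(profits)
    mu = eps * max(profits) / n          # rounding granularity
    rounded = [int(p // mu) for p in profits]
    best_rounded = knapsack_by_profit(rounded, weights, capacity)
    return best_rounded * mu             # lower bound on the value found

if __name__ == "__main__":
    profits = [10, 7, 4, 3]
    weights = [5, 4, 2, 1]
    print(knapsack_by_profit(profits, weights, capacity=7))   # exact: 14
    print(knapsack_fptas(profits, weights, capacity=7, eps=0.1))
```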
Training Neural Networks is NP-Hard in Fixed Dimension
We study the parameterized complexity of training two-layer neural networks
with respect to the dimension of the input data and the number of hidden
neurons, considering ReLU and linear threshold activation functions. Although the
computational complexity of these problems has been studied numerous times in
recent years, several questions are still open. We answer questions by Arora et
al. [ICLR '18] and Khalife and Basu [IPCO '22] showing that both problems are
NP-hard for two dimensions, which excludes any polynomial-time algorithm for
constant dimension. We also answer a question by Froese et al. [JAIR '22]
proving W[1]-hardness for four ReLUs (or two linear threshold neurons) with
zero training error. Finally, in the ReLU case, we show fixed-parameter
tractability for the combined parameter number of dimensions and number of
ReLUs if the network is assumed to compute a convex map. Our results settle the
complexity status regarding these parameters almost completely.
Comment: Paper accepted at NeurIPS 2022.
The computational complexity of ReLU network training parameterized by data dimensionality
Understanding the computational complexity of training simple neural networks with rectified linear units (ReLUs) has recently been a subject of intensive research. Closing gaps and complementing results from the literature, we present several results on the parameterized complexity of training two-layer ReLU networks with respect to various loss functions. After a brief discussion of other parameters, we focus on analyzing the influence of the dimension d of the training data on the computational complexity. We provide running time lower bounds in terms of W[1]-hardness for parameter d and prove that known brute-force strategies are essentially optimal (assuming the Exponential Time Hypothesis). In comparison with previous work, our results hold for a broad(er) range of loss functions, including ℓp-loss for all p ∈ [0, ∞]. In particular, we improve a known polynomial-time algorithm for constant d and convex loss functions to a more general class of loss functions, matching our running time lower bounds also in these cases.
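For readers unfamiliar with the training problem being classified here, the sketch below states the empirical risk minimization objective for a two-layer ReLU network under an ℓp loss. The shapes, names, and data are illustrative; this is not an algorithm from the paper.

```python
# Illustrative statement of the training problem studied above: fit a
# two-layer ReLU network with k hidden neurons to data in dimension d,
# measuring error with an l_p loss.

import numpy as np

def two_layer_relu(W, a, b, X):
    """Network x -> sum_j a_j * max(0, <w_j, x> + b_j)."""
    return np.maximum(X @ W.T + b, 0.0) @ a

def empirical_risk(params, X, y, p=2):
    """l_p training error of the network on the data set (X, y)."""
    W, a, b = params
    residuals = two_layer_relu(W, a, b, X) - y
    if p == np.inf:
        return np.max(np.abs(residuals))
    return np.sum(np.abs(residuals) ** p)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k, n = 2, 4, 20                      # dimension, hidden neurons, samples
    X = rng.normal(size=(n, d))
    y = rng.normal(size=n)
    params = (rng.normal(size=(k, d)), rng.normal(size=k), rng.normal(size=k))
    print(empirical_risk(params, X, y, p=2))
```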
Coloring Drawings of Graphs
We consider face-colorings of drawings of graphs in the plane. Given a
multi-graph G together with a drawing D of G in the plane with only
finitely many crossings, we define a face-k-coloring of D to be a
coloring of the maximal connected regions of the drawing, the faces, with k
colors such that adjacent faces have different colors. By the 4-color
theorem, every drawing of a bridgeless graph has a face-4-coloring. A drawing
of a graph is facially 2-colorable if and only if the underlying graph is
Eulerian. We show that every graph without degree 1 vertices admits a
3-colorable drawing. This leads to the natural question which graphs have
the property that each of their drawings has a 3-coloring. We say that such a
graph is facially 3-colorable. We derive several sufficient and necessary
conditions for this property: we show that every 4-edge-connected graph and
every graph admitting a nowhere-zero 3-flow is facially 3-colorable. We
also discuss circumstances under which facial 3-colorability guarantees the
existence of a nowhere-zero 3-flow. On the negative side, we present an
infinite family of facially 3-colorable graphs without a nowhere-zero
3-flow. On the positive side, we formulate a conjecture which has a
surprising relation to a famous open problem by Tutte known as the
3-flow-conjecture. We prove our conjecture for subcubic and for
K5-minor-free graphs.
Comment: 24 pages, 17 figures.
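The facial 2-colorability criterion quoted above depends only on the abstract graph, because crossings always have degree 4 and therefore never obstruct a 2-coloring. The small sketch below checks the even-degree condition (for connected graphs, the Eulerian condition) on an edge list; names and examples are illustrative.

```python
# A drawing is facially 2-colorable exactly when every vertex of the underlying
# graph has even degree (crossings contribute degree-4 vertices, which are even).

from collections import Counter

def all_degrees_even(edges):
    """Return True iff every endpoint appears an even number of times."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return all(deg % 2 == 0 for deg in degree.values())

if __name__ == "__main__":
    triangle = [(1, 2), (2, 3), (3, 1)]   # all degrees 2 -> facially 2-colorable
    path = [(1, 2), (2, 3)]               # endpoints have odd degree -> not
    print(all_degrees_even(triangle))     # True
    print(all_degrees_even(path))         # False
```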
Towards Lower Bounds on the Depth of ReLU Neural Networks
We contribute to a better understanding of the class of functions that is
represented by a neural network with ReLU activations and a given architecture.
Using techniques from mixed-integer optimization, polyhedral theory, and
tropical geometry, we provide a mathematical counterbalance to the universal
approximation theorems which suggest that a single hidden layer is sufficient
for learning tasks. In particular, we investigate whether the class of exactly
representable functions strictly increases by adding more layers (with no
restrictions on size). This problem has potential impact on algorithmic and
statistical aspects because of the insight it provides into the class of
functions represented by neural hypothesis classes. However, to the best of our
knowledge, this question has not been investigated in the neural network
literature. We also present upper bounds on the sizes of neural networks
required to represent functions in these neural hypothesis classes.
Comment: Camera-ready version for NeurIPS 2021 conference.
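As a toy instance of the "exactly representable functions" studied here, max(x, y) is computed exactly by a single hidden ReLU layer via the identity max(x, y) = max(0, x) − max(0, −x) + max(0, y − x); whether analogous maxima of many numbers require more depth is the kind of lower-bound question the paper investigates. The sketch below merely verifies the identity numerically; the weights are an assumed illustration, not taken from the paper.

```python
# One hidden layer with 3 ReLU units computing max(x, y) exactly.

import numpy as np

W1 = np.array([[1.0, 0.0],    # ReLU(x)
               [-1.0, 0.0],   # ReLU(-x)
               [-1.0, 1.0]])  # ReLU(y - x)
w2 = np.array([1.0, -1.0, 1.0])

def relu_max(x, y):
    hidden = np.maximum(W1 @ np.array([x, y]), 0.0)
    return w2 @ hidden

if __name__ == "__main__":
    for x, y in [(3.0, -1.0), (-2.5, 4.0), (0.0, 0.0)]:
        assert relu_max(x, y) == max(x, y)
    print("max(x, y) represented exactly with one hidden ReLU layer")
```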
A First Order Method for Linear Programming Parameterized by Circuit Imbalance
Various first order approaches have been proposed in the literature to solve
Linear Programming (LP) problems, recently leading to practically efficient
solvers for large-scale LPs. From a theoretical perspective, linear convergence
rates have been established for first order LP algorithms, despite the fact
that the underlying formulations are not strongly convex. However, the
convergence rate typically depends on the Hoffman constant of a large matrix
that contains the constraint matrix, as well as the right hand side, cost, and
capacity vectors.
We introduce a first order approach for LP optimization with a convergence
rate depending polynomially on the circuit imbalance measure, which is a
geometric parameter of the constraint matrix, and depending logarithmically on
the right hand side, capacity, and cost vectors. This provides much stronger
convergence guarantees. For example, if the constraint matrix is totally
unimodular, we obtain polynomial-time algorithms, whereas approaches based on
primal-dual formulations may exhibit arbitrarily slow convergence rates for
this class. Our approach is based on a
fast gradient method due to Necoara, Nesterov, and Glineur (Math. Prog. 2019);
this algorithm is called repeatedly in a framework that gradually fixes
variables to the boundary. This technique is based on a new approximate version
of Tardos's method, which was used to obtain a strongly polynomial algorithm for
combinatorial LPs (Oper. Res. 1986).
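The following is a highly schematic sketch of the outer framework described above: run a first-order solver to moderate accuracy, permanently fix variables that are pushed (nearly) to the boundary, and repeat. The inner solver here is a plain projected gradient step on a quadratic-penalty surrogate, used only as a stand-in for the fast gradient method of Necoara, Nesterov, and Glineur; all names, tolerances, and the toy LP are illustrative assumptions.

```python
# Schematic variable-fixing framework around a generic first-order LP solver.
# The quadratic penalty is only a surrogate, so the printed point is near,
# not exactly at, the LP optimum (1, 0) of the toy instance.

import numpy as np

def inner_first_order(A, b, c, x, free, rho=10.0, steps=2000, lr=1e-3):
    """Approximately minimize c^T x + (rho/2)||Ax - b||^2 over free coords >= 0."""
    for _ in range(steps):
        grad = c + rho * A.T @ (A @ x - b)
        x[free] = np.maximum(x[free] - lr * grad[free], 0.0)
    return x

def lp_with_variable_fixing(A, b, c, rounds=3, tol=1e-3):
    n = A.shape[1]
    x = np.ones(n)
    free = np.ones(n, dtype=bool)
    for _ in range(rounds):
        x = inner_first_order(A, b, c, x, free)
        newly_fixed = free & (x <= tol)     # variables pushed to the boundary
        x[newly_fixed] = 0.0
        free &= ~newly_fixed
    return x

if __name__ == "__main__":
    # min x0 + 2*x1  s.t.  x0 + x1 = 1, x >= 0   (optimum: x = (1, 0))
    A = np.array([[1.0, 1.0]])
    b = np.array([1.0])
    c = np.array([1.0, 2.0])
    print(lp_with_variable_fixing(A, b, c))
```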
Training Fully Connected Neural Networks is ∃ℝ-Complete
We consider the algorithmic problem of finding the optimal weights and biases
for a two-layer fully connected neural network to fit a given set of data
points. This problem is known as empirical risk minimization in the machine
learning community. We show that the problem is ∃ℝ-complete.
This complexity class can be defined as the set of algorithmic problems that
are polynomial-time equivalent to finding real roots of a polynomial with
integer coefficients. Our results hold even if the following restrictions are
all added simultaneously.
- There are exactly two output neurons.
- There are exactly two input neurons.
- The data has only 13 different labels.
- The number of hidden neurons is a constant fraction of the number of data points.
- The target training error is zero.
- The ReLU activation function is used.
This shows that even very simple networks are difficult to train. The result
offers an explanation (though far from a complete understanding) of why only
gradient descent is widely successful in training neural networks in practice.
We generalize a recent result by Abrahamsen, Kleist and Miltzow [NeurIPS 2021].
This result falls into a recent line of research that tries to unveil that a
series of central algorithmic problems from widely different areas of computer
science and mathematics are ∃ℝ-complete: This includes the art
gallery problem [JACM/STOC 2018], geometric packing [FOCS 2020], covering
polygons with convex polygons [FOCS 2021], and continuous constraint
satisfaction problems [FOCS 2021].
Comment: 38 pages, 18 figures.
Scheduling a Proportionate Flow Shop of Batching Machines
In this paper we study a proportionate flow shop of batching machines with
release dates and a fixed number of machines. The scheduling problem
has so far barely received any attention in the literature, but recently its
importance has increased significantly, due to applications in the industrial
scaling of modern bio-medicine production processes. We show that for any fixed
number of machines, the makespan and the sum of completion times can be
minimized in polynomial time. Furthermore, we show that the obtained algorithm
can also be used to minimize the weighted total completion time, maximum
lateness, total tardiness and (weighted) number of late jobs in polynomial time
if all release dates are 0. Previously, polynomial-time algorithms have only
been known for two machines.
Comment: Version 2: replace initial preprint with authors' accepted manuscript.
ReLU Neural Networks of Polynomial Size for Exact Maximum Flow Computation
This paper studies the expressive power of artificial neural networks (NNs)
with rectified linear units. To study them as a model of real-valued
computation, we introduce the concept of Max-Affine Arithmetic Programs and
show equivalence between them and NNs concerning natural complexity measures.
We then use this result to show that two fundamental combinatorial optimization
problems can be solved with polynomial-size NNs, which is equivalent to the
existence of very special strongly polynomial time algorithms. First, we show
that for any undirected graph with n nodes, there is an NN of size O(n³)
that takes the edge weights as input and computes the value
of a minimum spanning tree of the graph. Second, we show that for any directed
graph with n nodes and m arcs, there is an NN of size O(m²n²)
that takes the arc capacities as input and computes a maximum flow. These
results imply in particular that the solutions of the corresponding parametric
optimization problems where all edge weights or arc capacities are free
parameters can be encoded in polynomial space and evaluated in polynomial time,
and that such an encoding is provided by an NN.
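To give a flavor of the Max-Affine Arithmetic Program viewpoint, the sketch below expresses min(x, y) with one affine map and one ReLU and composes it to compute the bottleneck capacity of a fixed path. This is only an illustration of the primitives involved (affine maps and maxima), not the paper's maximum-flow construction; all names are assumptions.

```python
# Affine maps and ReLUs as the only primitives: min(x, y) = x - max(0, x - y),
# composed along a fixed path to obtain its bottleneck capacity.

def relu(z):
    return max(0.0, z)

def min_via_relu(x, y):
    """min(x, y) written with one affine map and one ReLU."""
    return x - relu(x - y)

def path_bottleneck(capacities):
    """Bottleneck (max flow along a single path) via repeated ReLU-min."""
    value = capacities[0]
    for cap in capacities[1:]:
        value = min_via_relu(value, cap)
    return value

if __name__ == "__main__":
    print(path_bottleneck([3.0, 5.0, 2.0, 4.0]))   # 2.0
```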