Empirical Bounds on Linear Regions of Deep Rectifier Networks
We can compare the expressiveness of neural networks that use rectified
linear units (ReLUs) by the number of linear regions, which reflect the number
of pieces of the piecewise linear functions modeled by such networks. However,
enumerating these regions is prohibitive and the known analytical bounds are
identical for networks with the same dimensions. In this work, we approximate the
number of linear regions through empirical bounds based on features of the
trained network and probabilistic inference. Our first contribution is a method
to sample the activation patterns defined by ReLUs using universal hash
functions. This method is based on a Mixed-Integer Linear Programming (MILP)
formulation of the network and an algorithm for probabilistic lower bounds of
MILP solution sets that we call MIPBound, which is considerably faster than
exact counting and reaches values in similar orders of magnitude. Our second
contribution is a tighter activation-based bound for the maximum number of
linear regions, which is notably stronger for networks with narrow layers.
Combined, these bounds yield a fast proxy for the number of linear regions of a
deep neural network.
Comment: AAAI 202
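To make the notion of activation patterns concrete, the sketch below counts the distinct patterns that randomly sampled inputs induce in a small ReLU network. This is not MIPBound and uses neither MILP nor hash functions; it is a naive uniform-sampling baseline that only gives a lower bound on the number of activation patterns occurring in the sampled box. The function names, the random Gaussian weights, and the input box are all assumptions made for illustration.

```python
import random


def relu_pattern(x, layers):
    """Return the activation pattern (tuple of 0/1 flags) that input x induces."""
    pattern = []
    h = x
    for W, b in layers:
        # Pre-activations of this layer: z = W h + b
        z = [sum(w * v for w, v in zip(row, h)) + b_i for row, b_i in zip(W, b)]
        pattern.extend(1 if v > 0 else 0 for v in z)
        h = [max(v, 0.0) for v in z]  # ReLU
    return tuple(pattern)


def random_layers(widths, rng):
    """Hypothetical helper: Gaussian weights and biases for the given layer widths."""
    layers = []
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        W = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)]
        b = [rng.gauss(0, 1) for _ in range(n_out)]
        layers.append((W, b))
    return layers


def count_sampled_patterns(layers, n_inputs, n_samples, rng, box=1.0):
    """Count distinct activation patterns seen on inputs sampled from [-box, box]^n."""
    seen = set()
    for _ in range(n_samples):
        x = [rng.uniform(-box, box) for _ in range(n_inputs)]
        seen.add(relu_pattern(x, layers))
    return len(seen)


if __name__ == "__main__":
    rng = random.Random(0)
    layers = random_layers([2, 3, 3], rng)  # 2 inputs, two hidden layers of width 3
    print(count_sampled_patterns(layers, n_inputs=2, n_samples=5000, rng=rng))
```

Uniform sampling tends to miss small regions, which is one reason a sampling scheme with probabilistic guarantees, as described in the abstract, is of interest.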
Provably Good Solutions to the Knapsack Problem via Neural Networks of Bounded Size
The development of a satisfying and rigorous mathematical understanding of
the performance of neural networks is a major challenge in artificial
intelligence. Against this background, we study the expressive power of neural
networks through the example of the classical NP-hard Knapsack Problem. Our
main contribution is a class of recurrent neural networks (RNNs) with rectified
linear units that are iteratively applied to each item of a Knapsack instance
and thereby compute optimal or provably good solution values. We show that an
RNN of depth four and width depending quadratically on the profit of an optimum
Knapsack solution is sufficient to find optimum Knapsack solutions. We also
prove the following tradeoff between the size of an RNN and the quality of the
computed Knapsack solution: for Knapsack instances consisting of $n$ items, an
RNN of depth five and width $w$ computes a solution of value at least
$1 - \mathcal{O}(n^2/\sqrt{w})$ times the optimum solution value. Our results
build upon a classical dynamic programming formulation of the Knapsack Problem
as well as a careful rounding of profit values that are also at the core of the
well-known fully polynomial-time approximation scheme for the Knapsack Problem.
A carefully conducted computational study qualitatively supports our
theoretical size bounds. Finally, we point out that our results can be
generalized to many other combinatorial optimization problems that admit
dynamic programming solution methods, such as various Shortest Path Problems,
the Longest Common Subsequence Problem, and the Traveling Salesperson Problem.
Comment: A short version of this paper appears in the proceedings of AAAI 202
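The two classical ingredients named in this abstract, a dynamic program indexed by profit values and the profit rounding used in the fully polynomial-time approximation scheme, can be sketched in plain Python as follows. This is the textbook algorithm, not the paper's RNN construction; the function names are illustrative, and profits and weights are assumed to be positive integers.

```python
import math


def best_items_by_profit_dp(profits, weights, capacity):
    """Exact 0/1 Knapsack via a DP over profit values.

    best[p] = (minimum weight, item indices) achieving total profit exactly p
    within the capacity.
    """
    best = {0: (0, ())}
    for i, (p_i, w_i) in enumerate(zip(profits, weights)):
        # Iterate over a snapshot so that item i is used at most once.
        for p, (w, items) in list(best.items()):
            cand_w = w + w_i
            q = p + p_i
            if cand_w <= capacity and (q not in best or cand_w < best[q][0]):
                best[q] = (cand_w, items + (i,))
    return best[max(best)][1]


def knapsack_fptas(profits, weights, capacity, eps):
    """Round profits down to multiples of eps * p_max / n, then run the exact DP.

    The chosen items have true profit at least (1 - eps) times the optimum,
    up to floating-point effects in the rounding step.
    """
    n = len(profits)
    fitting = [p for p, w in zip(profits, weights) if w <= capacity]
    if not fitting:
        return ()
    scale = eps * max(fitting) / n
    rounded = [math.floor(p / scale) for p in profits]
    return best_items_by_profit_dp(rounded, weights, capacity)


def total_profit(items, profits):
    return sum(profits[i] for i in items)


if __name__ == "__main__":
    profits, weights, capacity = [60, 100, 120], [10, 20, 30], 50
    exact = best_items_by_profit_dp(profits, weights, capacity)
    approx = knapsack_fptas(profits, weights, capacity, eps=0.1)
    print(exact, total_profit(exact, profits))
    print(approx, total_profit(approx, profits))
```

Rounding shrinks the range of profit values the DP has to track, trading solution quality for table size; the abstract describes an analogous tradeoff between RNN width and solution quality.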
On the Depth of Deep Neural Networks: A Theoretical View
It is widely believed that depth plays an important role in the success of deep
neural networks (DNNs). However, as far as we know, this belief lacks solid
theoretical justification. We investigate the role of depth from the perspective
of margin bounds. In a margin bound, the expected error is upper bounded by the
empirical margin error plus a capacity term based on the Rademacher Average (RA).
First, we derive an upper bound for the RA of DNNs and show that it increases
with depth. This indicates a negative impact of depth on test performance.
Second, we show that deeper networks tend to have larger representation power
(measured by a complexity based on Betti numbers) than shallower networks in the
multi-class setting, and thus can lead to a smaller empirical margin error. This
implies a positive impact of depth. Together, these two results show that for
DNNs with a restricted number of hidden units, increasing depth is not always
beneficial, since there is a tradeoff between the positive and negative impacts.
These results inspire us to seek alternative ways to achieve the positive impact
of depth, e.g., adding margin-based penalty terms to the cross-entropy loss so as
to reduce the empirical margin error without increasing depth. Our experiments
show that in this way, we achieve significantly better test performance.
Comment: AAAI 201