Empirical Bounds on Linear Regions of Deep Rectifier Networks
We can compare the expressiveness of neural networks that use rectified
linear units (ReLUs) by the number of linear regions, which reflect the number
of pieces of the piecewise linear functions modeled by such networks. However,
enumerating these regions is prohibitive and the known analytical bounds are
identical for networks with the same dimensions. In this work, we approximate the
number of linear regions through empirical bounds based on features of the
trained network and probabilistic inference. Our first contribution is a method
to sample the activation patterns defined by ReLUs using universal hash
functions. This method is based on a Mixed-Integer Linear Programming (MILP)
formulation of the network and an algorithm for probabilistic lower bounds of
MILP solution sets that we call MIPBound, which is considerably faster than
exact counting and reaches values of the same order of magnitude. Our second
contribution is a tighter activation-based bound for the maximum number of
linear regions, which is particularly stronger in networks with narrow layers.
Combined, these bounds yield a fast proxy for the number of linear regions of a
deep neural network.
Comment: AAAI 202
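The idea of counting activation patterns can be illustrated without the paper's MILP machinery. The sketch below (an assumption-laden toy, not the MIPBound algorithm) samples inputs to a small randomly weighted ReLU network and counts distinct on/off patterns of its units; each distinct pattern corresponds to a linear region hit by the samples, so the count is an empirical lower bound.

```python
import random

# Toy sketch (not the paper's MIPBound): estimate a lower bound on the
# number of linear regions of a small ReLU network by sampling inputs
# and counting distinct ReLU activation patterns. All weights below are
# arbitrary illustrative values.

def activation_pattern(x, layers):
    """Return the tuple of ReLU on/off indicators along a forward pass."""
    pattern = []
    h = x
    for W, b in layers:
        pre = [sum(w * v for w, v in zip(row, h)) + bi
               for row, bi in zip(W, b)]
        pattern.extend(1 if p > 0 else 0 for p in pre)
        h = [max(p, 0.0) for p in pre]  # ReLU
    return tuple(pattern)

def rand_layer(n_in, n_out):
    """A hypothetical random dense layer (weights, biases)."""
    return ([[random.uniform(-1, 1) for _ in range(n_in)]
             for _ in range(n_out)],
            [random.uniform(-1, 1) for _ in range(n_out)])

random.seed(0)
layers = [rand_layer(2, 16), rand_layer(16, 16)]  # a 2-16-16 example net
patterns = {activation_pattern([random.uniform(-3, 3),
                                random.uniform(-3, 3)], layers)
            for _ in range(20000)}
print(len(patterns))  # distinct patterns seen = empirical lower bound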
Network Parameterisation and Activation Functions in Deep Learning
Deep learning, the study of multi-layered artificial neural networks, has received tremendous attention over the course of the last few years. Neural networks are now able to outperform humans in a growing variety of tasks and increasingly have an impact on our day-to-day lives. There is a wide range of potential directions in which to advance deep learning, two of which we investigate in this thesis:

(1) Among the key components of a network are its activation functions, which have a big impact on the overall mathematical form of the network. The \textit{first paper} studies generalisation of neural networks with rectified linear activation units (“ReLUs”). Such networks partition the input space into so-called linear regions, which are the maximal connected subsets on which the network is affine. In contrast to previous work, which focused on estimating the number of linear regions, we proposed a tropical algebra-based algorithm called TropEx to extract the coefficients of the linear regions. Applied to fully-connected and convolutional neural networks, TropEx reveals significant differences between the linear regions of these network types. The \textit{second paper} proposes a parametric rational activation function called ERA, which is learnable during network training. Although ERA adds only about ten parameters per layer, it significantly increases network expressivity and allows small architectures to approach the performance of large ones. ERA outperforms previous activations when used in small architectures. This is relevant because neural networks keep growing larger, and the computational resources they require result in greater costs and electricity usage (which in turn increases the CO2 footprint).

(2) For a given network architecture, each parameter configuration gives rise to a mathematical function. This functional realisation is far from unique, and many different parameterisations can give rise to the same function. Changes to the parameterisation that do not change the function are called symmetries. The \textit{third paper} theoretically studies and classifies all the symmetries of 2-layer networks using the ReLU activation. Finally, the \textit{fourth paper} studies the effect of network parameterisation on network training. We provide a theoretical analysis of the effect that scaling layers have on the gradient updates. This motivates our proposed Cooling method, which automatically scales the network parameters during training. Cooling reduces the reliance of the network on specific tricks, in particular the use of a learning rate schedule.
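To make the "learnable rational activation" idea concrete, here is a minimal sketch of such a function: a ratio of low-degree polynomials with a handful of trainable coefficients. The exact parameterisation of ERA is not reproduced here; the form and the coefficient values below are illustrative assumptions only.

```python
# Sketch of a parametric rational activation in the spirit of ERA (the
# paper's exact form is not reproduced here): P(x) / (1 + |Q(x)|), with
# the coefficients of P and Q acting as the per-layer learnable
# parameters. All coefficient values are hypothetical.

def rational_activation(x, p, q):
    """Evaluate P(x) / (1 + |Q(x)|) for coefficient lists p and q."""
    num = sum(c * x**i for i, c in enumerate(p))
    den = 1.0 + abs(sum(c * x**i for i, c in enumerate(q)))
    return num / den

# Example coefficients chosen to give a loosely ReLU-like shape.
p = [0.0, 0.5, 0.3]   # numerator coefficients (degree 2)
q = [0.0, -0.6]       # denominator coefficients (degree 1)
print(rational_activation(2.0, p, q))
```

The `1 + |Q(x)|` denominator is one common way to keep a rational activation free of poles during training; whether ERA uses this safeguard is not stated in the abstract.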
Revisiting Tropical Polynomial Division: Theory, Algorithms and Application to Neural Networks
Tropical geometry has recently found several applications in the analysis of
neural networks with piecewise linear activation functions. This paper presents
a new look at the problem of tropical polynomial division and its application
to the simplification of neural networks. We analyze tropical polynomials with
real coefficients, extending earlier ideas and methods developed for
polynomials with integer coefficients. We first prove the existence of a unique
quotient-remainder pair and characterize the quotient in terms of the convex
bi-conjugate of a related function. Interestingly, the quotient of tropical
polynomials with integer coefficients does not necessarily have integer
coefficients. Furthermore, we develop a relationship of tropical polynomial
division with the computation of the convex hull of unions of convex polyhedra
and use it to derive an exact algorithm for tropical polynomial division. An
approximate algorithm is also presented, based on an alternation between data
partition and linear programming. We also develop special techniques to divide
composite polynomials, described as sums or maxima of simpler ones. Finally, we
present some numerical results to illustrate the efficiency of the proposed
algorithms, using the MNIST handwritten digit and CIFAR-10 datasets.
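For readers unfamiliar with the max-plus semiring underlying this abstract, the following toy sketch (an illustration, not the paper's division algorithm) represents a univariate tropical polynomial as a list of (coefficient, exponent) terms and checks the defining identity that a tropical product of polynomials evaluates to the sum of their evaluations.

```python
# Toy illustration of tropical (max-plus) polynomial arithmetic.
# A univariate tropical polynomial p(x) = max_i (c_i + a_i * x) is
# stored as a list of (coefficient c_i, exponent a_i) pairs.

def trop_eval(poly, x):
    """Evaluate the upper envelope max_i (c_i + a_i * x)."""
    return max(c + a * x for c, a in poly)

def trop_mul(p, q):
    """Tropical product: classical product becomes term-wise addition
    of coefficients and exponents across all pairs of terms."""
    return [(cp + cq, ap + aq) for cp, ap in p for cq, aq in q]

p = [(0.0, 0), (1.0, 1)]    # p(x) = max(0, 1 + x)
q = [(0.0, 0), (-2.0, 2)]   # q(x) = max(0, -2 + 2x)
pq = trop_mul(p, q)

# Tropical multiplication of polynomials corresponds to addition of
# the piecewise linear functions they represent:
assert abs(trop_eval(pq, 3.0) - (trop_eval(p, 3.0) + trop_eval(q, 3.0))) < 1e-9
```

Tropical division, the subject of the paper, asks for the reverse step: given p and a divisor, recover a quotient and remainder; the paper shows this reduces to convex-geometric computations such as convex hulls of unions of polyhedra.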
On the Decision Boundaries of Neural Networks: A Tropical Geometry Perspective
This work tackles the problem of characterizing and understanding the
decision boundaries of neural networks with piecewise linear activations. We
use tropical geometry, a new development in the area of
algebraic geometry, to characterize the decision boundaries of a simple network
of the form (Affine, ReLU, Affine). Our main finding is that the decision
boundaries are a subset of a tropical hypersurface, which is intimately related
to a polytope formed by the convex hull of two zonotopes. The generators of
these zonotopes are functions of the network parameters. This geometric
characterization provides new perspectives to three tasks. (i) We propose a new
tropical perspective to the lottery ticket hypothesis, where we view the effect
of different initializations on the tropical geometric representation of a
network's decision boundaries. (ii) Moreover, we propose new tropical based
optimization reformulations that directly influence the decision boundaries of
the network for the task of network pruning. (iii) At last, we discuss the
reformulation of the generation of adversarial attacks in a tropical sense. We
demonstrate that one can construct adversaries in this tropical setting by
perturbing a specific set of decision boundaries through perturbations of a set
of parameters in the network.
Comment: First two authors contributed equally to this work
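The zonotopes mentioned above are Minkowski sums of line segments. As a small geometric sketch (generators here are arbitrary example vectors, not derived from any network as in the paper), the candidate points of a zonotope with generators g_1, ..., g_k can be enumerated from the 2^k subset sums:

```python
import itertools

# Sketch: a zonotope is the Minkowski sum of segments [0, g_i]. Its
# vertices lie among the 2^k points sum_i s_i * g_i with s_i in {0, 1}.
# In the paper the generators are functions of the (Affine, ReLU, Affine)
# network's parameters; the vectors below are hypothetical examples.

def zonotope_points(generators):
    dim = len(generators[0])
    pts = []
    for signs in itertools.product((0.0, 1.0), repeat=len(generators)):
        pts.append(tuple(sum(s * g[d] for s, g in zip(signs, generators))
                         for d in range(dim)))
    return pts

gens = [(1.0, 0.0), (0.5, 1.0), (-0.25, 0.75)]  # example generators
pts = zonotope_points(gens)
print(len(pts))  # 8 candidate points for a 3-generator zonotope in 2D
```

Because each generator is an explicit function of the weights, perturbing the weights moves these points, which is the geometric handle the paper exploits for pruning and adversarial perturbations.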