On Training Neural Networks with Mixed Integer Programming
Recent work has shown potential in using Mixed Integer Programming (MIP)
solvers to optimize certain aspects of neural networks (NNs). However, little
research has gone into training NNs with solvers. State-of-the-art methods for
training NNs are typically gradient-based and require significant data, GPU
computation, and extensive hyper-parameter tuning. In contrast, training with
MIP solvers requires neither GPUs nor hyper-parameter tuning, but likely cannot
handle large amounts of data. This work builds on recent advances that train
binarized NNs using MIP solvers. We go beyond current work by formulating new
MIP models to increase the amount of data that can be used and to train
non-binary integer-valued networks. Our results show that performance
comparable to gradient descent can be achieved when minimal data is available.
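To make the idea concrete, below is a minimal, hypothetical sketch (not the paper's actual formulation) of training an integer-weight linear classifier with a MIP solver via PuLP. The toy dataset, the weight bounds, and the hinge-style slack objective are all illustrative assumptions.

```python
# Minimal sketch: fit integer weights with a MIP solver (illustrative only).
import pulp

X = [[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]]  # toy inputs (assumed)
y = [1, 1, -1, -1]                                          # labels in {-1, +1}
n, d = len(X), len(X[0])

prob = pulp.LpProblem("integer_weight_training", pulp.LpMinimize)

# Integer weights in {-3, ..., 3} and a continuous bias (bounds are assumptions).
w = [pulp.LpVariable(f"w{j}", lowBound=-3, upBound=3, cat="Integer") for j in range(d)]
b = pulp.LpVariable("b")

# Slack variables give a hinge-style loss: y_i * (w . x_i + b) >= 1 - xi_i.
xi = [pulp.LpVariable(f"xi{i}", lowBound=0) for i in range(n)]
for i in range(n):
    prob += y[i] * (pulp.lpSum(X[i][j] * w[j] for j in range(d)) + b) >= 1 - xi[i]

# Objective: minimize total slack, a surrogate for training error.
prob += pulp.lpSum(xi)

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("weights:", [pulp.value(v) for v in w], "bias:", pulp.value(b))
```

No GPU or learning-rate tuning is involved; the solver searches the discrete weight space directly, which is exactly why the approach scales with the amount of data rather than with model tuning effort.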
A reformulation to Embedding a Neural Network in a linear program without integer variables
In this technical report, a new formulation for embedding a neural network
into an optimization model is described. This formulation does not require
binary variables to properly compute the output of the neural network for
specific types of problems. Preliminary experiments show that this
reformulation results in faster computation times when solving a proposed
showcase model in which the non-linearity must be computed, compared with the
classic formulation and the off-the-shelf tools of commercial solvers.
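As a hedged illustration (not necessarily the report's reformulation), one known case where a ReLU can be embedded without binary variables is when the optimization pressure pushes the ReLU output downward: the constraints y >= Wx + b and y >= 0 then already force y = max(0, Wx + b) at the optimum. The sketch below, with made-up weights and bounds, shows this special case in PuLP.

```python
# Sketch: embedding a single ReLU in an LP without binaries (special case).
import pulp

prob = pulp.LpProblem("relu_embedding_sketch", pulp.LpMinimize)

x = pulp.LpVariable("x", lowBound=-1, upBound=1)  # network input with toy bounds
y = pulp.LpVariable("y", lowBound=0)              # ReLU output, y >= 0

w, b = 2.0, -0.5                                  # fixed pre-trained weights (assumed)
prob += y >= w * x + b                            # y >= pre-activation

# A positive coefficient on y in a minimization objective keeps the
# relaxation tight, so y equals max(0, w*x + b) at the optimum.
prob += y + 0.1 * x

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("x =", pulp.value(x), " relu output =", pulp.value(y))
```

In the classic formulation, the same ReLU would need a binary indicator plus big-M constraints; dropping them is what makes the LP-only embedding attractive when the problem structure allows it.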
The Convex Relaxation Barrier, Revisited: Tightened Single-Neuron Relaxations for Neural Network Verification
We improve the effectiveness of propagation- and linear-optimization-based
neural network verification algorithms with a new tightened convex relaxation
for ReLU neurons. Unlike previous single-neuron relaxations which focus only on
the univariate input space of the ReLU, our method considers the multivariate
input space of the affine pre-activation function preceding the ReLU. Using
results from submodularity and convex geometry, we derive an explicit
description of the tightest possible convex relaxation when this multivariate
input is over a box domain. We show that our convex relaxation is significantly
stronger than the commonly used univariate-input relaxation which has been
proposed as a natural convex relaxation barrier for verification. While our
description of the relaxation may require an exponential number of
inequalities, we show that they can be separated in linear time and hence can
be efficiently incorporated into optimization algorithms on an as-needed basis.
Based on this novel relaxation, we design two polynomial-time algorithms for
neural network verification: a linear-programming-based algorithm that
leverages the full power of our relaxation, and a fast propagation algorithm
that generalizes existing approaches. In both cases, we show that for a modest
increase in computational effort, our strengthened relaxation enables us to
verify a significantly larger number of instances compared to similar
algorithms.
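For reference, the "univariate-input relaxation" the abstract identifies as the convex relaxation barrier is the standard single-neuron triangle relaxation over the pre-activation bounds. The short sketch below writes out its three inequalities; the paper's multivariate relaxation over the box input domain is strictly tighter, with its exponentially many inequalities separated on demand.

```python
# Baseline univariate-input ("triangle") ReLU relaxation, for contrast with
# the paper's tightened multivariate relaxation.
def triangle_relaxation(L: float, U: float):
    """Facets of the classic ReLU relaxation for y = max(0, z), z in [L, U],
    assuming L < 0 < U. Each facet is (slope, intercept) in y ? slope*z + intercept."""
    assert L < 0 < U
    slope = U / (U - L)
    return {
        "lower": [(0.0, 0.0),            # y >= 0
                  (1.0, 0.0)],           # y >= z
        "upper": [(slope, -slope * L)],  # y <= U*(z - L)/(U - L)
    }

print(triangle_relaxation(-2.0, 3.0))
```

Because the triangle relaxation only sees the scalar pre-activation z, it ignores how the affine map couples the inputs over their box domain, which is precisely the slack the tightened relaxation removes.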
Identifying Critical Neurons in ANN Architectures using Mixed Integer Programming
We introduce a mixed integer program (MIP) for assigning importance scores to
each neuron in deep neural network architectures, guided by the impact of the
neurons' simultaneous pruning on the main learning task of the network. By
carefully devising the objective function of the MIP, we drive the solver to
minimize the number of critical neurons (i.e., with high importance score) that
need to be kept for maintaining the overall accuracy of the trained neural
network. Further, the proposed formulation generalizes the recently considered
lottery ticket optimization by identifying multiple "lucky" sub-networks,
resulting in an optimized architecture that not only performs well on a single
dataset but also generalizes across multiple datasets upon retraining of the
network weights. Finally, we present a scalable implementation of our method by
decoupling the importance scores across layers using auxiliary networks. We
demonstrate the ability of our formulation to prune neural networks with
marginal loss in accuracy and generalizability on popular datasets and
architectures.
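The core idea can be sketched with a deliberately simplified MIP (this is an assumed toy model, not the paper's formulation): binary keep/prune decisions per neuron, a constraint that a layer's output on a small calibration batch stays within a tolerance, and an objective that minimizes how many neurons must be kept. All sizes, data, and the tolerance eps below are illustrative assumptions.

```python
# Toy pruning MIP: keep as few neurons as possible while preserving outputs.
import numpy as np
import pulp

rng = np.random.default_rng(0)
H = rng.standard_normal((8, 16))   # calibration activations: 8 samples x 16 neurons
W = rng.standard_normal((16, 4))   # next-layer weights: 16 neurons -> 4 outputs
Y = H @ W                          # reference outputs to preserve
eps = 2.0                          # allowed per-output deviation (assumed)

prob = pulp.LpProblem("neuron_pruning", pulp.LpMinimize)
keep = [pulp.LpVariable(f"keep{j}", cat="Binary") for j in range(H.shape[1])]

# If neuron j is pruned (keep[j] = 0), its contribution H[i, j] * W[j, k] vanishes.
for i in range(H.shape[0]):
    for k in range(W.shape[1]):
        approx = pulp.lpSum(float(H[i, j] * W[j, k]) * keep[j] for j in range(H.shape[1]))
        prob += approx - float(Y[i, k]) <= eps
        prob += float(Y[i, k]) - approx <= eps

# Minimize the number of neurons that must be kept.
prob += pulp.lpSum(keep)

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("kept neurons:", [j for j, v in enumerate(keep) if pulp.value(v) > 0.5])
```

The neurons the solver is forced to keep play the role of critical neurons; everything else can be pruned with bounded effect on the calibration outputs.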