In many applications, a combinatorial problem must be repeatedly solved with
similar but distinct parameters. Yet, the parameters w are not directly
observed; only contextual data d that correlates with w is available. It is
tempting to use a neural network to predict w given d, but training such a
model requires reconciling the discrete nature of combinatorial optimization
with the gradient-based frameworks used to train neural networks. When the
problem in question is an Integer Linear Program (ILP), one approach to
overcoming this issue is to consider a continuous relaxation of the
combinatorial problem.
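For instance, for a 0–1 ILP with a linear objective, the relaxation simply drops the integrality constraint (the constraint data A, b and decision variable x below are illustrative notation, not taken from the text):

\[
\min_{x \in \{0,1\}^n} w^\top x \ \text{ s.t. } Ax \le b
\qquad \longrightarrow \qquad
\min_{x \in [0,1]^n} w^\top x \ \text{ s.t. } Ax \le b .
\]

The relaxed problem is convex, so its solution can be differentiated with respect to the predicted parameters w; in practice a small regularizer is typically added, since the argmin of a linear program is piecewise constant in w and therefore has zero gradient almost everywhere.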
While existing methods utilizing this approach have been shown to be highly
effective on small problems (10–100 variables), they do not scale well to
large problems. In this work, we draw on ideas from modern convex optimization
to design a network and training scheme that scale effortlessly to problems
with thousands of variables.
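To make the general relaxation-based pipeline concrete (this is a generic sketch, not the architecture or training scheme proposed here), the following example differentiates through a quadratically regularized LP relaxation with cvxpylayers; all names, dimensions, the regularizer, and the decision-focused loss are assumptions for illustration:

```python
import cvxpy as cp
import numpy as np
import torch
from cvxpylayers.torch import CvxpyLayer

# Illustrative sizes (assumptions, not from the paper).
n, m, d_dim = 20, 10, 5

# Fixed constraint data for the feasible region Ax <= b, 0 <= x <= 1.
A_np = np.random.rand(m, n)
b_np = 0.5 * A_np.sum(axis=1)

# Continuous relaxation with a small quadratic regularizer so that the
# argmin is a differentiable function of the predicted costs w.
x = cp.Variable(n)
w = cp.Parameter(n)
gamma = 0.1
objective = cp.Minimize(w @ x + gamma * cp.sum_squares(x))
constraints = [A_np @ x <= b_np, x >= 0, x <= 1]
layer = CvxpyLayer(cp.Problem(objective, constraints),
                   parameters=[w], variables=[x])

# Simple predictor mapping context d to estimated parameters w (assumption).
model = torch.nn.Sequential(
    torch.nn.Linear(d_dim, 64), torch.nn.ReLU(), torch.nn.Linear(64, n)
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

d_batch = torch.randn(32, d_dim)   # contextual data
w_true = torch.randn(32, n)        # ground-truth parameters (synthetic)

for _ in range(10):
    w_hat = model(d_batch)
    x_hat, = layer(w_hat)          # batched relaxed solutions
    # Decision-focused loss: true cost of the decisions induced by w_hat.
    loss = (w_true * x_hat).sum(dim=1).mean()
    opt.zero_grad()
    loss.backward()                # gradients flow through the solver
    opt.step()
```

Because the solver layer is itself the bottleneck at scale, sketches like this one are where the scaling limitations mentioned above tend to appear.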