Deep Bilevel Learning
We present a novel regularization approach for training neural networks that
achieves better generalization and lower test error than standard stochastic
gradient descent. Our approach is based on the principles of cross-validation,
where a validation set is used to limit model overfitting. We formulate such
principles as a bilevel optimization problem. This formulation allows us to
define the optimization of a cost on the validation set subject to another
optimization on the training set. The overfitting is controlled by introducing
weights on each mini-batch in the training set and by choosing their values so
that they minimize the error on the validation set. In practice, these weights
define mini-batch learning rates in a gradient descent update equation that
favor gradients with better generalization capabilities. Because of its
simplicity, this approach can be integrated with other regularization methods
and training schemes. We extensively evaluate our proposed algorithm on several
neural network architectures and datasets, and find that it consistently
improves the generalization of the model, especially when labels are noisy.
Comment: ECCV 201
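The weighted-mini-batch idea can be sketched on a toy linear-regression problem. Everything below is an illustrative assumption, not the paper's implementation: the data, the finite-difference hypergradient (standing in for the exact derivative of the validation loss with respect to the mini-batch learning rates), and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative): linear regression with noisy training labels
# and a small clean validation set.
d = 5
w_true = rng.normal(size=d)
X_tr = rng.normal(size=(40, d))
y_tr = X_tr @ w_true + rng.normal(scale=2.0, size=40)  # noisy training labels
X_val = rng.normal(size=(20, d))
y_val = X_val @ w_true                                 # clean validation labels

def grad_loss(w, X, y):
    # Gradient of the mean squared error 0.5 * mean((Xw - y)^2).
    return X.T @ (X @ w - y) / len(y)

def val_loss(w):
    return 0.5 * np.mean((X_val @ w - y_val) ** 2)

# Inner problem: a gradient step in which alpha[i] is a per-mini-batch
# learning rate weighting that batch's gradient.
batches = np.array_split(np.arange(len(y_tr)), 4)

def inner_update(w, alpha, grads):
    return w - sum(a * g for a, g in zip(alpha, grads))

w = np.zeros(d)
alpha = np.full(len(batches), 0.05)

for step in range(50):
    grads = [grad_loss(w, X_tr[b], y_tr[b]) for b in batches]
    # Outer problem: nudge alpha downhill on the *validation* loss of the
    # updated weights; finite differences approximate the hypergradient.
    eps = 1e-4
    base = val_loss(inner_update(w, alpha, grads))
    hypergrad = np.array([
        (val_loss(inner_update(w, alpha + eps * np.eye(len(alpha))[i], grads))
         - base) / eps
        for i in range(len(alpha))
    ])
    # Keep the per-batch rates non-negative and in a stable range.
    alpha = np.clip(alpha - 0.01 * hypergrad, 0.0, 0.1)
    w = inner_update(w, alpha, grads)

print(round(val_loss(w), 4))
```

Mini-batches whose gradients hurt validation performance end up with smaller rates, which is the mechanism the abstract describes for down-weighting noisy batches.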
A Machine Learning Approach to Solving Large Bilevel and Stochastic Programs: Application to Cycling Network Design
We present a novel machine learning-based approach to solving bilevel
programs that involve a large number of independent followers, which include
two-stage stochastic programs as a special case. We propose an
optimization model that explicitly considers a sampled subset of followers and
exploits a machine learning model to estimate the objective values of unsampled
followers. Unlike existing approaches, we embed machine learning model training
into the optimization problem, which allows us to employ general follower
features that cannot be represented using leader decisions. We prove bounds on
the optimality gap of the generated leader decision as measured by the original
objective function that considers the full follower set. We then develop
follower sampling algorithms to tighten the bounds and a representation
learning approach to learn follower features, which can be used as inputs to
the embedded machine learning model. Using synthetic instances of a cycling
network design problem, we compare the computational performance of our
approach versus baseline methods. Our approach provides more accurate
predictions for follower objective values, and more importantly, generates
leader decisions of higher quality. Finally, we perform a real-world case study
on cycling infrastructure planning, where we apply our approach to solve a
network design problem with over one million followers. Our approach
demonstrates favorable performance compared to current cycling network
expansion practices.
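The sample-then-estimate idea can be sketched as follows. The follower objective, the feature vectors, and the linear surrogate below are all illustrative assumptions; in particular, the paper embeds the ML model's training inside the optimization problem, whereas this sketch fits the surrogate after fixing a candidate leader decision.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical instance: 10,000 followers, each with a feature vector phi_i;
# the follower's optimal value given leader decision x has a toy closed form.
n_followers, d = 10_000, 3
phi = rng.normal(size=(n_followers, d))

def follower_value(x, phi_rows):
    # Exact follower objective values (vectorized over followers).
    return np.maximum(phi_rows @ x, 0.0) ** 2

def estimated_total(x, sample_idx):
    """Evaluate sampled followers exactly, then fit a linear model on
    follower features to estimate the objectives of unsampled followers."""
    y_s = follower_value(x, phi[sample_idx])
    A = np.c_[np.ones(len(sample_idx)), phi[sample_idx]]  # intercept + features
    beta, *_ = np.linalg.lstsq(A, y_s, rcond=None)
    mask = np.ones(n_followers, dtype=bool)
    mask[sample_idx] = False
    A_u = np.c_[np.ones(mask.sum()), phi[mask]]
    return y_s.sum() + (A_u @ beta).sum()

x = rng.normal(size=d)  # a candidate leader decision
sample = rng.choice(n_followers, size=500, replace=False)
approx = estimated_total(x, sample)
exact = follower_value(x, phi).sum()
print(round(approx, 1), round(exact, 1))
```

Only 500 of the 10,000 followers are solved exactly; the surrogate fills in the rest, which is what makes the million-follower case study tractable in spirit.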