Many problems in machine learning involve bilevel optimization (BLO),
including hyperparameter optimization, meta-learning, and dataset distillation.
Bilevel problems consist of two nested sub-problems, called the outer and inner
problems. In practice, at least one of these sub-problems is often
overparameterized. In this case, there are many ways to choose among optima
that achieve equivalent objective values. Inspired by recent studies of the
implicit bias induced by optimization algorithms in single-level optimization,
we investigate the implicit bias of gradient-based algorithms for bilevel
optimization. We delineate two standard BLO methods -- cold-start and
warm-start -- and show that the converged solution or long-run behavior depends
to a large degree on these and other algorithmic choices, such as the
hypergradient approximation. We also show that the inner solutions obtained by
warm-start BLO can encode a surprising amount of information about the outer
objective, even when the outer parameters are low-dimensional. We believe that
implicit bias deserves as central a role in the study of bilevel optimization
as it has attained in the study of single-level neural net optimization.

Comment: ICML 202