Bilevel optimization problems, in which one optimization problem is nested
inside another, have a growing number of applications in machine learning. In
many practical cases, the upper and the lower objectives correspond to
empirical risk minimization problems and therefore have a sum structure. In
this context, we propose a bilevel extension of the celebrated SARAH algorithm.
We demonstrate that the algorithm requires
$\mathcal{O}((n+m)^{1/2}\varepsilon^{-1})$ gradient computations to achieve
$\varepsilon$-stationarity, where $n+m$ is the total number of samples. This
improves over all previous bilevel algorithms. Moreover, we provide a lower
bound on the number of oracle calls required to get an approximate stationary
point of the objective function of the bilevel problem. This lower bound is
attained by our algorithm, which is therefore optimal in terms of sample
complexity.
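For readers unfamiliar with SARAH, the sketch below shows the single-level recursive gradient estimator that the bilevel method builds on; it is not the bilevel algorithm itself. It assumes a hypothetical toy finite-sum least-squares problem with per-sample losses f_i(x) = 0.5 * (a_i . x - b_i)^2, and the function names `grad_i`, `full_grad` and `sarah` are illustrative. Each epoch starts from one full gradient, then each inner step updates the estimate with a single sample's gradient difference between consecutive iterates. Returning the last inner iterate is a simplification assumed here for brevity.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def grad_i(x, a_i, b_i):
    """Gradient of the hypothetical per-sample loss 0.5 * (a_i . x - b_i)**2."""
    r = dot(a_i, x) - b_i
    return [r * a for a in a_i]

def full_grad(x, A, b):
    """Gradient of the average loss (1/N) * sum_i f_i(x) over all N samples."""
    n = len(A)
    g = [0.0] * len(x)
    for a_i, b_i in zip(A, b):
        gi = grad_i(x, a_i, b_i)
        g = [gj + gij / n for gj, gij in zip(g, gi)]
    return g

def sarah(A, b, x0, lr=0.1, n_epochs=20, inner_steps=None, seed=0):
    """Single-level SARAH sketch (assumption: last inner iterate is returned)."""
    rng = random.Random(seed)
    inner_steps = inner_steps or len(A)
    x = list(x0)
    for _ in range(n_epochs):
        # Epoch start: one full-gradient pass anchors the estimator.
        v = full_grad(x, A, b)
        x_prev, x = x, [xj - lr * vj for xj, vj in zip(x, v)]
        for _ in range(inner_steps):
            i = rng.randrange(len(A))
            # Recursive update: v <- grad_i(x) - grad_i(x_prev) + v,
            # using one sample evaluated at the two latest iterates.
            g_new = grad_i(x, A[i], b[i])
            g_old = grad_i(x_prev, A[i], b[i])
            v = [gn - go + vj for gn, go, vj in zip(g_new, g_old, v)]
            x_prev, x = x, [xj - lr * vj for xj, vj in zip(x, v)]
    return x
```

As a usage example, `sarah([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, 2.0, 3.0], [0.0, 0.0], lr=0.1, n_epochs=200)` approaches the exact solution (1, 2) of this consistent toy system. Unlike SVRG-style estimators, the SARAH estimate is built recursively from the previous one rather than from a fixed snapshot gradient.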