50 research outputs found
Fully Zeroth-Order Bilevel Programming via Gaussian Smoothing
In this paper, we study and analyze zeroth-order stochastic approximation
algorithms for solving bilevel problems when neither the upper/lower objective
values nor their unbiased gradient estimates are available. In particular,
exploiting Stein's identity, we first use Gaussian smoothing to estimate first-
and second-order partial derivatives of functions with two independent blocks of
variables. We then use these estimates in the framework of a stochastic
approximation algorithm for solving bilevel optimization problems and establish
its non-asymptotic convergence analysis. To the best of our knowledge, this is
the first time that sample complexity bounds are established for a fully
stochastic zeroth-order bilevel optimization algorithm.
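For intuition, here is a minimal sketch (not the paper's algorithm) of the Gaussian-smoothing gradient estimator the abstract builds on: $\nabla f(x) \approx \mathbb{E}_u[(f(x + \mu u) - f(x)) u / \mu]$ with $u \sim N(0, I)$, which is unbiased for the gradient of the smoothed function $f_\mu(x) = \mathbb{E}[f(x + \mu u)]$. The names and parameters (`f`, `mu`, `n_samples`) are illustrative assumptions, not from the paper.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-3, n_samples=32, rng=None):
    """Zeroth-order gradient estimate via Gaussian smoothing:
    averages (f(x + mu*u) - f(x)) / mu * u over u ~ N(0, I),
    an unbiased estimate of the gradient of the smoothed
    function f_mu(x) = E[f(x + mu*u)]."""
    rng = np.random.default_rng() if rng is None else rng
    g = np.zeros_like(x)
    fx = f(x)  # one function evaluation, reused across samples
    for _ in range(n_samples):
        u = rng.standard_normal(x.shape)
        g += (f(x + mu * u) - fx) / mu * u
    return g / n_samples

# Example: estimate the gradient of a quadratic at x = (1, 2)
x = np.array([1.0, 2.0])
print(zo_gradient(lambda z: z @ z, x))  # true gradient is 2x = (2, 4)
```

Smaller `mu` reduces the smoothing bias but amplifies noise in the finite-difference quotient, which is the usual trade-off such analyses have to balance.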
Stochastic Nested Compositional Bi-level Optimization for Robust Feature Learning
We develop and analyze stochastic approximation algorithms for solving nested
compositional bi-level optimization problems. These problems involve a nested
composition of $T$ potentially non-convex smooth functions in the upper-level,
and a smooth and strongly convex function in the lower-level. Our proposed
algorithm does not rely on matrix inversions or mini-batches and can achieve an
$\epsilon$-stationary solution with an oracle complexity of approximately
$\tilde{O}_T(1/\epsilon^2)$, assuming the availability of stochastic
first-order oracles for the individual functions in the composition and the
lower-level, which are unbiased and have bounded moments. Here, $\tilde{O}_T$
hides polylog factors and constants that depend on $T$. The key challenge we
address in establishing this result relates to handling three distinct sources
of bias in the stochastic gradients. The first source arises from the
compositional nature of the upper-level, the second stems from the bi-level
structure, and the third emerges due to the utilization of Neumann series
approximations to avoid matrix inversion. To demonstrate the effectiveness of
our approach, we apply it to the problem of robust feature learning for deep
neural networks under covariate shift, showcasing the benefits and advantages
of our methodology in that context.
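As a rough illustration of the third bias source, the Neumann-series trick replaces an explicit Hessian inverse with a truncated sum: for a strongly convex lower level and $\eta \in (0, 1/L)$, $H^{-1} = \eta \sum_{k \ge 0} (I - \eta H)^k$. A minimal sketch with hypothetical names (`hvp`, `eta`, `K`), assuming only Hessian-vector products are available:

```python
import numpy as np

def neumann_hinv_vec(hvp, v, eta, K):
    """Approximate H^{-1} v by the truncated Neumann series
    eta * sum_{k=0}^{K} (I - eta*H)^k v, which converges when
    ||I - eta*H|| < 1 (e.g. H strongly convex and 0 < eta < 1/L).
    Only Hessian-vector products hvp(u) = H @ u are needed, so
    no matrix is ever formed or inverted."""
    total = v.copy()
    term = v.copy()
    for _ in range(K):
        term = term - eta * hvp(term)  # apply (I - eta*H) once more
        total += term
    return eta * total

# Example: H = diag(1, 2), so H^{-1} v should be (1.0, 0.5)
H = np.diag([1.0, 2.0])
v = np.ones(2)
print(neumann_hinv_vec(lambda u: H @ u, v, eta=0.4, K=50))
```

The truncation level `K` trades the residual bias, of order $\|I - \eta H\|^{K+1}$, against the number of Hessian-vector products per iteration.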