Efficient Optimization of Loops and Limits with Randomized Telescoping Sums
We consider optimization problems in which the objective requires an inner
loop with many steps or is the limit of a sequence of increasingly costly
approximations. Meta-learning, training recurrent neural networks, and
optimization of the solutions to differential equations are all examples of
optimization problems with this character. In such problems, it can be
expensive to compute the objective function value and its gradient, but
truncating the loop or using less accurate approximations can induce biases
that damage the overall solution. We propose randomized telescope (RT) gradient
estimators, which represent the objective as the sum of a telescoping series
and sample linear combinations of terms to provide cheap unbiased gradient
estimates. We identify conditions under which RT estimators achieve
optimization convergence rates independent of the length of the loop or the
required accuracy of the approximation. We also derive a method for tuning RT
estimators online to maximize a lower bound on the expected decrease in loss
per unit of computation. We evaluate our adaptive RT estimators on a range of
applications including meta-optimization of learning rates, variational
inference of ODE parameters, and training an LSTM to model long sequences.
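
The randomized telescope idea is easiest to see in its single-sample Russian-roulette form. Below is a minimal sketch of an unbiased RT estimate of a limit L = lim_n L_n; the geometric truncation distribution, the function names, and the toy series are illustrative assumptions of this sketch, not the paper's setup, and the paper's adaptive online tuning and gradient use are omitted.

```python
import numpy as np

def rt_estimate(delta, p, rng):
    """Single-sample Russian-roulette estimate of L = sum_{n>=1} delta(n).

    delta(n) is the n-th telescoping difference L_n - L_{n-1} (with L_0 = 0).
    The truncation level N ~ Geometric(p) has tail P(N >= n) = (1-p)**(n-1);
    dividing each retained term by that tail probability makes the estimate
    unbiased: E[sum_{n<=N} delta(n) / P(N >= n)] = sum_n delta(n) = L.
    """
    n_trunc = rng.geometric(p)  # sampled truncation level N
    return sum(delta(n) / (1.0 - p) ** (n - 1) for n in range(1, n_trunc + 1))

# Toy check: L_n = 1 - 2**-n, so delta(n) = 2**-n and the limit is L = 1.
rng = np.random.default_rng(0)
estimates = [rt_estimate(lambda n: 0.5**n, p=0.3, rng=rng) for _ in range(50_000)]
print(np.mean(estimates))  # close to 1.0, though each sample truncates early
```

In the setting described above, delta(n) would instead be a difference of gradient estimates at successive truncation levels, and the truncation distribution itself is what the paper tunes online to trade variance against computation.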
Multi-fidelity Monte Carlo: a pseudo-marginal approach
Markov chain Monte Carlo (MCMC) is an established approach for uncertainty
quantification and propagation in scientific applications. A key challenge in
applying MCMC to scientific domains is computation: the target density of
interest is often a function of expensive computations, such as a high-fidelity
physical simulation, an intractable integral, or a slowly-converging iterative
algorithm. Thus, using an MCMC algorithm with an expensive target density
becomes impractical, as these expensive computations need to be evaluated at
each iteration of the algorithm. In practice, these computations are often
approximated via a cheaper, low-fidelity computation, leading to bias in the
resulting target density. Multi-fidelity MCMC algorithms combine models of
varying fidelities in order to obtain an approximate target density with lower
computational cost. In this paper, we describe a class of asymptotically exact
multi-fidelity MCMC algorithms for the setting where a sequence of models of
increasing fidelity can be computed that approximates the expensive target
density of interest. We take a pseudo-marginal MCMC approach for multi-fidelity
inference that utilizes a cheaper, randomized-fidelity unbiased estimator of
the target fidelity constructed via random truncation of a telescoping series
of the low-fidelity sequence of models. Finally, we discuss and evaluate the
proposed multi-fidelity MCMC approach on several applications, including
log-Gaussian Cox process modeling, Bayesian ODE system identification,
PDE-constrained optimization, and Gaussian process regression parameter
inference.
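
To make the pseudo-marginal construction concrete, here is a minimal sketch under simplifying assumptions of my own: a scalar parameter, a symmetric random-walk proposal, a geometric truncation distribution, and a toy fidelity ladder whose randomized-truncation estimates are always positive (the general case, including the choice of truncation distribution, is what the paper addresses). All function names (rt_density, pm_mh, pi_seq) are hypothetical.

```python
import numpy as np

def rt_density(pi_seq, theta, p, rng):
    """Randomized-truncation unbiased estimate of pi(theta) = lim_k pi_k(theta).

    pi_seq(k, theta) evaluates the k-th (unnormalized) fidelity level. The
    telescoping series pi_1 + sum_{k>=2} (pi_k - pi_{k-1}) is truncated at a
    random level N ~ Geometric(p), each term reweighted by 1 / P(N >= k).
    """
    n_trunc = rng.geometric(p)
    est = pi_seq(1, theta)
    for k in range(2, n_trunc + 1):
        est += (pi_seq(k, theta) - pi_seq(k - 1, theta)) / (1.0 - p) ** (k - 1)
    return est

def pm_mh(pi_seq, theta0, n_iter, step, p, rng):
    """Pseudo-marginal Metropolis-Hastings with a symmetric random-walk proposal.

    The density estimate of the current state is recycled across iterations
    rather than recomputed; this recycling is what keeps the chain exact for
    the target pi even though each individual estimate is noisy.
    """
    theta, pi_hat = theta0, rt_density(pi_seq, theta0, p, rng)
    chain = np.empty(n_iter)
    for i in range(n_iter):
        prop = theta + step * rng.standard_normal()
        pi_prop = rt_density(pi_seq, prop, p, rng)
        # Assumes estimates stay positive; the general case needs more care.
        if pi_prop > 0 and rng.random() < pi_prop / pi_hat:
            theta, pi_hat = prop, pi_prop
        chain[i] = theta
    return chain

# Toy fidelity ladder: pi_k(theta) = exp(-0.5 * theta**2 * (1 + 2**-k)),
# which increases with k toward the standard-normal target exp(-0.5 * theta**2).
pi_seq = lambda k, theta: np.exp(-0.5 * theta**2 * (1.0 + 2.0**-k))

rng = np.random.default_rng(1)
chain = pm_mh(pi_seq, theta0=0.0, n_iter=20_000, step=1.0, p=0.3, rng=rng)
print(chain.mean(), chain.var())  # roughly 0 and 1 for the N(0, 1) target
```

Because each rt_density call touches only a random, typically small, number of fidelity levels, the per-iteration cost stays low while the chain remains asymptotically exact, which is the point of the pseudo-marginal construction.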