Efficient Optimization of Loops and Limits with Randomized Telescoping Sums
We consider optimization problems in which the objective requires an inner
loop with many steps or is the limit of a sequence of increasingly costly
approximations. Meta-learning, training recurrent neural networks, and
optimization of the solutions to differential equations are all examples of
optimization problems with this character. In such problems, it can be
expensive to compute the objective function value and its gradient, but
truncating the loop or using less accurate approximations can induce biases
that damage the overall solution. We propose randomized telescope (RT) gradient
estimators, which represent the objective as the sum of a telescoping series
and sample linear combinations of terms to provide cheap unbiased gradient
estimates. We identify conditions under which RT estimators achieve
optimization convergence rates independent of the length of the loop or the
required accuracy of the approximation. We also derive a method for tuning RT
estimators online to maximize a lower bound on the expected decrease in loss
per unit of computation. We evaluate our adaptive RT estimators on a range of
applications including meta-optimization of learning rates, variational
inference of ODE parameters, and training an LSTM to model long sequences.
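A minimal sketch of the single-sample ("Russian roulette") form of such an
estimator, under illustrative choices not taken from the paper: a geometric
truncation distribution and a toy convergent sequence standing in for the
costly loop. The names rt_estimate, delta, and probs are ours.

```python
import numpy as np

def rt_estimate(delta, probs, rng):
    """Single-sample Russian-roulette form of a randomized telescope estimator.

    delta(n) returns the n-th telescoping difference Delta_n = L_n - L_{n-1}
    (with L_0 = 0), and probs[n-1] = P(N >= n) for a truncation distribution
    over n = 1, ..., len(probs).
    """
    # Recover the pmf from the tail probabilities: P(N=n) = P(N>=n) - P(N>=n+1).
    pmf = probs - np.append(probs[1:], 0.0)
    N = rng.choice(len(probs), p=pmf) + 1
    # Weight each computed term by 1 / P(N >= n); the expectation over N then
    # recovers the full (truncated) telescoping sum without bias.
    return sum(delta(n) / probs[n - 1] for n in range(1, N + 1))

# Toy check: L_n = 1 - 2**(-n) has limit 1, with differences Delta_n = 2**(-n).
rng = np.random.default_rng(0)
probs = 0.5 ** np.arange(10)                  # geometric tails P(N >= n)
delta = lambda n: 2.0 ** (-n)
est = [rt_estimate(delta, probs, rng) for _ in range(100_000)]
print(np.mean(est))                           # ~= sum of first 10 terms ~= 0.999
```

Most calls to rt_estimate stop after one or two terms, yet the average matches
the deep sum; this is the cost-versus-variance trade-off the paper's online
tuning procedure is designed to manage.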
Approximate IPA: Trading Unbiasedness for Simplicity
When Perturbation Analysis (PA) yields unbiased sensitivity estimators for
expected-value performance functions in discrete event dynamic systems, it can
be used for performance optimization of those functions. However, even when PA
is known to be unbiased, the complexity of its estimators often fails to scale
with the system's size. The purpose of this paper is to suggest an alternative
approach to optimization which balances precision with computing efforts by
trading off complicated, unbiased PA estimators for simple, biased approximate
estimators. Furthermore, we provide guidelines, largely based on the Stochastic
Flow Modeling framework, for developing such estimators. We suggest that if the
relative error (or bias) is not too large, then optimization algorithms such as
stochastic approximation converge to a (local) minimum just as they would if no
approximation were used. We apply this approach to an
example of balancing loss with buffer-cost in a finite-buffer queue, and prove
a crucial upper bound on the relative error. This paper presents the initial
study of the proposed approach, and we believe that if the idea gains traction
then it may lead to a significant expansion of the scope of PA in optimization
of discrete event systems.
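A toy sketch of the claimed behavior, under an assumption we add for
illustration: the bias is a small multiplicative relative error, so the biased
gradient vanishes exactly where the true gradient does. The function
approx_grad is a hypothetical stand-in, not the paper's queueing estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

def approx_grad(theta, rel_bias=0.05):
    """Hypothetical stand-in for a simple but biased approximate estimator:
    the exact gradient of f(theta) = (theta - 2)**2, distorted by a small
    systematic relative error plus simulation noise."""
    exact = 2.0 * (theta - 2.0)
    return exact * (1.0 + rel_bias) + rng.normal(scale=0.5)

# Robbins-Monro stochastic approximation driven by the biased estimator.
theta = 10.0
for k in range(1, 5001):
    theta -= (1.0 / k) * approx_grad(theta)
print(theta)   # ~= 2: a small relative bias leaves the minimizer unchanged here
```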
Reducing Reparameterization Gradient Variance
Optimization with noisy gradients has become ubiquitous in statistics and
machine learning. Reparameterization gradients, or gradient estimates computed
via the "reparameterization trick," represent a class of noisy gradients often
used in Monte Carlo variational inference (MCVI). However, when these gradient
estimators are too noisy, the optimization procedure can be slow or fail to
converge. One way to reduce noise is to use more samples for the gradient
estimate, but this can be computationally expensive. Instead, we view the noisy
gradient as a random variable, and form an inexpensive approximation of the
generating procedure for the gradient sample. This approximation has high
correlation with the noisy gradient by construction, making it a useful control
variate for variance reduction. We demonstrate our approach on non-conjugate
multi-level hierarchical models and a Bayesian neural net where we observed
gradient variance reductions of multiple orders of magnitude (20-2,000x).
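A minimal sketch of the control-variate idea on a one-dimensional toy
objective: a first-order linearization of the gradient-generating procedure is
cheap, has a closed-form expectation, and is highly correlated with the noisy
reparameterization gradient. The toy model f(z) = sin(z) and all names here
are illustrative, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparam_grad(lam, eps):
    """Noisy reparameterization gradient of E_q[f(z)] w.r.t. lam = (mean, scale),
    with z = lam[0] + lam[1] * eps, eps ~ N(0, 1), and toy model f(z) = sin(z)."""
    df_dz = np.cos(lam[0] + lam[1] * eps)
    return np.array([df_dz, df_dz * eps])        # chain rule: dz/dmean=1, dz/dscale=eps

def cv_grad(lam, eps):
    """Cheap approximation of the same gradient sample: linearize df/dz in eps
    around eps = 0. It tracks reparam_grad closely, so it works as a control
    variate, and its expectation over eps is available in closed form."""
    df0, d2f0 = np.cos(lam[0]), -np.sin(lam[0])  # f' and f'' at z = lam[0]
    df_dz = df0 + d2f0 * lam[1] * eps
    return np.array([df_dz, df_dz * eps])

def cv_mean(lam):
    """Closed-form E_eps[cv_grad], using E[eps] = 0 and E[eps**2] = 1."""
    return np.array([np.cos(lam[0]), -np.sin(lam[0]) * lam[1]])

lam = np.array([0.3, 0.5])
eps = rng.normal(size=10_000)
naive = reparam_grad(lam, eps)                               # shape (2, n)
cv = reparam_grad(lam, eps) - cv_grad(lam, eps) + cv_mean(lam)[:, None]
print(naive.mean(axis=1), cv.mean(axis=1))   # same expectation (still unbiased)
print(naive.var(axis=1), cv.var(axis=1))     # control-variate variance is far lower
```

Subtracting cv_grad and adding back its known mean leaves the estimator
unbiased while cancelling the first-order noise, which is where most of the
variance lives for a smooth objective.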
Black Box Variational Inference
Variational inference has become a widely used method to approximate
posteriors in complex latent variable models. However, deriving a variational
inference algorithm generally requires significant model-specific analysis, and
these efforts can hinder and deter us from quickly developing and exploring a
variety of models for a problem at hand. In this paper, we present a "black
box" variational inference algorithm, one that can be quickly applied to many
models with little additional derivation. Our method is based on a stochastic
optimization of the variational objective where the noisy gradient is computed
from Monte Carlo samples from the variational distribution. We develop a number
of methods to reduce the variance of the gradient, always maintaining the
criterion that we want to avoid difficult model-based derivations. We evaluate
our method against the corresponding black box sampling based methods. We find
that our method reaches better predictive likelihoods much faster than sampling
methods. Finally, we demonstrate that Black Box Variational Inference lets us
easily explore a wide space of models by quickly constructing and evaluating
several models of longitudinal healthcare data.
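A minimal sketch of the core recipe, using the plain score-function estimator
and omitting the paper's variance-reduction methods, on a conjugate toy model
chosen so the answer is checkable. The model and all names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_p(z, x):
    """Toy model: z ~ N(0, 1) prior, x | z ~ N(z, 1) likelihood."""
    return -0.5 * z**2 - 0.5 * (x - z)**2

def bbvi_grad(lam, x, n_samples=200):
    """Score-function ("black box") gradient of the ELBO for a Gaussian
    variational family q(z) = N(lam[0], exp(lam[1])**2). It only evaluates
    log p and log q -- the score is derived once for the family, not per model."""
    mu, log_sigma = lam
    sigma = np.exp(log_sigma)
    z = rng.normal(mu, sigma, size=n_samples)
    log_q = -0.5 * ((z - mu) / sigma) ** 2 - log_sigma - 0.5 * np.log(2 * np.pi)
    # Gradient of log q w.r.t. (mu, log_sigma).
    score = np.stack([(z - mu) / sigma**2, ((z - mu) / sigma) ** 2 - 1.0], axis=1)
    f = (log_p(z, x) - log_q)[:, None]
    return (score * f).mean(axis=0)

lam = np.array([0.0, 0.0])
for t in range(2000):
    lam += 0.01 * bbvi_grad(lam, x=1.0)   # stochastic ascent on the ELBO
# True posterior is N(0.5, 1/2), so lam should approach (0.5, log sqrt(0.5) ~ -0.35).
print(lam)
```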