17,769 research outputs found

    Efficient Optimization of Loops and Limits with Randomized Telescoping Sums

    We consider optimization problems in which the objective requires an inner loop with many steps or is the limit of a sequence of increasingly costly approximations. Meta-learning, training recurrent neural networks, and optimization of the solutions to differential equations are all examples of optimization problems with this character. In such problems, it can be expensive to compute the objective function value and its gradient, but truncating the loop or using less accurate approximations can induce biases that damage the overall solution. We propose randomized telescope (RT) gradient estimators, which represent the objective as the sum of a telescoping series and sample linear combinations of terms to provide cheap unbiased gradient estimates. We identify conditions under which RT estimators achieve optimization convergence rates independent of the length of the loop or the required accuracy of the approximation. We also derive a method for tuning RT estimators online to maximize a lower bound on the expected decrease in loss per unit of computation. We evaluate our adaptive RT estimators on a range of applications, including meta-optimization of learning rates, variational inference of ODE parameters, and training an LSTM to model long sequences.
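
    As a rough sketch of the idea, the "sum and reweight" form of an RT estimator can be written in a few lines of NumPy. The toy telescoping series, the geometric sampling distribution q, and the horizon cap n_max below are illustrative assumptions, not the paper's tuned adaptive procedure.

        import numpy as np

        def rt_estimate(delta, q, n_max, rng):
            # Single-sample randomized telescope estimate of sum_n delta(n),
            # where delta(n) is the n-th telescoping difference L_n - L_{n-1}.
            # Unbiased via the "sum and reweight" form:
            #   G = sum_{n <= N} delta(n) / P(N >= n),  N ~ q.
            tail = 1.0 - np.concatenate(([0.0], np.cumsum(q)[:-1]))  # P(N >= n)
            N = rng.choice(n_max, p=q) + 1        # sample a truncation level
            return sum(delta(n) / tail[n - 1] for n in range(1, N + 1))

        rng = np.random.default_rng(0)
        n_max = 20
        delta = lambda n: 0.5 ** n                # partial sums L_n = 1 - 0.5**n
        q = np.array([0.5 ** n for n in range(1, n_max + 1)])
        q /= q.sum()                              # geometric truncation distribution
        estimates = [rt_estimate(delta, q, n_max, rng) for _ in range(10_000)]
        print(np.mean(estimates))                 # close to 1 - 0.5**20, despite
                                                  # mostly short, cheap truncations

    Taking the expectation over N recovers the full telescoping sum, which is why the estimator stays unbiased even though most draws evaluate only the first term or two.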

    Approximate IPA: Trading Unbiasedness for Simplicity

    When Perturbation Analysis (PA) yields unbiased sensitivity estimators for expected-value performance functions in discrete event dynamic systems, it can be used for performance optimization of those functions. However, even when PA is known to be unbiased, the complexity of its estimators often scales poorly with the system's size. The purpose of this paper is to suggest an alternative approach to optimization that balances precision with computing effort by trading off complicated, unbiased PA estimators for simple, biased approximate estimators. Furthermore, we provide guidelines for developing such estimators, largely based on the Stochastic Flow Modeling framework. We suggest that if the relative error (or bias) is not too large, then optimization algorithms such as stochastic approximation converge to a (local) minimum just as they do when no approximation is used. We apply this approach to an example of balancing loss with buffer cost in a finite-buffer queue, and prove a crucial upper bound on the relative error. This paper presents an initial study of the proposed approach, and we believe that if the idea gains traction, it may lead to a significant expansion of the scope of PA in the optimization of discrete event systems.
    Comments: 8 pages, 8 figures
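
    The convergence claim can be pictured with a minimal stochastic approximation loop driven by a slightly biased gradient estimator. The quadratic objective, the 5% relative bias, and the Robbins-Monro step sizes below are illustrative assumptions, not the paper's queueing example.

        import numpy as np

        rng = np.random.default_rng(1)

        def biased_grad(theta, rel_bias=0.05):
            # Cheap surrogate for the true gradient 2*theta of f(theta) = theta**2:
            # noisy, with a 5% relative bias, standing in for a simple approximate
            # estimator used in place of an exact unbiased one.
            return 2.0 * theta * (1.0 + rel_bias) + rng.normal(scale=0.5)

        theta = 5.0
        for k in range(1, 20_001):
            theta -= (1.0 / k) * biased_grad(theta)   # Robbins-Monro step sizes

        # A purely relative error leaves the gradient's root unchanged, so the
        # iterates still settle near the true minimizer theta* = 0.
        print(theta)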

    Reducing Reparameterization Gradient Variance

    Optimization with noisy gradients has become ubiquitous in statistics and machine learning. Reparameterization gradients, or gradient estimates computed via the "reparameterization trick," represent a class of noisy gradients often used in Monte Carlo variational inference (MCVI). However, when these gradient estimators are too noisy, the optimization procedure can be slow or fail to converge. One way to reduce noise is to use more samples for the gradient estimate, but this can be computationally expensive. Instead, we view the noisy gradient as a random variable and form an inexpensive approximation of the generating procedure for the gradient sample. This approximation has high correlation with the noisy gradient by construction, making it a useful control variate for variance reduction. We demonstrate our approach on non-conjugate multi-level hierarchical models and a Bayesian neural network, where we observed gradient variance reductions of multiple orders of magnitude (20-2,000x).
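
    In spirit, the construction pairs each exact gradient sample with a cheap, highly correlated approximation whose expectation is known in closed form, then subtracts the centered approximation as a control variate. The stand-in gradient function and quadratic surrogate below are illustrative assumptions, not the paper's linearized MCVI gradient.

        import numpy as np

        rng = np.random.default_rng(2)

        def grad_sample(eps):
            # Exact but noisy per-sample "gradient" (a stand-in nonlinearity).
            return np.exp(0.1 * eps) + eps ** 2

        def approx_grad(eps):
            # Cheap approximation with known mean under eps ~ N(0, 1):
            # E[1 + eps**2] = 2.
            return 1.0 + eps ** 2

        approx_mean = 2.0
        eps = rng.standard_normal(10_000)
        g, g_hat = grad_sample(eps), approx_grad(eps)

        c = np.cov(g, g_hat)[0, 1] / np.var(g_hat)    # near-optimal scaling
        g_cv = g - c * (g_hat - approx_mean)          # same mean, far less noise

        print(np.var(g), np.var(g_cv))                # variance drops by ~100x here

    Because the correction term has mean zero, g_cv keeps the expectation of the original estimator; all that changes is how much noise survives.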

    Black Box Variational Inference

    Variational inference has become a widely used method to approximate posteriors in complex latent variable models. However, deriving a variational inference algorithm generally requires significant model-specific analysis, and these efforts can hinder and deter us from quickly developing and exploring a variety of models for a problem at hand. In this paper, we present a "black box" variational inference algorithm, one that can be quickly applied to many models with little additional derivation. Our method is based on stochastic optimization of the variational objective, where the noisy gradient is computed from Monte Carlo samples from the variational distribution. We develop a number of methods to reduce the variance of the gradient, always maintaining the criterion that we want to avoid difficult model-based derivations. We evaluate our method against the corresponding black box sampling-based methods. We find that our method reaches better predictive likelihoods much faster than sampling methods. Finally, we demonstrate that Black Box Variational Inference lets us easily explore a wide space of models by quickly constructing and evaluating several models of longitudinal healthcare data.
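
    A minimal score-function form of such a black box gradient, for a Gaussian variational family on a toy conjugate model, might look like the following. The model, family, step size, and sample count are assumptions kept deliberately simple, and the sketch omits the paper's variance-reduction machinery (Rao-Blackwellization and control variates).

        import numpy as np

        rng = np.random.default_rng(3)
        x = 1.5                                   # one observed data point

        def log_p(z):
            # Toy joint: z ~ N(0, 1) prior, x | z ~ N(z, 1) likelihood.
            return -0.5 * z ** 2 - 0.5 * (x - z) ** 2

        mu, log_sigma = 0.0, 0.0                  # q(z) = N(mu, sigma**2)
        for step in range(2000):
            sigma = np.exp(log_sigma)
            z = rng.normal(mu, sigma, size=64)    # Monte Carlo samples from q
            log_q = -0.5 * ((z - mu) / sigma) ** 2 - np.log(sigma)
            score_mu = (z - mu) / sigma ** 2      # grad of log q w.r.t. mu
            score_ls = ((z - mu) / sigma) ** 2 - 1.0   # ... w.r.t. log_sigma
            w = log_p(z) - log_q                  # instantaneous ELBO term
            mu += 0.01 * np.mean(score_mu * w)    # noisy gradient ascent
            log_sigma += 0.01 * np.mean(score_ls * w)

        print(mu, np.exp(log_sigma))              # true posterior is N(0.75, 0.5),
                                                  # so mu -> 0.75, sigma -> ~0.71

    Only log_p and samples from q are needed, which is what makes the recipe "black box": swapping in a different model means changing log_p and nothing else.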