Importance mixing: Improving sample reuse in evolutionary policy search methods
Deep neuroevolution, that is, evolutionary policy search methods based on deep
neural networks, has recently emerged as a competitor to deep reinforcement
learning algorithms due to its better parallelization capabilities. However,
these methods still suffer from a far worse sample efficiency. In this paper we
investigate whether a mechanism known as "importance mixing" can significantly
improve their sample efficiency. We provide a didactic presentation of
importance mixing and we explain how it can be extended to reuse more samples.
Then, from an empirical comparison based on a simple benchmark, we show that,
although importance mixing does provide better sample efficiency, it remains far
from the sample efficiency of deep reinforcement learning, while being more stable.
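The abstract does not spell out the mechanism, so the following is only a minimal sketch of importance mixing as it is commonly described for evolution strategies, not the extended variant the paper proposes. It assumes a one-dimensional Gaussian search distribution; the function names, the `(mean, std)` tuples, and the `alpha` parameter (a minimal refresh rate) are illustrative choices, not taken from the paper.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_mixing(old_samples, old, new, pop_size, alpha=0.01, rng=None):
    """Build a population for the `new` distribution, reusing old samples.

    `old` and `new` are hypothetical (mean, std) tuples. Returns the pair
    (reused, fresh); only the fresh samples need to be re-evaluated.
    """
    if rng is None:
        rng = random.Random(0)

    # Step 1: keep an old sample x with probability
    # min(1, (1 - alpha) * p_new(x) / p_old(x)).
    reused = []
    for x in old_samples:
        if len(reused) >= pop_size:
            break
        ratio = gaussian_pdf(x, *new) / gaussian_pdf(x, *old)
        if rng.random() < min(1.0, (1.0 - alpha) * ratio):
            reused.append(x)

    # Step 2: fill the population with draws from the new distribution,
    # accepting x with probability max(alpha, 1 - p_old(x) / p_new(x)).
    # This assumes alpha > 0 whenever fresh samples are needed.
    fresh = []
    while len(reused) + len(fresh) < pop_size:
        x = rng.gauss(new[0], new[1])
        ratio = gaussian_pdf(x, *old) / gaussian_pdf(x, *new)
        if rng.random() < max(alpha, 1.0 - ratio):
            fresh.append(x)

    return reused, fresh
```

When the old and new distributions are close, most of the population comes from `reused`, which is where the saving in fitness evaluations would come from.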
Efficient Optimization of Loops and Limits with Randomized Telescoping Sums
We consider optimization problems in which the objective requires an inner
loop with many steps or is the limit of a sequence of increasingly costly
approximations. Meta-learning, training recurrent neural networks, and
optimization of the solutions to differential equations are all examples of
optimization problems with this character. In such problems, it can be
expensive to compute the objective function value and its gradient, but
truncating the loop or using less accurate approximations can induce biases
that damage the overall solution. We propose randomized telescope (RT) gradient
estimators, which represent the objective as the sum of a telescoping series
and sample linear combinations of terms to provide cheap unbiased gradient
estimates. We identify conditions under which RT estimators achieve
optimization convergence rates independent of the length of the loop or the
required accuracy of the approximation. We also derive a method for tuning RT
estimators online to maximize a lower bound on the expected decrease in loss
per unit of computation. We evaluate our adaptive RT estimators on a range of
applications including meta-optimization of learning rates, variational
inference of ODE parameters, and training an LSTM to model long sequences.
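A small sketch of the telescoping idea, under simplifying assumptions: it works on scalar objective values rather than gradients, uses a fixed sampling distribution `q` rather than the adaptive tuning described in the abstract, and all function names are illustrative. It shows two standard ways to sample a telescoping series without bias (a single reweighted term, and a Russian-roulette style truncated sum) and checks unbiasedness by enumerating every possible truncation point.

```python
def telescoping_terms(values):
    """Differences Delta_n = L_n - L_{n-1} with L_0 = 0, so sum(Delta) = L_H."""
    deltas = []
    prev = 0.0
    for v in values:
        deltas.append(v - prev)
        prev = v
    return deltas


def single_sample_estimate(deltas, q, n):
    """Estimate from one term n (0-based), drawn with probability q[n]."""
    return deltas[n] / q[n]


def russian_roulette_estimate(deltas, q, n):
    """Sum the first n + 1 terms, each divided by P(N >= i)."""
    total = 0.0
    tail = 1.0  # P(N >= 0)
    for i in range(n + 1):
        total += deltas[i] / tail
        tail -= q[i]  # now P(N >= i + 1)
    return total


def expectation(estimator, deltas, q):
    """Exact expectation of an estimator over the truncation point N ~ q."""
    return sum(q[n] * estimator(deltas, q, n) for n in range(len(q)))


# Hypothetical sequence of increasingly accurate approximations L_1..L_4.
values = [0.5, 0.75, 0.875, 0.9375]
deltas = telescoping_terms(values)
q = [0.5, 0.25, 0.125, 0.125]  # assumed sampling distribution over N

full_value = values[-1]
mean_single = expectation(single_sample_estimate, deltas, q)
mean_roulette = expectation(russian_roulette_estimate, deltas, q)
```

Both `mean_single` and `mean_roulette` should match `full_value` up to floating-point error, even though most draws evaluate only the first one or two terms.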
Practical recommendations for gradient-based training of deep architectures
Learning algorithms related to artificial neural networks and in particular
for Deep Learning may seem to involve many bells and whistles, called
hyper-parameters. This chapter is meant as a practical guide with
recommendations for some of the most commonly used hyper-parameters, in
particular in the context of learning algorithms based on back-propagated
gradient and gradient-based optimization. It also discusses how to deal with
the fact that more interesting results can be obtained when allowing one to
adjust many hyper-parameters. Overall, it describes elements of the practice
used to successfully and efficiently train and debug large-scale and often deep
multi-layer neural networks. It closes with open questions about the training
difficulties observed with deeper architectures.
Trust-Region Variational Inference with Gaussian Mixture Models
Many methods for machine learning rely on approximate inference from intractable probability distributions. Variational inference approximates such distributions by tractable models that can be subsequently used for approximate inference. Learning sufficiently accurate approximations requires a rich model family and careful exploration of the relevant modes of the target distribution. We propose a method for learning accurate GMM approximations of intractable probability distributions based on insights from policy search by using information-geometric trust regions for principled exploration. For efficient improvement of the GMM approximation, we derive a lower bound on the corresponding optimization objective enabling us to update the components independently. Our use of the lower bound ensures convergence to a stationary point of the original objective. The number of components is adapted online by adding new components in promising regions and by deleting components with negligible weight. We demonstrate on several domains that we can learn approximations of complex, multimodal distributions with a quality that is unmet by previous variational inference methods, and that the GMM approximation can be used for drawing samples that are on par with samples created by state-of-the-art MCMC samplers while requiring up to three orders of magnitude less computational resources.
Lagrangian Based Methods for Coherent Structure Detection
There has been a proliferation in the development of Lagrangian analytical methods for detecting coherent structures in fluid flow transport, yielding a variety of qualitatively different approaches. We present a review of four approaches and demonstrate the utility of these methods via their application to the same sample analytic model, the canonical double-gyre flow, highlighting the pros and cons of each approach. Two of the methods, the geometric and probabilistic approaches, are well established and require velocity field data over the time interval of interest to identify particularly important material lines and surfaces, and influential regions, respectively. The other two approaches, implementing tools from cluster and braid theory, seek coherent structures based on limited trajectory data, attempting to partition the flow transport into distinct regions. All four of these approaches share the common trait that they are objective methods, meaning that their results do not depend on the frame of reference used. For each method, we also present a number of example applications ranging from blood flow and chemical reactions to ocean and atmospheric flows. (C) 2015 AIP Publishing LLC. ONR N000141210665. Center for Nonlinear Dynamic
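For readers unfamiliar with the double-gyre test case, the sketch below writes out the velocity field in its widely used form on the domain [0, 2] x [0, 1] and advects a single particle with a fourth-order Runge-Kutta step. The abstract gives no formula or parameters, so the values of `A`, `eps`, and `omega` are commonly used defaults assumed here, and the function names are illustrative.

```python
import math


def double_gyre_velocity(x, y, t, A=0.1, eps=0.25, omega=2.0 * math.pi / 10.0):
    """Velocity (u, v) of the time-periodic double-gyre flow.

    A, eps and omega are assumed default values, not taken from the review.
    """
    s = eps * math.sin(omega * t)
    f = s * x * x + (1.0 - 2.0 * s) * x
    dfdx = 2.0 * s * x + (1.0 - 2.0 * s)
    u = -math.pi * A * math.sin(math.pi * f) * math.cos(math.pi * y)
    v = math.pi * A * math.cos(math.pi * f) * math.sin(math.pi * y) * dfdx
    return u, v


def advect(x, y, t0, t1, steps=1000):
    """Integrate one particle trajectory from t0 to t1 with classical RK4."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        k1 = double_gyre_velocity(x, y, t)
        k2 = double_gyre_velocity(x + 0.5 * h * k1[0], y + 0.5 * h * k1[1], t + 0.5 * h)
        k3 = double_gyre_velocity(x + 0.5 * h * k2[0], y + 0.5 * h * k2[1], t + 0.5 * h)
        k4 = double_gyre_velocity(x + h * k3[0], y + h * k3[1], t + h)
        x += h * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        y += h * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
        t += h
    return x, y
```

Trajectories computed this way, for example `advect(0.6, 0.3, 0.0, 10.0)`, are the raw input that trajectory-based Lagrangian methods would work from.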