Deep Bayesian Quadrature Policy Optimization
We study the problem of obtaining accurate policy gradient estimates using a
finite number of samples. Monte-Carlo methods have been the default choice for
policy gradient estimation, despite suffering from high variance in the
gradient estimates. On the other hand, more sample efficient alternatives like
Bayesian quadrature methods have received little attention due to their high
computational complexity. In this work, we propose deep Bayesian quadrature
policy gradient (DBQPG), a computationally efficient high-dimensional
generalization of Bayesian quadrature, for policy gradient estimation. We show
that DBQPG can substitute Monte-Carlo estimation in policy gradient methods,
and demonstrate its effectiveness on a set of continuous control benchmarks. In
comparison to Monte-Carlo estimation, DBQPG provides (i) more accurate gradient
estimates with a significantly lower variance, (ii) a consistent improvement in
the sample complexity and average return for several deep policy gradient
algorithms, and, (iii) the uncertainty in gradient estimation that can be
incorporated to further improve the performance. Comment: Conference paper: AAAI-21. Code available at
https://github.com/Akella17/Deep-Bayesian-Quadrature-Policy-Optimizatio
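The high variance of Monte-Carlo policy gradient estimates mentioned in the abstract above can be illustrated with a small sketch. It is not DBQPG and not taken from the paper's code: it only shows the baseline score-function (Monte-Carlo) estimator on a hypothetical one-dimensional problem with a Gaussian policy of mean `theta` and unit variance. The reward function, the function names, and all constants are assumptions chosen for illustration. For this toy reward the exact gradient at `theta = 0` is 6, so the spread of the estimates around that value can be compared for small and large sample counts.

```python
import random
import statistics


def reward(action):
    # Hypothetical toy reward, peaked at action = 3.
    return -(action - 3.0) ** 2


def mc_policy_gradient(theta, num_samples, rng):
    # Score-function estimator for a Gaussian policy N(theta, 1):
    # grad_theta log pi(a) = (a - theta), so the estimate is the
    # sample mean of reward(a) * (a - theta).
    total = 0.0
    for _ in range(num_samples):
        action = rng.gauss(theta, 1.0)
        total += reward(action) * (action - theta)
    return total / num_samples


def estimator_spread(theta, num_samples, repeats=200, seed=0):
    # Repeat the estimate many times to measure its mean and
    # standard deviation for a given sample budget.
    rng = random.Random(seed)
    estimates = [
        mc_policy_gradient(theta, num_samples, rng) for _ in range(repeats)
    ]
    return statistics.mean(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        mean, spread = estimator_spread(0.0, n)
        print(f"samples={n:5d}  mean={mean:7.3f}  stdev={spread:6.3f}")
```

The standard deviation of the estimate shrinks only as the inverse square root of the sample count, which is the sample-efficiency limitation that motivates alternatives to plain Monte-Carlo estimation.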
Learning-based run-time power and energy management of multi/many-core systems: current and future trends
Multi/many-core systems are prevalent in several application domains targeting different scales of computing, such as embedded and cloud computing. These systems are able to fulfil the ever-increasing performance requirements by exploiting their parallel processing capabilities. However, effective power/energy management is required during system operation for several reasons: to increase the operational time of battery-operated systems, reduce the energy cost of datacenters, and improve thermal efficiency and reliability. This article provides an extensive survey of learning-based run-time power/energy management approaches. The survey includes a taxonomy of the learning-based approaches. These approaches perform design-time and/or run-time power/energy management by employing learning principles such as reinforcement learning. The survey also highlights the trends followed by learning-based run-time power management approaches, their upcoming trends, and open research challenges.
ES-ENAS: Blackbox Optimization over Hybrid Spaces via Combinatorial and Continuous Evolution
We consider the problem of efficient blackbox optimization over a large
hybrid search space, consisting of a mixture of a high dimensional continuous
space and a complex combinatorial space. Such examples arise commonly in
evolutionary computation and, more recently, in neuroevolution and
architecture search for Reinforcement Learning (RL) policies. Unfortunately,
previous mutation-based approaches suffer in high dimensional
continuous spaces both theoretically and practically. We thus instead propose
ES-ENAS, a simple joint optimization procedure that combines Evolutionary
Strategies (ES) and combinatorial optimization techniques in a highly scalable
and intuitive way, inspired by the one-shot or supernet paradigm introduced in
Efficient Neural Architecture Search (ENAS). Through this relatively simple
marriage between two different lines of research, we are able to gain the best
of both worlds, and empirically demonstrate our approach by optimizing BBOB
functions over hybrid spaces as well as combinatorial neural network
architectures via edge pruning and quantization on popular RL benchmarks. Due
to the modularity of the algorithm, we are also able to incorporate a wide variety
of popular techniques, including different continuous and
combinatorial optimizers as well as constrained optimization. Comment: 22 pages. See
https://github.com/google-research/google-research/tree/master/es_enas for
associated code.
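The general idea of optimizing a continuous part and a combinatorial part of a search space together can be sketched as follows. This is not the ES-ENAS procedure from the paper: it is a minimal toy, assuming an antithetic Evolution Strategies gradient step for the continuous weights and a simple propose-and-keep-if-better rule for a single discrete choice. The objective, the bonus table, and every function name and constant are hypothetical.

```python
import random

# Hypothetical bonus attached to each discrete choice; choice 1 is best.
CHOICE_BONUS = {0: 0.0, 1: 1.0, 2: 0.5}


def objective(weights, choice):
    # Toy blackbox objective over a hybrid (continuous, discrete) input.
    # The continuous optimum is at 1.0 in every coordinate.
    return CHOICE_BONUS[choice] - sum((w - 1.0) ** 2 for w in weights)


def es_step(weights, choice, rng, sigma=0.1, lr=0.05, pairs=20):
    # Antithetic ES estimate of the gradient with respect to the weights:
    # average (f(w + sigma*n) - f(w - sigma*n)) * n / (2 * sigma).
    grad = [0.0] * len(weights)
    for _ in range(pairs):
        noise = [rng.gauss(0.0, 1.0) for _ in weights]
        plus = objective([w + sigma * n for w, n in zip(weights, noise)], choice)
        minus = objective([w - sigma * n for w, n in zip(weights, noise)], choice)
        for i, n in enumerate(noise):
            grad[i] += (plus - minus) * n / (2.0 * sigma * pairs)
    # Gradient ascent, since the objective is maximized.
    return [w + lr * g for w, g in zip(weights, grad)]


def optimize(steps=200, seed=0):
    rng = random.Random(seed)
    weights = [0.0, 0.0, 0.0]
    choice = 0
    for _ in range(steps):
        # Discrete part: propose a random choice, keep it only if better.
        candidate = rng.randrange(len(CHOICE_BONUS))
        if objective(weights, candidate) > objective(weights, choice):
            choice = candidate
        # Continuous part: one ES update under the current choice.
        weights = es_step(weights, choice, rng)
    return weights, choice


if __name__ == "__main__":
    final_weights, final_choice = optimize()
    print(final_choice, [round(w, 3) for w in final_weights])
```

The discrete rule here is deliberately naive; the abstract describes using more capable combinatorial optimizers in that role.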
Evolutionary Reinforcement Learning: A Survey
Reinforcement learning (RL) is a machine learning approach that trains agents
to maximize cumulative rewards through interactions with environments. The
integration of RL with deep learning has recently resulted in impressive
achievements in a wide range of challenging tasks, including board games,
arcade games, and robot control. Despite these successes, there remain several
crucial challenges, including brittle convergence properties caused by
sensitive hyperparameters, difficulties in temporal credit assignment with long
time horizons and sparse rewards, a lack of diverse exploration, especially in
continuous search space scenarios, difficulties in credit assignment in
multi-agent reinforcement learning, and conflicting objectives for rewards.
Evolutionary computation (EC), which maintains a population of learning agents,
has demonstrated promising performance in addressing these limitations. This
article presents a comprehensive survey of state-of-the-art methods for
integrating EC into RL, referred to as evolutionary reinforcement learning
(EvoRL). We categorize EvoRL methods according to key research fields in RL,
including hyperparameter optimization, policy search, exploration, reward
shaping, meta-RL, and multi-objective RL. We then discuss future research
directions in terms of efficient methods, benchmarks, and scalable platforms.
This survey serves as a resource for researchers and practitioners interested
in the field of EvoRL, highlighting the important challenges and opportunities
for future research. With the help of this survey, researchers and
practitioners can develop more efficient methods and tailored benchmarks for
EvoRL, further advancing this promising cross-disciplinary research field.