Momentum in Reinforcement Learning
We adapt the concept of momentum from optimization to reinforcement learning.
Seeing the state-action value functions as an analog to the gradients in
optimization, we interpret momentum as an average of consecutive q-functions.
We derive Momentum Value Iteration (MoVI), a variation of Value Iteration that
incorporates this momentum idea. Our analysis shows that this allows MoVI to
average errors over successive iterations. We show that the proposed approach
can be readily extended to deep learning. Specifically, we propose a simple
improvement on DQN based on MoVI, and evaluate it on Atari games.
Comment: AISTATS 202
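To make the averaging idea concrete, here is a minimal tabular sketch of momentum-style value iteration in NumPy: it keeps a running average h of consecutive q-functions and acts greedily with respect to that average. The transition tensor P, the reward matrix R, and the exact placement of the averaging step are illustrative assumptions, not the authors' algorithm.

    import numpy as np

    def movi_sketch(P, R, gamma=0.99, n_iters=500):
        """Hypothetical tabular sketch of momentum value iteration.

        P: (S, A, S) transition probabilities; R: (S, A) expected rewards.
        h, the running average of consecutive q-functions, plays the role
        that averaged gradients play in momentum-based optimization.
        """
        S, A, _ = P.shape
        q = np.zeros((S, A))
        h = np.zeros((S, A))
        for k in range(1, n_iters + 1):
            pi = h.argmax(axis=1)            # act greedily w.r.t. the average
            v = q[np.arange(S), pi]          # q evaluated along that policy
            q = R + gamma * P @ v            # one Bellman backup
            h += (q - h) / k                 # incremental mean of q-functions
        return h.argmax(axis=1), h

Because h is a mean over iterations, per-iteration estimation errors in q tend to cancel, which is the error-averaging effect the abstract refers to.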
Efficient collective swimming by harnessing vortices through deep reinforcement learning
Fish in schooling formations navigate complex flow fields replete with
mechanical energy in the vortex wakes of their companions. Their schooling
behaviour has been associated with evolutionary advantages, including
collective energy savings. How fish harvest energy from their complex fluid
environment, and the underlying physical mechanisms governing energy
extraction during collective swimming, remain unknown. Here we show that fish
can improve their sustained propulsive efficiency by actively following, and
judiciously intercepting, vortices in the wake of other swimmers. This
swimming strategy leads to collective energy savings and is revealed through
the first-ever combination of deep reinforcement learning with high-fidelity
flow simulations. We find that a 'smart swimmer' can adapt its position and
body deformation to synchronise with the momentum of the oncoming vortices,
improving its average swimming efficiency at no cost to the leader. The
results show that fish may harvest energy deposited in vortices produced by
their peers, and support the conjecture that swimming in formation is
energetically advantageous. Moreover, this study demonstrates that deep
reinforcement learning can produce navigation algorithms for complex flow
fields, with promising implications for energy savings in autonomous robotic
swarms.
Comment: 26 pages, 14 figures
Prospects of reinforcement learning for the simultaneous damping of many mechanical modes
We apply adaptive feedback for the partial refrigeration of a mechanical
resonator, i.e., with the aim of simultaneously cooling the classical thermal
motion of more than one vibrational degree of freedom. The feedback is
obtained from a policy parametrized by a neural network and trained via a
reinforcement learning strategy to choose the correct sequence of actions
from a finite set in order to simultaneously reduce the energy of many modes
of vibration. The actions are realized either as optical modulations of the
spring constants in the so-called quadratic optomechanical coupling regime or
as radiation-pressure-induced momentum kicks in the linear coupling regime.
As a proof of principle, we numerically illustrate efficient simultaneous
cooling of four independent modes with an overall strong reduction of the
total system temperature.
Comment: Machine learning in Optomechanics: coolin
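Purely to illustrate the control loop described above, and not the authors' setup, the following sketch trains a linear softmax policy with a REINFORCE-style update to pick one of a finite set of control actions, rewarding reductions in the summed mode energies. The state encoding, reward, and learning rate are all invented for illustration.

    import numpy as np

    # Hypothetical sketch: the state is a vector of per-mode energies and
    # the policy picks one of a finite set of controls (e.g. a spring-
    # constant modulation or a momentum kick). All details are illustrative.
    rng = np.random.default_rng(0)
    n_modes, n_actions, lr = 4, 5, 1e-2
    W = np.zeros((n_modes, n_actions))       # linear softmax "policy network"

    def act(energies):
        logits = energies @ W
        p = np.exp(logits - logits.max())
        p /= p.sum()
        return rng.choice(n_actions, p=p), p

    def reinforce_step(energies, action, p, reward):
        # grad of log pi(a|s) for a linear softmax: outer(s, onehot(a) - p)
        grad = -np.outer(energies, p)
        grad[:, action] += energies
        W[:] += lr * reward * grad           # reward = -total mode energy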
CopyCAT: Taking Control of Neural Policies with Constant Attacks
We propose a new perspective on adversarial attacks against deep
reinforcement learning agents. Our main contribution is CopyCAT, a targeted
attack able to consistently lure an agent into following an outsider's policy.
It is pre-computed and therefore fast to apply at inference time, making it
usable in real-time scenarios. We show its effectiveness on Atari 2600 games
in the novel read-only setting. In this setting, the adversary cannot directly
modify the agent's state -- its representation of the environment -- but can
only attack the agent's observation -- its perception of the environment.
Directly modifying the agent's state would require write access to the
agent's inner workings, and we argue that this assumption is too strong in
realistic settings.
Comment: AAMAS 202
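To make the read-only setting concrete, here is a hedged sketch (not the paper's method) of applying a pre-computed, bounded perturbation to the agent's observation while the environment state is left untouched. The masks dictionary and the bounds are assumptions for illustration.

    import numpy as np

    def read_only_attack(obs, target_action, masks, eps=0.05):
        """Illustrative read-only attack in the spirit of the abstract.

        masks: dict mapping each action the outsider's policy wants to
        induce to a perturbation of obs.shape, assumed to have been
        optimized offline -- hence cheap to apply in real time.
        Only the observation is modified; the state is never written.
        """
        delta = np.clip(masks[target_action], -eps, eps)  # bounded attack
        return np.clip(obs + delta, 0.0, 1.0)             # keep pixels valid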
A comparison of eligibility trace and momentum on SARSA in continuous state- and action-space
Here the Newton’s Method direct action selection approach to continuous action-space reinforcement learning is extended to use an eligibility trace. This is then compared to the momentum-term approach from the literature, both in terms of the update equations and in terms of the success rate and number of trials required to train on two variants of the simulated Cart-Pole benchmark problem. The eligibility trace approach achieves a higher success rate across a far wider range of parameter values than the momentum approach, and also trains in fewer trials on the Cart-Pole problem.
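Since the comparison above hinges on the update equations, here is a minimal sketch contrasting the two schemes for linear SARSA; the symbols (w, e, m, phi, alpha, beta, lam) are generic textbook notation, not taken from the paper.

    import numpy as np

    def sarsa_momentum_step(w, m, phi, td_error, alpha=0.1, beta=0.9):
        """Momentum: accumulate a running average of past updates."""
        m = beta * m + td_error * phi
        return w + alpha * m, m

    def sarsa_trace_step(w, e, phi, td_error, alpha=0.1, gamma=0.99, lam=0.9):
        """Eligibility trace: decaying memory of recently visited features,
        scaled by the *current* TD error at every step."""
        e = gamma * lam * e + phi
        return w + alpha * td_error * e, e

    # tiny usage example with a 3-feature approximator
    w = np.zeros(3); m = np.zeros(3); e = np.zeros(3)
    phi = np.array([1.0, 0.0, 0.5])
    w, m = sarsa_momentum_step(w, m, phi, td_error=0.2)
    w, e = sarsa_trace_step(w, e, phi, td_error=0.2)

The structural difference is where the smoothing sits: momentum smooths the update itself, while the trace spreads the current TD error over recently active features.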