
    Momentum in Reinforcement Learning

    Get PDF
    We adapt the concept of momentum from optimization to reinforcement learning. Seeing state-action value functions as an analog of gradients in optimization, we interpret momentum as an average of consecutive q-functions. We derive Momentum Value Iteration (MoVI), a variation of Value Iteration that incorporates this momentum idea. Our analysis shows that this allows MoVI to average errors over successive iterations. We show that the proposed approach can be readily extended to deep learning. Specifically, we propose a simple improvement on DQN based on MoVI, and evaluate it on Atari games. Comment: AISTATS 202
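    The abstract describes momentum as a running average of consecutive q-functions driving a Value Iteration scheme. Below is a minimal tabular sketch of one plausible reading of that idea, assuming a small finite MDP given as explicit transition and reward arrays; it is not necessarily the exact MoVI update from the paper.

        # Plausible sketch of "momentum as an average of consecutive q-functions".
        # P and R are hypothetical inputs: P has shape (S, A, S), R has shape (S, A).
        import numpy as np

        def movi_like_iteration(P, R, gamma=0.99, n_iters=200):
            S, A = R.shape
            q = np.zeros((S, A))      # current q-function (the "gradient" analog)
            q_avg = np.zeros((S, A))  # running average of successive q's (the "momentum")
            for k in range(1, n_iters + 1):
                greedy = q_avg.argmax(axis=1)      # act greedily w.r.t. the averaged q
                v = q[np.arange(S), greedy]        # evaluate the current q along that policy
                q = R + gamma * P @ v              # Bellman backup
                q_avg += (q - q_avg) / k           # incremental average over iterations
            return q_avg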

    Efficient collective swimming by harnessing vortices through deep reinforcement learning

    Full text link
    Fish in schooling formations navigate complex flow-fields replete with mechanical energy in the vortex wakes of their companions. Their schooling behaviour has been associated with evolutionary advantages, including collective energy savings. How fish harvest energy from their complex fluid environment, and the physical mechanisms governing energy extraction during collective swimming, remain unknown. Here we show that fish can improve their sustained propulsive efficiency by actively following, and judiciously intercepting, vortices in the wake of other swimmers. This swimming strategy leads to collective energy savings and is revealed through the first-ever combination of deep reinforcement learning with high-fidelity flow simulations. We find that a 'smart swimmer' can adapt its position and body deformation to synchronise with the momentum of the oncoming vortices, improving its average swimming efficiency at no cost to the leader. The results show that fish may harvest energy deposited in vortices produced by their peers, and support the conjecture that swimming in formation is energetically advantageous. Moreover, this study demonstrates that deep reinforcement learning can produce navigation algorithms for complex flow-fields, with promising implications for energy savings in autonomous robotic swarms. Comment: 26 pages, 14 figure

    Prospects of reinforcement learning for the simultaneous damping of many mechanical modes

    Get PDF
    We apply adaptive feedback for the partial refrigeration of a mechanical resonator, i.e., with the aim of simultaneously cooling the classical thermal motion of more than one vibrational degree of freedom. The feedback is obtained from a neural-network-parametrized policy, trained via a reinforcement learning strategy to choose the correct sequence of actions from a finite set in order to simultaneously reduce the energy of many modes of vibration. The actions are realized either as optical modulations of the spring constants in the so-called quadratic optomechanical coupling regime, or as radiation-pressure-induced momentum kicks in the linear coupling regime. As a proof of principle, we numerically illustrate efficient simultaneous cooling of four independent modes with an overall strong reduction of the total system temperature. Comment: Machine learning in Optomechanics: coolin
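    As a rough illustration of the control loop sketched in this abstract, the toy example below lets a discrete-action policy decide which classical vibrational mode receives a momentum kick, with the total oscillator energy as the quantity to be reduced. The dynamics, frequencies, and the linear-softmax "policy" are simplified placeholders (and the policy is left untrained); this is not the authors' optomechanical model or training setup.

        import numpy as np

        rng = np.random.default_rng(0)
        n_modes = 4
        omega = np.array([1.0, 1.3, 1.7, 2.1])   # hypothetical mode frequencies
        dt, kick = 0.05, 0.1

        x = rng.normal(size=n_modes)              # mode positions
        p = rng.normal(size=n_modes)              # mode momenta

        # tiny linear-softmax policy over (n_modes + 1) actions:
        # kick mode i against its motion, or do nothing (last action)
        W = rng.normal(scale=0.1, size=(2 * n_modes, n_modes + 1))

        def policy(obs):
            logits = obs @ W
            probs = np.exp(logits - logits.max())
            return rng.choice(n_modes + 1, p=probs / probs.sum())

        for step in range(1000):
            a = policy(np.concatenate([x, p]))
            if a < n_modes:
                p[a] -= kick * np.sign(p[a])      # radiation-pressure-like momentum kick
            p = p - dt * omega**2 * x             # semi-implicit Euler step for each mode
            x = x + dt * p
            energy = 0.5 * (p**2 + omega**2 * x**2).sum()
            # a reward of -energy would drive the reinforcement learning training (omitted here)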

    CopyCAT: Taking Control of Neural Policies with Constant Attacks

    Get PDF
    We propose a new perspective on adversarial attacks against deep reinforcement learning agents. Our main contribution is CopyCAT, a targeted attack able to consistently lure an agent into following an outsider's policy. It is pre-computed, hence cheap to apply at inference time, and could thus be used in a real-time scenario. We show its effectiveness on Atari 2600 games in the novel read-only setting. In this setting, the adversary cannot directly modify the agent's state -- its representation of the environment -- but can only attack the agent's observation -- its perception of the environment. Directly modifying the agent's state would require write access to the agent's inner workings, and we argue that this assumption is too strong in realistic settings. Comment: AAMAS 202
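    The setting described in this abstract (a fixed, pre-computed perturbation applied only to the agent's observation) can be sketched as follows. The mask lookup, the L-infinity budget, and the function names are illustrative assumptions, not the paper's exact attack construction.

        import numpy as np

        def attacked_observation(obs, target_action, masks, eps=8 / 255):
            # add the pre-computed additive mask for the desired action, within an L-inf budget
            delta = np.clip(masks[target_action], -eps, eps)
            return np.clip(obs + delta, 0.0, 1.0)

        def run_read_only_attack(env_step, victim_q, target_policy, masks, obs, n_steps=100):
            for _ in range(n_steps):
                wanted = target_policy(obs)                 # action the outsider policy wants
                obs_adv = attacked_observation(obs, wanted, masks)
                action = int(np.argmax(victim_q(obs_adv)))  # victim acts on the perturbed observation
                obs = env_step(action)                      # the true environment state is never modified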

    A comparison of eligibility trace and momentum on SARSA in continuous state- and action-space

    Get PDF
    Here, the Newton's-method direct action selection approach to continuous action-space reinforcement learning is extended to use an eligibility trace. This is then compared with the momentum-term approach from the literature, in terms of both the update equations and the success rate and number of trials required to train on two variants of the simulated Cart-Pole benchmark problem. The eligibility-trace approach achieves a higher success rate over a far wider range of parameter values than the momentum approach, and also trains in fewer trials on the Cart-Pole problem.
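    To make the comparison concrete, the sketch below contrasts the two update styles for a linear q-function q(s, a) = w · phi(s, a): an eligibility-trace update versus a momentum-term update on the weights. The feature map and the paper's Newton's-method action selection are omitted, so this only illustrates the generic difference between the two update equations, not the authors' exact algorithm.

        import numpy as np

        def sarsa_trace_step(w, e, phi, phi_next, r, alpha=0.1, gamma=0.99, lam=0.9):
            # eligibility trace: the TD error is credited to recently visited features
            delta = r + gamma * w @ phi_next - w @ phi
            e = gamma * lam * e + phi
            return w + alpha * delta * e, e

        def sarsa_momentum_step(w, m, phi, phi_next, r, alpha=0.1, gamma=0.99, beta=0.9):
            # momentum term: the weight change is a decaying average of past TD updates
            delta = r + gamma * w @ phi_next - w @ phi
            m = beta * m + alpha * delta * phi
            return w + m, m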
