SAAC: Safe Reinforcement Learning as an Adversarial Game of Actor-Critics
Although Reinforcement Learning (RL) is effective for sequential
decision-making problems under uncertainty, it still fails to thrive in
real-world systems where risk or safety is a binding constraint. In this paper,
we formulate the RL problem with safety constraints as a non-zero-sum game.
While deployed with maximum entropy RL, this formulation leads to a safe
adversarially guided soft actor-critic framework, called SAAC. In SAAC, the
adversary aims to break the safety constraint while the RL agent aims to
maximize the constrained value function given the adversary's policy. The
safety constraint on the agent's value function manifests only as a repulsion
term between the agent's and the adversary's policies. Unlike previous
approaches, SAAC can address different safety criteria such as safe
exploration, mean-variance risk sensitivity, and CVaR-like coherent risk
sensitivity. We illustrate the design of the adversary for these constraints.
Then, in each of these variations, we show that the agent differentiates itself from
the adversary's unsafe actions in addition to learning to solve the task.
Finally, for challenging continuous control tasks, we demonstrate that SAAC
achieves faster convergence, better efficiency, and fewer failures to satisfy
the safety constraints than risk-averse distributional RL and risk-neutral soft
actor-critic algorithms.
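The repulsion mechanism the abstract describes can be sketched for a discrete action space: the agent's soft actor objective gains a bonus proportional to the KL divergence from the adversary's policy. This is a minimal illustrative sketch, not the paper's implementation; the temperature `alpha` and repulsion weight `beta` are assumed hyperparameters.

```python
import numpy as np

def kl(p, q):
    """KL divergence between two categorical distributions."""
    return float(np.sum(p * np.log(p / q)))

def saac_actor_objective(q_values, pi_agent, pi_adv, alpha=0.2, beta=0.5):
    """Sketch of a SAAC-style actor objective for one state.

    The agent maximizes the entropy-regularized expected value while being
    repelled from the adversary: the KL term between the agent's and the
    adversary's action distributions enters as a bonus. `alpha` (entropy
    temperature) and `beta` (repulsion weight) are hypothetical values.
    """
    q_values = np.asarray(q_values, dtype=float)
    pi_agent = np.asarray(pi_agent, dtype=float)
    pi_adv = np.asarray(pi_adv, dtype=float)
    entropy = -float(np.sum(pi_agent * np.log(pi_agent)))
    expected_q = float(np.sum(pi_agent * q_values))
    repulsion = kl(pi_agent, pi_adv)
    return expected_q + alpha * entropy + beta * repulsion
```

With equal Q-values and equal-entropy policies, an agent policy that diverges from the adversary's scores strictly higher, which is exactly the differentiation behavior the abstract claims.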
QD-RL: Efficient Mixing of Quality and Diversity in Reinforcement Learning
We propose a novel reinforcement learning algorithm, QD-RL, that incorporates
the strengths of off-policy RL algorithms into Quality Diversity (QD)
approaches. Quality-Diversity methods contribute structural biases by
decoupling the search for diversity from the search for high return, resulting
in efficient management of the exploration-exploitation trade-off. However,
these approaches generally suffer from sample inefficiency as they call upon
evolutionary techniques. QD-RL removes this limitation by relying on off-policy
RL algorithms. More precisely, we train a population of off-policy deep RL
agents to simultaneously maximize diversity inside the population and the
return of the agents. QD-RL selects agents from the diversity-return Pareto
Front, resulting in stable and efficient population updates. Our experiments on
the Ant-Maze environment show that QD-RL can solve challenging exploration and
control problems with deceptive rewards while being more than 15 times more
sample efficient than its evolutionary counterparts.
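The selection step the abstract mentions, keeping agents on the diversity-return Pareto front, can be sketched as a non-dominated filter. This is an illustrative stand-in for QD-RL's selection, not the authors' code; the scoring convention (higher is better on both axes) is an assumption.

```python
import numpy as np

def pareto_front(scores):
    """Return indices of agents on the diversity-return Pareto front.

    `scores` is an (n, 2) array of (diversity, return) per agent; an agent
    is kept if no other agent weakly dominates it on both objectives while
    strictly improving at least one.
    """
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    keep = []
    for i in range(n):
        dominated = any(
            np.all(scores[j] >= scores[i]) and np.any(scores[j] > scores[i])
            for j in range(n) if j != i
        )
        if not dominated:
            keep.append(i)
    return keep
```

Selecting from the front rather than by return alone is what lets the population trade exploration (diversity) against exploitation (return) in a stable way.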
Efficient RL via Disentangled Environment and Agent Representations
Agents that are aware of the separation between themselves and their
environments can leverage this understanding to form effective representations
of visual input. We propose an approach for learning such structured
representations for RL algorithms, using visual knowledge of the agent, such as
its shape or mask, which is often inexpensive to obtain. This is incorporated
into the RL objective using a simple auxiliary loss. We show that our method,
Structured Environment-Agent Representations, outperforms state-of-the-art
model-free approaches over 18 different challenging visual simulation
environments spanning 5 different robots. ICML 2023. Website at https://sear-rl.github.io/
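The "simple auxiliary loss" can be sketched as a per-pixel binary cross-entropy between a mask predicted from the RL encoder and the cheaply obtained agent mask, added to the task loss. Function names, the exact loss form, and the weight `aux_weight` are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def auxiliary_mask_loss(pred_mask_logits, agent_mask):
    """Binary cross-entropy between a predicted agent mask and the given one.

    A decoder head on the RL encoder outputs per-pixel logits; supervising
    them with the agent's mask encourages the representation to separate
    agent from environment.
    """
    p = 1.0 / (1.0 + np.exp(-np.asarray(pred_mask_logits, dtype=float)))
    eps = 1e-8
    bce = -(agent_mask * np.log(p + eps)
            + (1 - agent_mask) * np.log(1 - p + eps))
    return float(bce.mean())

def total_loss(rl_loss, pred_mask_logits, agent_mask, aux_weight=1.0):
    """Combined objective: task loss plus the weighted auxiliary mask loss."""
    return rl_loss + aux_weight * auxiliary_mask_loss(pred_mask_logits, agent_mask)
```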
Ecological active vision: four bio-inspired principles to integrate bottom-up and adaptive top-down attention tested with a simple camera-arm robot
Vision gives primates a wealth of information useful to manipulate the environment, but at the same time it can easily overwhelm their computational resources. Active vision is a key solution found by nature to solve this problem: a limited fovea actively displaced in space to collect only relevant information. Here we highlight that in ecological conditions this solution encounters four problems: 1) the agent needs to learn where to look based on its goals; 2) manipulation causes learning feedback in areas of space possibly outside the attention focus; 3) good visual actions are needed to guide manipulation actions, but only these can generate learning feedback; and 4) a limited fovea causes aliasing problems. We then propose a computational architecture ("BITPIC") to overcome the four problems, integrating four bio-inspired key ingredients: 1) reinforcement-learning fovea-based top-down attention; 2) a strong vision-manipulation coupling; 3) bottom-up periphery-based attention; and 4) a novel action-oriented memory. The system is tested with a simple simulated camera-arm robot solving a class of search-and-reach tasks involving color-blob "objects." The results show that the architecture solves the problems, and hence the tasks, very efficiently, and highlight how the architecture principles can contribute to a full exploitation of the advantages of active vision in ecological conditions.
Adversarially Guided Actor-Critic
Despite definite success in deep reinforcement learning problems, actor-critic algorithms are still confronted with sample inefficiency in complex environments, particularly in tasks where efficient exploration is a bottleneck. These methods consider a policy (the actor) and a value function (the critic) whose respective losses are built using different motivations and approaches. This paper introduces a third protagonist: the adversary. While the adversary mimics the actor by minimizing the KL-divergence between their respective action distributions, the actor, in addition to learning to solve the task, tries to differentiate itself from the adversary's predictions. This novel objective stimulates the actor to follow strategies that could not have been correctly predicted from previous trajectories, making its behavior innovative in tasks where the reward is extremely rare. Our experimental analysis shows that the resulting Adversarially Guided Actor-Critic (AGAC) algorithm leads to more exhaustive exploration. Notably, AGAC outperforms current state-of-the-art methods on a set of various hard-exploration and procedurally-generated tasks.
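The two opposing objectives can be sketched for a discrete action space: the adversary minimizes the KL divergence from the actor, while the actor's advantage is augmented with a KL bonus that rewards being hard to predict. The coefficient `c` and the exact placement of the bonus are illustrative assumptions in the spirit of the abstract, not a reproduction of the paper's update rules.

```python
import numpy as np

def kl(p, q):
    """KL divergence between two categorical distributions."""
    return float(np.sum(p * np.log(p / q)))

def agac_advantage(advantage, pi_actor, pi_adv, c=0.5):
    """Actor side: advantage plus a bonus for diverging from the adversary.

    The bonus is proportional to KL(actor || adversary), so actions the
    adversary failed to predict are reinforced more strongly.
    """
    return advantage + c * kl(pi_actor, pi_adv)

def adversary_loss(pi_actor, pi_adv):
    """Adversary side: mimic the actor by minimizing KL(actor || adversary)."""
    return kl(pi_actor, pi_adv)
```

At the fixed point where the adversary has caught up (identical distributions), the bonus vanishes and the actor is pushed toward new, unpredicted behavior, which is the exploration pressure the abstract describes.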