SAAC: Safe Reinforcement Learning as an Adversarial Game of Actor-Critics
Although Reinforcement Learning (RL) is effective for sequential
decision-making problems under uncertainty, it still fails to thrive in
real-world systems where risk or safety is a binding constraint. In this paper,
we formulate the RL problem with safety constraints as a non-zero-sum game.
While deployed with maximum entropy RL, this formulation leads to a safe
adversarially guided soft actor-critic framework, called SAAC. In SAAC, the
adversary aims to break the safety constraint while the RL agent aims to
maximize the constrained value function given the adversary's policy. The
safety constraint on the agent's value function manifests only as a repulsion
term between the agent's and the adversary's policies. Unlike previous
approaches, SAAC can address different safety criteria such as safe
exploration, mean-variance risk sensitivity, and CVaR-like coherent risk
sensitivity. We illustrate the design of the adversary for these constraints.
Then, in each of these variations, we show that the agent differentiates itself from
the adversary's unsafe actions in addition to learning to solve the task.
Finally, for challenging continuous control tasks, we demonstrate that SAAC
achieves faster convergence, better efficiency, and fewer failures to satisfy
the safety constraints than risk-averse distributional RL and risk-neutral soft
actor-critic algorithms.
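The repulsion mechanism the abstract describes can be sketched for a discrete action space: the agent's soft actor objective gains a bonus proportional to the KL divergence from the adversary's policy. This is a minimal illustrative sketch, not the paper's implementation; the temperature `alpha` and repulsion weight `beta` are assumed hyperparameters.

```python
import numpy as np

def kl(p, q):
    """KL divergence between two categorical distributions."""
    return float(np.sum(p * np.log(p / q)))

def saac_actor_objective(q_values, pi_agent, pi_adv, alpha=0.2, beta=0.5):
    """Sketch of a SAAC-style actor objective for one state.

    The agent maximizes the entropy-regularized expected value while being
    repelled from the adversary: the KL term between the agent's and the
    adversary's action distributions enters as a bonus. `alpha` (entropy
    temperature) and `beta` (repulsion weight) are hypothetical values.
    """
    q_values = np.asarray(q_values, dtype=float)
    pi_agent = np.asarray(pi_agent, dtype=float)
    pi_adv = np.asarray(pi_adv, dtype=float)
    entropy = -float(np.sum(pi_agent * np.log(pi_agent)))
    expected_q = float(np.sum(pi_agent * q_values))
    repulsion = kl(pi_agent, pi_adv)
    return expected_q + alpha * entropy + beta * repulsion
```

With equal Q-values and equal-entropy policies, an agent policy that diverges from the adversary's scores strictly higher, which is exactly the differentiation behavior the abstract claims.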
QD-RL: Efficient Mixing of Quality and Diversity in Reinforcement Learning
We propose a novel reinforcement learning algorithm, QD-RL, that incorporates
the strengths of off-policy RL algorithms into Quality Diversity (QD)
approaches. Quality-Diversity methods contribute structural biases by
decoupling the search for diversity from the search for high return, resulting
in efficient management of the exploration-exploitation trade-off. However,
these approaches generally suffer from sample inefficiency as they call upon
evolutionary techniques. QD-RL removes this limitation by relying on off-policy
RL algorithms. More precisely, we train a population of off-policy deep RL
agents to simultaneously maximize diversity inside the population and the
return of the agents. QD-RL selects agents from the diversity-return Pareto
Front, resulting in stable and efficient population updates. Our experiments on
the Ant-Maze environment show that QD-RL can solve challenging exploration and
control problems with deceptive rewards while being more than 15 times more
sample efficient than its evolutionary counterparts.
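The selection step the abstract mentions, keeping agents on the diversity-return Pareto front, can be sketched as a non-dominated filter. This is an illustrative stand-in for QD-RL's selection, not the authors' code; the scoring convention (higher is better on both axes) is an assumption.

```python
import numpy as np

def pareto_front(scores):
    """Return indices of agents on the diversity-return Pareto front.

    `scores` is an (n, 2) array of (diversity, return) per agent; an agent
    is kept if no other agent weakly dominates it on both objectives while
    strictly improving at least one.
    """
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    keep = []
    for i in range(n):
        dominated = any(
            np.all(scores[j] >= scores[i]) and np.any(scores[j] > scores[i])
            for j in range(n) if j != i
        )
        if not dominated:
            keep.append(i)
    return keep
```

Selecting from the front rather than by return alone is what lets the population trade exploration (diversity) against exploitation (return) in a stable way.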
Efficient RL via Disentangled Environment and Agent Representations
Agents that are aware of the separation between themselves and their
environments can leverage this understanding to form effective representations
of visual input. We propose an approach for learning such structured
representations for RL algorithms, using visual knowledge of the agent, such as
its shape or mask, which is often inexpensive to obtain. This is incorporated
into the RL objective using a simple auxiliary loss. We show that our method,
Structured Environment-Agent Representations, outperforms state-of-the-art
model-free approaches over 18 different challenging visual simulation
environments spanning 5 different robots. ICML 2023. Website at https://sear-rl.github.io/
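The "simple auxiliary loss" can be sketched as a per-pixel binary cross-entropy between a mask predicted from the RL encoder and the cheaply obtained agent mask, added to the task loss. Function names, the exact loss form, and the weight `aux_weight` are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def auxiliary_mask_loss(pred_mask_logits, agent_mask):
    """Binary cross-entropy between a predicted agent mask and the given one.

    A decoder head on the RL encoder outputs per-pixel logits; supervising
    them with the agent's mask encourages the representation to separate
    agent from environment.
    """
    p = 1.0 / (1.0 + np.exp(-np.asarray(pred_mask_logits, dtype=float)))
    eps = 1e-8
    bce = -(agent_mask * np.log(p + eps)
            + (1 - agent_mask) * np.log(1 - p + eps))
    return float(bce.mean())

def total_loss(rl_loss, pred_mask_logits, agent_mask, aux_weight=1.0):
    """Combined objective: task loss plus the weighted auxiliary mask loss."""
    return rl_loss + aux_weight * auxiliary_mask_loss(pred_mask_logits, agent_mask)
```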
Ecological active vision: four bio-inspired principles to integrate bottom-up and adaptive top-down attention tested with a simple camera-arm robot
Vision gives primates a wealth of information useful to manipulate the environment, but at the same time it can easily overwhelm their computational resources. Active vision is a key solution found by nature to solve this problem: a limited fovea actively displaced in space to collect only relevant information. Here we highlight that in ecological conditions this solution encounters four problems: 1) the agent needs to learn where to look based on its goals; 2) manipulation causes learning feedback in areas of space possibly outside the attention focus; 3) good visual actions are needed to guide manipulation actions, but only these can generate learning feedback; and 4) a limited fovea causes aliasing problems. We then propose a computational architecture ("BITPIC") to overcome the four problems, integrating four bio-inspired key ingredients: 1) reinforcement-learning fovea-based top-down attention; 2) a strong vision-manipulation coupling; 3) bottom-up periphery-based attention; and 4) a novel action-oriented memory. The system is tested with a simple simulated camera-arm robot solving a class of search-and-reach tasks involving color-blob "objects." The results show that the architecture solves the problems, and hence the tasks, very efficiently, and highlight how the architecture principles can contribute to a full exploitation of the advantages of active vision in ecological conditions.
Adversarially Guided Actor-Critic
Despite definite success in deep reinforcement learning problems, actor-critic algorithms are still confronted with sample inefficiency in complex environments, particularly in tasks where efficient exploration is a bottleneck. These methods consider a policy (the actor) and a value function (the critic) whose respective losses are built using different motivations and approaches. This paper introduces a third protagonist: the adversary. While the adversary mimics the actor by minimizing the KL-divergence between their respective action distributions, the actor, in addition to learning to solve the task, tries to differentiate itself from the adversary's predictions. This novel objective stimulates the actor to follow strategies that could not have been correctly predicted from previous trajectories, making its behavior innovative in tasks where the reward is extremely rare. Our experimental analysis shows that the resulting Adversarially Guided Actor-Critic (AGAC) algorithm leads to more exhaustive exploration. Notably, AGAC outperforms current state-of-the-art methods on a set of various hard-exploration and procedurally-generated tasks.
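The two opposing objectives can be sketched for a discrete action space: the adversary minimizes the KL divergence from the actor, while the actor's advantage is augmented with a KL bonus that rewards being hard to predict. The coefficient `c` and the exact placement of the bonus are illustrative assumptions in the spirit of the abstract, not a reproduction of the paper's update rules.

```python
import numpy as np

def kl(p, q):
    """KL divergence between two categorical distributions."""
    return float(np.sum(p * np.log(p / q)))

def agac_advantage(advantage, pi_actor, pi_adv, c=0.5):
    """Actor side: advantage plus a bonus for diverging from the adversary.

    The bonus is proportional to KL(actor || adversary), so actions the
    adversary failed to predict are reinforced more strongly.
    """
    return advantage + c * kl(pi_actor, pi_adv)

def adversary_loss(pi_actor, pi_adv):
    """Adversary side: mimic the actor by minimizing KL(actor || adversary)."""
    return kl(pi_actor, pi_adv)
```

At the fixed point where the adversary has caught up (identical distributions), the bonus vanishes and the actor is pushed toward new, unpredicted behavior, which is the exploration pressure the abstract describes.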