Automatic Curriculum Learning For Deep RL: A Short Survey

Authors: Colas Cédric; Hofmann Katja; Oudeyer Pierre-Yves; Portelas Rémy; Weng Lilian
Publication date: 28/05/2020

Automatic Curriculum Learning (ACL) has become a cornerstone of recent successes in Deep Reinforcement Learning (DRL). These methods shape the learning trajectories of agents by challenging them with tasks adapted to their capacities. In recent years, they have been used to improve sample efficiency and asymptotic performance, to organize exploration, to encourage generalization or to solve sparse reward problems, among others. The ambition of this work is dual: 1) to present a compact and accessible introduction to the Automatic Curriculum Learning literature and 2) to draw a bigger picture of the current state of the art in ACL to encourage the cross-breeding of existing concepts and the emergence of new ideas.

Comment: Accepted at IJCAI202

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

VPE: Variational Policy Embedding for Transfer Reinforcement Learning

Authors: Arnekvist Isac; Kragic Danica; Stork Johannes A.
Publication date: 14/09/2018

Reinforcement Learning methods are capable of solving complex problems, but the resulting policies might perform poorly in environments that are even slightly different. In robotics especially, training and deployment conditions often vary and data collection is expensive, making retraining undesirable. Simulation training allows for feasible training times, but suffers from a reality gap when applied in real-world settings. This raises the need for efficient adaptation of policies acting in new environments. We consider this as a problem of transferring knowledge within a family of similar Markov decision processes. For this purpose we assume that Q-functions are generated by some low-dimensional latent variable. Given such a Q-function, we can find a master policy that can adapt given different values of this latent variable. Our method learns both the generative mapping and an approximate posterior of the latent variables, enabling identification of policies for new tasks by searching only in the latent space, rather than the space of all policies. The low-dimensional space and master policy found by our method enable policies to quickly adapt to new environments. We demonstrate the method both on a pendulum swing-up task in simulation and for simulation-to-real transfer on a pushing task.

arXiv.org e-Print Archive

Publikationer från KTH

Crossref

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Artificial Intelligence and Systems Theory: Applied to Cooperative Robots

Authors: Custodio Luis M. M.; Lima Pedro U.
Publication date: 01/09/2004

This paper describes an approach to the design of a population of cooperative robots based on concepts borrowed from Systems Theory and Artificial Intelligence. The research has been developed under the SocRob project, carried out by the Intelligent Systems Laboratory at the Institute for Systems and Robotics - Instituto Superior Tecnico (ISR/IST) in Lisbon. The acronym of the project stands both for "Society of Robots" and "Soccer Robots", the case study where we are testing our population of robots. Designing soccer robots is a very challenging problem, where the robots must act not only to shoot a ball towards the goal, but also to detect and avoid static (walls, stopped robots) and dynamic (moving robots) obstacles. Furthermore, they must cooperate to defeat an opposing team. Our past and current research in soccer robotics includes cooperative sensor fusion for world modeling, object recognition and tracking, robot navigation, multi-robot distributed task planning and coordination (including cooperative reinforcement learning in cooperative and adversarial environments), and behavior-based architectures for real-time task execution of cooperating robot teams.

arXiv.org e-Print Archive

Directory of Open Access Journals

InfoSwarms: Drone Swarms and Information Warfare

Author: Kallenborn Zachary
Publication venue: USAWC Press
Publication date: 18/05/2022

Drone swarms, which can be used at sea, on land, in the air, and even in space, are fundamentally information-dependent weapons. No study to date has examined drone swarms in the context of information warfare writ large. This article explores the dependence of these swarms on information and the resultant connections with areas of information warfare (electronic, cyber, space, and psychological), drawing on open-source research and qualitative reasoning. Overall, the article offers insights into how this important emerging technology fits into the broader defense ecosystem and outlines practical approaches to strengthening related information warfare capabilities.

US Army War College Press (USAWC)

Joint Goal and Strategy Inference across Heterogeneous Demonstrators via Reward Network Distillation

Authors: Amin Kareem; Choi Jaedeug; Czarnecki Wojciech M.; Eysenbach Benjamin; Finn Chelsea; Fu Justin; Gombolay Matthew; Haarnoja Tuomas; Hausman Karol; Henderson Peter; Ho Jonathan; Levine Sergey; Li Yunzhu; Ng Andrew Y; Nikolaidis Stefanos; Raghavan Hema; Ramachandran Deepak; Ross Stephane; Schafer J Ben; Teh Yee; Xu Kelvin; Zhang Yunbo
Publication venue: Association for Computing Machinery (ACM)
Publication date: 23/11/2020

Reinforcement learning (RL) has achieved tremendous success as a general framework for learning how to make decisions. However, this success relies on the interactive hand-tuning of a reward function by RL experts. On the other hand, inverse reinforcement learning (IRL) seeks to learn a reward function from readily obtained human demonstrations. Yet, IRL suffers from two major limitations: 1) reward ambiguity - there are an infinite number of possible reward functions that could explain an expert's demonstration and 2) heterogeneity - human experts adopt varying strategies and preferences, which makes learning from multiple demonstrators difficult due to the common assumption that demonstrators seek to maximize the same reward. In this work, we propose a method to jointly infer a task goal and humans' strategic preferences via network distillation. This approach enables us to distill a robust task reward (addressing reward ambiguity) and to model each strategy's objective (handling heterogeneity). We demonstrate that our algorithm can better recover task reward and strategy rewards and imitate the strategies in two simulated tasks and a real-world table tennis task.

Comment: In Proceedings of the 2020 ACM/IEEE International Conference on Human-Robot Interaction (HRI '20), March 23 to 26, 2020, Cambridge, United Kingdom. ACM, New York, NY, USA, 10 pages

arXiv.org e-Print Archive

Crossref