1,428 research outputs found
Learning for Multi-robot Cooperation in Partially Observable Stochastic Environments with Macro-actions
This paper presents a data-driven approach for multi-robot coordination in
partially-observable domains based on Decentralized Partially Observable Markov
Decision Processes (Dec-POMDPs) and macro-actions (MAs). Dec-POMDPs provide a
general framework for cooperative sequential decision making under uncertainty
and MAs allow temporally extended and asynchronous action execution. To date,
most methods assume the underlying Dec-POMDP model is known a priori or a full
simulator is available during planning time. Previous methods which aim to
address these issues suffer from local optimality and sensitivity to initial
conditions. Additionally, few hardware demonstrations involving a large team of
heterogeneous robots and with long planning horizons exist. This work addresses
these gaps by proposing an iterative sampling based Expectation-Maximization
algorithm (iSEM) to learn polices using only trajectory data containing
observations, MAs, and rewards. Our experiments show the algorithm is able to
achieve better solution quality than the state-of-the-art learning-based
methods. We implement two variants of multi-robot Search and Rescue (SAR)
domains (with and without obstacles) on hardware to demonstrate the learned
policies can effectively control a team of distributed robots to cooperate in a
partially observable stochastic environment.Comment: Accepted to the 2017 IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS 2017
Formal Modelling for Multi-Robot Systems Under Uncertainty
Purpose of Review: To effectively synthesise and analyse multi-robot
behaviour, we require formal task-level models which accurately capture
multi-robot execution. In this paper, we review modelling formalisms for
multi-robot systems under uncertainty, and discuss how they can be used for
planning, reinforcement learning, model checking, and simulation.
Recent Findings: Recent work has investigated models which more accurately
capture multi-robot execution by considering different forms of uncertainty,
such as temporal uncertainty and partial observability, and modelling the
effects of robot interactions on action execution. Other strands of work have
presented approaches for reducing the size of multi-robot models to admit more
efficient solution methods. This can be achieved by decoupling the robots under
independence assumptions, or reasoning over higher level macro actions.
Summary: Existing multi-robot models demonstrate a trade off between
accurately capturing robot dependencies and uncertainty, and being small enough
to tractably solve real world problems. Therefore, future research should
exploit realistic assumptions over multi-robot behaviour to develop smaller
models which retain accurate representations of uncertainty and robot
interactions; and exploit the structure of multi-robot problems, such as
factored state spaces, to develop scalable solution methods.Comment: 23 pages, 0 figures, 2 tables. Current Robotics Reports (2023). This
version of the article has been accepted for publication, after peer review
(when applicable) but is not the Version of Record and does not reflect
post-acceptance improvements, or any corrections. The Version of Record is
available online at: https://dx.doi.org/10.1007/s43154-023-00104-
Decentralized Cooperative Planning for Automated Vehicles with Hierarchical Monte Carlo Tree Search
Today's automated vehicles lack the ability to cooperate implicitly with
others. This work presents a Monte Carlo Tree Search (MCTS) based approach for
decentralized cooperative planning using macro-actions for automated vehicles
in heterogeneous environments. Based on cooperative modeling of other agents
and Decoupled-UCT (a variant of MCTS), the algorithm evaluates the
state-action-values of each agent in a cooperative and decentralized manner,
explicitly modeling the interdependence of actions between traffic
participants. Macro-actions allow for temporal extension over multiple time
steps and increase the effective search depth requiring fewer iterations to
plan over longer horizons. Without predefined policies for macro-actions, the
algorithm simultaneously learns policies over and within macro-actions. The
proposed method is evaluated under several conflict scenarios, showing that the
algorithm can achieve effective cooperative planning with learned macro-actions
in heterogeneous environments
Overview of deep reinforcement learning in partially observable multi-agent environment of competitive online video games
In the late 2010’s classical games of Go, Chess and Shogi have been considered ’solved’ by deep
reinforcement learning AI agents. Competitive online video games may offer a new, more challenging environment for deep reinforcement learning and serve as a stepping stone in a path to real
world applications. This thesis aims to give a short introduction to the concepts of reinforcement
learning, deep networks and deep reinforcement learning. Then the thesis proceeds to look into few
popular competitive online video games and to the general problems of AI development in these
types of games. Deep reinforcement learning algorithms, techniques and architectures used in the
development of highly competitive AI agents in Starcraft 2, Dota 2 and Quake 3 are overviewed.
Finally, the results are looked into and discussed
Adaptive and learning-based formation control of swarm robots
Autonomous aerial and wheeled mobile robots play a major role in tasks such as search and rescue, transportation, monitoring, and inspection. However, these operations are faced with a few open challenges including robust autonomy, and adaptive coordination based on the environment and operating conditions, particularly in swarm robots with limited communication and perception capabilities. Furthermore, the computational complexity increases exponentially with the number of robots in the swarm. This thesis examines two different aspects of the formation control problem. On the one hand, we investigate how formation could be performed by swarm robots with limited communication and perception (e.g., Crazyflie nano quadrotor). On the other hand, we explore human-swarm interaction (HSI) and different shared-control mechanisms between human and swarm robots (e.g., BristleBot) for artistic creation. In particular, we combine bio-inspired (i.e., flocking, foraging) techniques with learning-based control strategies (using artificial neural networks) for adaptive control of multi- robots. We first review how learning-based control and networked dynamical systems can be used to assign distributed and decentralized policies to individual robots such that the desired formation emerges from their collective behavior. We proceed by presenting a novel flocking control for UAV swarm using deep reinforcement learning. We formulate the flocking formation problem as a partially observable Markov decision process (POMDP), and consider a leader-follower configuration, where consensus among all UAVs is used to train a shared control policy, and each UAV performs actions based on the local information it collects. In addition, to avoid collision among UAVs and guarantee flocking and navigation, a reward function is added with the global flocking maintenance, mutual reward, and a collision penalty. We adapt deep deterministic policy gradient (DDPG) with centralized training and decentralized execution to obtain the flocking control policy using actor-critic networks and a global state space matrix. In the context of swarm robotics in arts, we investigate how the formation paradigm can serve as an interaction modality for artists to aesthetically utilize swarms. In particular, we explore particle swarm optimization (PSO) and random walk to control the communication between a team of robots with swarming behavior for musical creation
Counterfactual Multi-Agent Policy Gradients
Cooperative multi-agent systems can be naturally used to model many real
world problems, such as network packet routing and the coordination of
autonomous vehicles. There is a great need for new reinforcement learning
methods that can efficiently learn decentralised policies for such systems. To
this end, we propose a new multi-agent actor-critic method called
counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised
critic to estimate the Q-function and decentralised actors to optimise the
agents' policies. In addition, to address the challenges of multi-agent credit
assignment, it uses a counterfactual baseline that marginalises out a single
agent's action, while keeping the other agents' actions fixed. COMA also uses a
critic representation that allows the counterfactual baseline to be computed
efficiently in a single forward pass. We evaluate COMA in the testbed of
StarCraft unit micromanagement, using a decentralised variant with significant
partial observability. COMA significantly improves average performance over
other multi-agent actor-critic methods in this setting, and the best performing
agents are competitive with state-of-the-art centralised controllers that get
access to the full state
- …