81 research outputs found
Adaptive and learning-based formation control of swarm robots
Autonomous aerial and wheeled mobile robots play a major role in tasks such as search and rescue, transportation, monitoring, and inspection. However, these operations face several open challenges, including robust autonomy and coordination that adapts to the environment and operating conditions, particularly in swarm robots with limited communication and perception capabilities. Furthermore, the computational complexity increases exponentially with the number of robots in the swarm. This thesis examines two different aspects of the formation control problem. On the one hand, we investigate how formation could be performed by swarm robots with limited communication and perception (e.g., Crazyflie nano quadrotor). On the other hand, we explore human-swarm interaction (HSI) and different shared-control mechanisms between human and swarm robots (e.g., BristleBot) for artistic creation. In particular, we combine bio-inspired techniques (i.e., flocking, foraging) with learning-based control strategies (using artificial neural networks) for adaptive control of multi-robot systems. We first review how learning-based control and networked dynamical systems can be used to assign distributed and decentralized policies to individual robots such that the desired formation emerges from their collective behavior. We proceed by presenting a novel flocking control for UAV swarms using deep reinforcement learning. We formulate the flocking formation problem as a partially observable Markov decision process (POMDP), and consider a leader-follower configuration, where consensus among all UAVs is used to train a shared control policy, and each UAV performs actions based on the local information it collects. In addition, to avoid collisions among UAVs and guarantee flocking and navigation, the reward function combines a global flocking maintenance term, a mutual reward, and a collision penalty.
We adapt deep deterministic policy gradient (DDPG) with centralized training and decentralized execution to obtain the flocking control policy using actor-critic networks and a global state space matrix. In the context of swarm robotics in arts, we investigate how the formation paradigm can serve as an interaction modality for artists to aesthetically utilize swarms. In particular, we explore particle swarm optimization (PSO) and random walk to control the communication between a team of robots with swarming behavior for musical creation.
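The abstract names three reward components (global flocking maintenance, mutual reward, collision penalty) without giving their form. The following is only a minimal sketch of how such a reward might be composed in a leader-follower setting. The function name `flocking_rewards`, the spacing-error formulation, and every weight and radius are assumptions made for illustration, not the thesis's actual definitions.

```python
import math

def flocking_rewards(positions, leader=0, spacing=1.0,
                     collision_radius=0.2, collision_penalty=10.0):
    """Return one reward per UAV (hypothetical formulation, not from the thesis)."""
    n = len(positions)
    # Individual term (assumed): negative error between a follower's
    # distance to the leader and the desired spacing; the leader scores 0.
    individual = [
        0.0 if i == leader
        else -abs(math.dist(positions[i], positions[leader]) - spacing)
        for i in range(n)
    ]
    # Global flocking maintenance (assumed): mean individual term, shared by all.
    global_term = sum(individual) / n
    rewards = []
    for i in range(n):
        # Mutual reward (assumed): mean individual term of the other UAVs.
        others = [individual[j] for j in range(n) if j != i]
        mutual = sum(others) / len(others) if others else 0.0
        # Collision penalty: count UAVs closer than the collision radius.
        collisions = sum(
            1 for j in range(n)
            if j != i and math.dist(positions[i], positions[j]) < collision_radius
        )
        rewards.append(global_term + mutual - collision_penalty * collisions)
    return rewards
```

In a centralized-training, decentralized-execution setup, a reward of this kind would be computed from the global state during training only, while each UAV's actor would act on its local observation.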
Experience Sharing Between Cooperative Reinforcement Learning Agents
The idea of experience sharing between cooperative agents naturally emerges
from our understanding of how humans learn. Our evolution as a species is
tightly linked to the ability to exchange learned knowledge with one another.
It follows that experience sharing (ES) between autonomous and independent
agents could become the key to accelerate learning in cooperative multiagent
settings. We investigate if randomly selecting experiences to share can
increase the performance of deep reinforcement learning agents, and propose
three new methods for selecting experiences to accelerate the learning process.
Firstly, we introduce Focused ES, which prioritizes unexplored regions of the
state space. Secondly, we present Prioritized ES, in which temporal-difference
error is used as a measure of priority. Finally, we devise Focused Prioritized
ES, which combines both previous approaches. The methods are empirically
validated in a control problem. While sharing randomly selected experiences
between two Deep Q-Network agents shows no improvement over a single agent
baseline, we show that the proposed ES methods can successfully outperform the
baseline. In particular, the Focused ES accelerates learning by a factor of 2,
reducing by 51% the number of episodes required to complete the task.
Comment: Published in the Proceedings of the 31st IEEE International
Conference on Tools with Artificial Intelligence
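The Prioritized ES idea described in this abstract, ranking experiences by temporal-difference error before sharing them, can be sketched roughly as follows. This is an illustrative simplification: it uses a tabular Q-function stored in nested dictionaries rather than the Deep Q-Network agents of the paper, and the names `td_error` and `select_to_share` are hypothetical.

```python
def td_error(q, experience, gamma=0.99):
    """One-step Q-learning TD error: r + gamma * max_a' Q(s', a') - Q(s, a)."""
    state, action, reward, next_state, done = experience
    # Terminal transitions have no bootstrap term; unseen states default to 0.
    best_next = 0.0 if done else max(q.get(next_state, {}).values(), default=0.0)
    return reward + gamma * best_next - q.get(state, {}).get(action, 0.0)

def select_to_share(q, buffer, k, gamma=0.99):
    """Pick the k experiences with the largest absolute TD error (assumed rule)."""
    ranked = sorted(buffer, key=lambda e: abs(td_error(q, e, gamma)), reverse=True)
    return ranked[:k]
```

A sending agent could call `select_to_share` on its own replay buffer and pass the result to its partner's buffer. How often to share, and how many experiences to send, are design choices the abstract does not specify.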
Organisations as complex adaptive systems : implications for the design of information systems
Today a paradigm shift in the field of organisation and management theories is no longer disputed and the need to switch from the Command-and-Control to the Learning Organisation Paradigm (LOP) in the area of organisational theory is well understood. However, it is less well appreciated that learning organisations cannot operate effectively if supported by centralised databases and tailor-made application programs. LOP emphasises adaptability, flexibility, participation and learning. It is important to understand that the changes in organisational and management strategies will not on their own be able to produce the desired effects unless they are supported by appropriate changes in organisational culture, and by effective information systems. This research demonstrates that conventional information system strategies and development methods are no longer adequate.
Information system strategies must respond to these needs of the LOP and incorporate new information systems that are capable of evolving, adapting and responding to the constantly changing business environment. The desired adaptability, flexibility and agility in information systems for LOP can be achieved by exploiting the technologies of the Internet, World Wide Web, intelligent agents and intranets. This research establishes that there is a need for synergy between organisational structures and organisational information systems. To obtain this desired synergy it is essential that new information systems be designed as an integral part of the learning organisational structure itself.
Complexity theory provides a new set of metaphors and a host of concepts for the understanding of organisations as complex adaptive systems. This research introduces the principles of Complex Adaptive Systems and draws on their significance for designing the information systems needed to support the new generation of learning organisations. The search for new models of information system strategies for today's dynamic world of business points to the 'swarm models' observed in Nature
Human-machine communication for educational systems design
This book contains the papers presented at the NATO Advanced Study Institute (ASI) on the Basics of man-machine communication for the design of educational systems, held August 16-26, 1993, in Eindhoven, The Netherlands.
Learning and Co-operation in Mobile Multi-Robot Systems
Merged with duplicate record 10026.1/1984 on 27.02.2017 by CS (TIS)
This thesis addresses the problem of setting the balance between exploration and
exploitation in teams of learning robots who exchange information. Specifically it looks at
groups of robots whose tasks include moving between salient points in the environment.
To deal with unknown and dynamic environments, such robots need to be able to discover
and learn the routes between these points themselves. A natural extension of this scenario
is to allow the robots to exchange learned routes so that only one robot needs to learn a
route for the whole team to use that route. One contribution of this thesis is to identify a
dilemma created by this extension: that once one robot has learned a route between two
points, all other robots will follow that route without looking for shorter versions. This
trade-off will be labeled the Distributed Exploration vs. Exploitation Dilemma, since
increasing distributed exploitation (allowing robots to exchange more routes) means
decreasing distributed exploration (reducing robots' ability to learn new versions of routes),
and vice-versa. At different times, teams may be required with different balances of
exploitation and exploration. The main contribution of this thesis is to present a system for
setting the balance between exploration and exploitation in a group of robots. This system
is demonstrated through experiments involving simulated robot teams. The experiments
show that increasing and decreasing the value of a parameter of the novel system will lead
to a significant increase and decrease respectively in average exploitation (and an
equivalent decrease and increase in average exploration) over a series of team missions. A
further set of experiments shows that this holds true for a range of team sizes and numbers
of goals
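The abstract does not say how the balancing parameter works. Purely as an illustration of a single parameter shifting a team between exploitation and exploration, the sketch below assumes a hypothetical rule: on each mission a robot reuses a teammate's shared route with probability `share_prob`, and otherwise searches for its own. The function name, the rule, and the parameter are assumptions, not the thesis's system.

```python
import random

def exploitation_rate(share_prob, missions=1000, seed=0):
    """Fraction of missions on which a shared route is reused (hypothetical rule)."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    # A mission counts as exploitation when the robot adopts a shared route;
    # the remaining missions count as exploration of new routes.
    exploited = sum(1 for _ in range(missions) if rng.random() < share_prob)
    return exploited / missions
```

Under this assumed rule, raising `share_prob` raises average exploitation and lowers average exploration by the same amount, which mirrors the qualitative relationship the experiments report.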