Providing Informative Feedback for Learning in Tightly Coupled Multiagent Domains
Autonomous agents that sense, decide, act, and coordinate effectively with each other are critical in many real-world domains such as autonomous driving, search and rescue missions, air traffic management, and underwater or deep space exploration. All such domains share a key difficulty: though high-level mission goals are clear to system designers, the agent behaviors that achieve those goals are not.
Thus, system designers aim to use adaptive approaches such as reinforcement learning (RL) or evolutionary algorithms (EA) to discover the ideal behaviors for the agents, and these behaviors are often implemented as computational policies (for example, artificial neural networks) that map sensory inputs to actions or values. But for such learning systems to be successful, they need to leverage system feedback (based on the agents' collective performance) to revise and update the agents' policies for interacting with the environment.
Unfortunately, both RL and EA approaches struggle when the environmental feedback is sparse and/or uninformative, especially in multiagent domains where teasing out an agent’s contribution to the system is difficult. Reward shaping methods address some of this difficulty, but they also suffer when faced with tightly coupled multiagent domains where feedback depends on multiple agents taking the correct joint action at the appropriate time.
This work introduces Reward-Shaped Curriculum Learning, Fitness Critics, and Bidirectional Fitness Critics to address the challenge of sparse feedback in tightly coupled multiagent domains.
Reward-Shaped Curriculum Learning trains agents on successively more complex scenarios, which enables agents to use reward shaping to discover the correct actions first and then coordinate on the complex tasks. The impact of this approach is to reduce the sparsity of the reward. Fitness Critics directly address the sparse-feedback problem by replacing the system reward with a step-by-step performance metric that maps step-wise observations and actions to meaningful evaluations that identify desirable behaviors. The impact of this approach is to turn a sparse, policy-based reward into a dense, state-action-based reward that trains agents for specific behaviors. Bidirectional Fitness Critics extend Fitness Critics to provide more informative feedback by leveraging temporal information about the reward and the relevance of that information to the task. The impact of this approach is to more accurately capture each agent's contribution to the desired behavior.
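To make the fitness-critic idea concrete, here is a minimal sketch, not the authors' implementation: the network shapes, names, and the training target are illustrative assumptions. A small regressor learns to map each step's state-action pair to a prediction of the episode's final fitness, and its per-step predictions then stand in for the sparse end-of-episode score.

```python
# Minimal fitness-critic sketch (hypothetical shapes and names, not the
# paper's code): regress each (state, action) pair toward the episodic
# fitness so that one sparse score becomes dense per-step feedback.
import torch
import torch.nn as nn

class FitnessCritic(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # scalar fitness estimate for this step
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def train_step(critic, optimizer, states, actions, episode_fitness):
    """Regress every step's prediction toward the episode's final score.

    states:  (T, state_dim) observations from one episode
    actions: (T, action_dim) actions taken at each step
    episode_fitness: the single sparse score received at episode end
    """
    preds = critic(states, actions)                   # (T,)
    target = torch.full_like(preds, episode_fitness)  # broadcast sparse score
    loss = nn.functional.mse_loss(preds, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Regressing every step toward the same episodic score is the simplest possible target; even so, it converts a single sparse signal into per-step feedback the policy learner can act on.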
Scaling Up Multiagent Reinforcement Learning for Robotic Systems: Learn an Adaptive Sparse Communication Graph
The complexity of multiagent reinforcement learning (MARL) in multiagent systems increases exponentially with the number of agents. This scalability issue prevents MARL from being applied in large-scale multiagent systems. However, one critical feature of MARL that is often neglected is that the interactions between agents are quite sparse. Without exploiting this sparsity structure, existing works aggregate information from all of the agents and thus have a high sample complexity. To address this issue, we propose an adaptive sparse attention mechanism by generalizing a sparsity-inducing activation function. A sparse communication graph in MARL is then learned by graph neural networks based on this new attention mechanism. Through this sparsity structure, the agents can communicate effectively and efficiently by selectively attending only to the agents that matter most, reducing the scale of the MARL problem with little optimality compromised. Comparative results show that our algorithm can learn an interpretable sparse structure and outperforms previous works by a significant margin on applications involving large-scale multiagent systems.
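The family of sparsity-inducing activations referred to here includes sparsemax (Martins & Astudillo, 2016), which projects attention scores onto the probability simplex and drives low scores exactly to zero. The paper generalizes such an activation, so the sketch below illustrates only the base mechanism, not the paper's exact variant.

```python
# Sketch of a sparsity-inducing attention activation in the spirit of
# sparsemax (Martins & Astudillo, 2016); the paper's generalization may
# differ from this baseline illustration.
import numpy as np

def sparsemax(z: np.ndarray) -> np.ndarray:
    """Project scores z onto the probability simplex, zeroing small entries."""
    z_sorted = np.sort(z)[::-1]                      # scores in descending order
    k = np.arange(1, len(z) + 1)
    cumsum = np.cumsum(z_sorted)
    support = z_sorted + (1.0 - cumsum) / k > 0      # which entries survive
    k_max = k[support][-1]                           # size of the support set
    tau = (cumsum[k_max - 1] - 1.0) / k_max          # shared threshold
    return np.maximum(z - tau, 0.0)

# Attention over 5 neighboring agents: softmax would weight all of them,
# sparsemax attends only to the ones that matter.
scores = np.array([2.0, 1.5, 0.1, -0.3, -1.0])
weights = sparsemax(scores)
print(weights)        # [0.75 0.25 0.   0.   0.  ]
print(weights.sum())  # 1.0
```

Unlike softmax, which assigns every agent a nonzero weight, the exactly-zero entries here prune edges from the communication graph outright, which is what makes the learned graph sparse and interpretable.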
Aerospace Cyber-Physical Systems Education
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/106495/1/AIAA2013-4809.pd
Multi-Reward Learning and Sparse Rewards
Reinforcement learning has made impressive strides in solving problems in challenging domains, but problems are increasingly being described with sparse rewards. Sparse rewards directly reduce the rate at which useful feedback is provided to the learner and make it difficult to determine which specific actions led to a reward. This greatly slows learning or thwarts it entirely. Some approaches combat the difficulty of learning under sparsity by using multi-reward schemes. These schemes utilize more rewards than just the true system evaluation, for example by providing exploration incentives or by decomposing the task into a hierarchy of policies, each with a different reward. There are also techniques that do not rely on multiple rewards, such as reward shaping or transfer learning. A key insight is that these techniques are orthogonal: multi-reward schemes can receive further benefits by applying the other techniques. This project explores various multi-reward strategies and alternative solutions to sparse rewards to find intelligent ways to combine these methods. We provide three specific examples, combining intrinsic rewards with transfer learning, imitation learning with policy combination, and hierarchical reinforcement learning with reward shaping, in ways that extend the current state of the art. To demonstrate practical usage of these techniques, we describe their application to a sparsely rewarded underwater manipulation problem.
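As a hedged illustration of the first combination named above, intrinsic rewards layered on a sparse extrinsic signal, the following sketch adds a hypothetical count-based novelty bonus to the task reward. The coefficient, the decay schedule, and the state discretization are illustrative assumptions, not the project's actual design.

```python
# Sketch of a multi-reward signal: a sparse extrinsic task reward plus a
# hypothetical count-based intrinsic exploration bonus (all names and
# coefficients are illustrative only).
from collections import defaultdict
import math

class IntrinsicRewardWrapper:
    def __init__(self, beta: float = 0.1):
        self.beta = beta                    # weight on the exploration bonus
        self.visit_counts = defaultdict(int)

    def combined_reward(self, state, extrinsic_reward: float) -> float:
        """Dense training signal = sparse task reward + decaying novelty bonus."""
        key = tuple(state)                  # assumes a discretizable state
        self.visit_counts[key] += 1
        bonus = self.beta / math.sqrt(self.visit_counts[key])
        return extrinsic_reward + bonus

# Even while the task reward is zero, the learner still receives feedback,
# and the bonus fades as states become familiar.
shaper = IntrinsicRewardWrapper(beta=0.1)
print(shaper.combined_reward((0, 0), extrinsic_reward=0.0))  # 0.1
print(shaper.combined_reward((0, 0), extrinsic_reward=0.0))  # ~0.0707
```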
Coalition based approach for shop floor agility – a multiagent approach
Dissertation submitted for a PhD degree in Electrical Engineering, speciality of Robotics and Integrated Manufacturing, from the Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia.
This thesis addresses the problem of shop floor agility. In order to cope with the disturbances and uncertainties that characterise the current business scenarios faced by manufacturing companies, the capability of their shop floors needs to be improved quickly, such that these shop floors may be adapted, changed, or easily modified (shop floor reengineering).
One of the critical elements in any shop floor reengineering process is the way the control/supervision architecture is changed or modified to accommodate the new processes and equipment. This thesis, therefore, proposes an architecture to support fast adaptation or change of the control/supervision architecture. This architecture postulates that manufacturing systems are no more than compositions of modularised manufacturing components whose interactions, when aggregated, are governed by contractual mechanisms that favour configuration over reprogramming.
A multiagent-based reference architecture called Coalition Based Approach for Shop floor Agility (CoBASA) was created to support fast adaptation and changes of shop floor control architectures with minimal effort. The coalitions are composed of agentified manufacturing components (modules), whose relationships within each coalition are governed by contracts configured whenever the coalition is established. Creating and changing a coalition involve no programming effort, because they require only changes to the contract that regulates the coalition.
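A rough sketch of the "configuration over reprogramming" principle follows; all types and names here are hypothetical, not the thesis's actual interfaces. A coalition is assembled, and later modified, by editing a contract object rather than by writing new control code.

```python
# Hypothetical illustration of CoBASA-style coalitions (invented names, not
# the thesis's interfaces): shop-floor change means reconfiguring a contract,
# not reprogramming the agentified modules it binds together.
from dataclasses import dataclass, field

@dataclass
class Contract:
    """Configured (not programmed) rules governing a coalition."""
    coordinator: str
    members: list[str]
    skills: dict[str, str]                       # member -> contributed skill
    terms: dict[str, str] = field(default_factory=dict)

@dataclass
class Coalition:
    contract: Contract

    def reconfigure(self, **new_terms: str) -> None:
        # Adapting the cell = editing contract terms, nothing is recompiled.
        self.contract.terms.update(new_terms)

cell = Coalition(Contract(
    coordinator="cell_controller",
    members=["robot_arm", "conveyor", "fixture"],
    skills={"robot_arm": "pick_place", "conveyor": "transport",
            "fixture": "clamp"},
))
cell.reconfigure(cycle_time="12s", batch_size="50")
```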
Towards Continual Reinforcement Learning: A Review and Perspectives
In this article, we aim to provide a literature review of different
formulations and approaches to continual reinforcement learning (RL), also
known as lifelong or non-stationary RL. We begin by discussing our perspective
on why RL is a natural fit for studying continual learning. We then provide a
taxonomy of different continual RL formulations and mathematically characterize
the non-stationary dynamics of each setting. We go on to discuss evaluation of
continual RL agents, providing an overview of benchmarks used in the literature
and important metrics for understanding agent performance. Finally, we
highlight open problems and challenges in bridging the gap between the current
state of continual RL and findings in neuroscience. While still in its early
days, the study of continual RL has the promise to develop better incremental
reinforcement learners that can function in increasingly realistic applications
where non-stationarity plays a vital role. These include applications such as
those in the fields of healthcare, education, logistics, and robotics.Comment: Preprint, 52 pages, 8 figure
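As a toy illustration of the non-stationarity such formulations characterize (all numbers and names below are illustrative, not drawn from the review): a bandit whose reward probabilities drift over time, paired with the classic constant-step-size fix that keeps value estimates tracking a moving target.

```python
# Toy non-stationary setting: reward probabilities drift, so a policy that
# was once optimal degrades and the learner must keep adapting.
import random

class DriftingBandit:
    def __init__(self, n_arms: int = 3, drift: float = 0.01):
        self.probs = [random.random() for _ in range(n_arms)]
        self.drift = drift

    def step(self, arm: int) -> float:
        reward = 1.0 if random.random() < self.probs[arm] else 0.0
        # The environment shifts under the agent: a bounded random walk.
        self.probs = [min(1.0, max(0.0, p + random.gauss(0.0, self.drift)))
                      for p in self.probs]
        return reward

# Epsilon-greedy with a constant step size: unlike a 1/n sample average,
# a fixed learning rate weights recent experience and tracks the drift.
env = DriftingBandit()
q = [0.0] * 3
for t in range(10_000):
    arm = random.randrange(3) if random.random() < 0.1 else q.index(max(q))
    r = env.step(arm)
    q[arm] += 0.1 * (r - q[arm])
print(q)
```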