ConTaCT: Deciding to Communicate during Time-Critical Collaborative Tasks in Unknown, Deterministic Domains
Communication between agents has the potential to improve team performance on collaborative tasks. However, communication is not free in most domains, requiring agents to reason about the costs and benefits of sharing information. In this work, we develop an online, decentralized communication policy, ConTaCT, that enables agents to decide whether or not to communicate during time-critical collaborative tasks in unknown, deterministic environments. Our approach is motivated by real-world applications, including the coordination of disaster response and search and rescue teams. These settings motivate a model structure that explicitly represents the world model as initially unknown but deterministic in nature, and that de-emphasizes uncertainty about action outcomes. Simulated experiments are conducted in which ConTaCT is compared to other multi-agent communication policies, and results indicate that ConTaCT achieves comparable task performance while substantially reducing communication overhead.
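The core decision the abstract describes, weighing the benefit of sharing information against its cost, can be sketched as follows. All function names and the simple plan-cost benefit estimate are illustrative assumptions, not the paper's actual formulation:

```python
# Hypothetical sketch of a cost-benefit communication rule in the
# spirit of ConTaCT: an agent shares a newly observed world change
# only when its estimate of the team's gain from knowing it exceeds
# the cost of sending the message.

def expected_benefit(plan_cost_without_info: float,
                     plan_cost_with_info: float) -> float:
    """Estimated reduction in a teammate's plan cost if informed."""
    return plan_cost_without_info - plan_cost_with_info

def should_communicate(observation_gain: float, message_cost: float) -> bool:
    """Communicate only when the expected gain outweighs the cost."""
    return observation_gain > message_cost

# Example: telling a teammate about a blocked corridor would save it
# an estimated 7 units of travel, and a message costs 2 units.
gain = expected_benefit(plan_cost_without_info=15, plan_cost_with_info=8)
print(should_communicate(gain, message_cost=2))  # True
```

The point of the sketch is only the decision structure: because the world is deterministic once observed, the benefit estimate can come from re-planning with and without the shared observation rather than from reasoning over outcome uncertainty.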
Scaling reinforcement learning to the unconstrained multi-agent domain
Reinforcement learning is a machine learning technique designed to mimic the way animals learn by receiving rewards and punishments. It is designed to train intelligent agents when very little is known about the agent's environment, and consequently the agent's designer is unable to hand-craft an appropriate policy. Using reinforcement learning, the designer can merely reward the agent when it does something right, and the algorithm will craft an appropriate policy automatically. In many situations it is desirable to use this technique to train systems of agents (for example, to train robots to play RoboCup soccer in a coordinated fashion). Unfortunately, several significant computational issues arise when using this technique to train systems of agents. This dissertation introduces a suite of techniques that overcome many of these difficulties in various common situations.

First, we show how multi-agent reinforcement learning can be made more tractable by forming coalitions of agents and training each coalition separately. Coalitions are formed using information-theoretic techniques, and we find that a coalition-based approach makes the computational complexity of reinforcement learning linear in the total number of agents in the system. Next, we look at ways to integrate domain knowledge into the reinforcement learning process and show how this can significantly improve policy quality in multi-agent settings. Specifically, we find that integrating domain knowledge can overcome training-data deficiencies and allow the learner to converge to acceptable solutions when a lack of training data would otherwise have prevented convergence. We then show how to train policies over continuous action spaces, which can reduce problem complexity in domains that require them (e.g., analog controllers) by eliminating the need to finely discretize the action space. Finally, we look at ways to perform reinforcement learning on modern GPUs and show how this lets us tackle significantly larger problems. We find that by offloading some of the RL computation to the GPU, we can achieve a nearly 4.5x speedup in the total training process.
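The tractability argument behind the coalition approach can be illustrated with a back-of-the-envelope calculation. The numbers below are hypothetical, chosen only to show why partitioning agents into fixed-size coalitions turns an exponential joint action space into one that grows linearly with the number of agents:

```python
# Illustrative sketch of why coalition-based training tames the joint
# action space. With n agents that each have |A| actions, monolithic
# joint learning faces |A|**n joint actions, while training k-agent
# coalitions separately faces only (n/k) * |A|**k actions in total.

def joint_action_count(n_agents: int, actions_per_agent: int) -> int:
    """Size of the joint action space for monolithic multi-agent RL."""
    return actions_per_agent ** n_agents

def coalition_action_count(n_agents: int, actions_per_agent: int,
                           coalition_size: int) -> int:
    """Total actions considered when coalitions are trained separately;
    grows linearly in the number of coalitions, hence in agent count."""
    n_coalitions = n_agents // coalition_size
    return n_coalitions * actions_per_agent ** coalition_size

# 12 agents with 4 actions each, partitioned into coalitions of 3:
print(joint_action_count(12, 4))         # 16777216
print(coalition_action_count(12, 4, 3))  # 256
```

Doubling the agent count doubles the coalition total but squares the monolithic total, which is the linear-versus-exponential gap the dissertation exploits.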
CLEAN learning to improve coordination and scalability in multiagent systems
Recent advances in multiagent learning have led to exciting new capabilities spanning fields as diverse as planetary exploration, air traffic control, military reconnaissance, and airport security. Such algorithms provide a tangible benefit over traditional control algorithms in that they allow fast responses, adapt to dynamic environments, and generally scale well. Unfortunately, because many existing multiagent learning methods are extensions of single agent approaches, they are inhibited by three key issues: i) they treat the actions of other agents as "environmental noise" in an attempt to simplify the problem complexity, ii) they are slow to converge in large systems as the joint action space grows exponentially in the number of agents, and iii) they frequently rely upon the presence of an accurate system model being readily available. This work addresses these three issues sequentially. First, we improve overall learning performance compared to existing state-of-the-art techniques in the field by embracing the exploration in learning rather than ignoring it or approximating it away. Within multiagent systems, exploration by individual agents significantly alters the dynamics of the environment in which all agents learn. To address this, we introduce the concept of "private" exploration, which enables each agent to present a stationary baseline policy to other agents in order to allow other agents in the system to learn more efficiently. In particular, we introduce Coordinated Learning without Exploratory Action Noise (CLEAN) rewards which improve coordination and performance by utilizing the concept of private exploration in order to remove the negative impact of traditional "public" exploration strategies from learning in multiagent systems. 
Next, we leverage the fundamental properties of CLEAN rewards that enable private exploration to allow agents to explore multiple potential actions concurrently in a "batch mode" in order to significantly improve learning speed over the state-of-the-art. Finally, we improve the real-world applicability of the proposed techniques by reducing their requirements. Specifically, the CLEAN rewards developed require an accurate partial model (i.e., an accurate model of the system objective) of the system in order to be computed. Unfortunately, many real-world systems are too complex to be modeled or are not known in advance, so an accurate system model is not available a priori. We address this shortcoming by employing model-based reinforcement learning techniques to enable agents to construct their own approximate model of the system objective based upon their observations and use this approximate model to calculate their CLEAN rewards.

Keywords: Multiagent Coordination, Multiagent Learning, UAV Communication Network, Fractionated Satellites, UAV Swarms, Distributed Control, Multiagent Scalability, Learning based control, Reward Shaping, Cubesats, Multiagent systems, Solar Power UAVs, Satellite Constellation
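The "private exploration" idea can be sketched in a few lines. This is a heavily simplified illustration, not the paper's exact CLEAN formulation: the toy objective `G` and all function names are assumptions, standing in for the (possibly learned) model of the system objective the abstract describes:

```python
# Hedged sketch of private exploration in the spirit of CLEAN rewards:
# each agent publicly executes its current baseline action, keeping the
# environment stationary for its teammates, and only privately scores a
# candidate exploratory action by re-evaluating the system objective G
# with its own action swapped out.

def G(joint_action: list) -> float:
    """Toy system objective (sum of actions); stands in for a model
    of the real task objective."""
    return sum(joint_action)

def clean_style_reward(joint_baseline: list, agent_idx: int,
                       exploratory_action: float) -> float:
    """Private evaluation: objective with the agent's exploratory
    action substituted, minus the objective actually achieved."""
    counterfactual = list(joint_baseline)
    counterfactual[agent_idx] = exploratory_action
    return G(counterfactual) - G(joint_baseline)

# Publicly, all agents play their stationary baseline actions:
baseline = [1, 2, 3]
# Privately, agent 0 scores an exploratory action without ever
# perturbing what its teammates observe:
print(clean_style_reward(baseline, agent_idx=0, exploratory_action=4))  # 3
```

Because the exploratory action is never executed, teammates see only the stationary baseline policy, which is exactly the property that removes exploratory "noise" from the learning dynamics; the same counterfactual evaluation can be repeated for several candidate actions to give the batch-mode speedup described above.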
Improving School Improvement
PREFACE

In opening this volume, you might be thinking: Is another book on school improvement really needed? Clearly our answer is yes. Our analyses of prevailing school improvement legislation, planning, and literature indicate fundamental deficiencies, especially with respect to enhancing equity of opportunity and closing the achievement gap. Here is what our work uniquely brings to policy and planning tables:

(1) An expanded framework for school improvement – We highlight that moving from a two- to a three-component policy and practice framework is essential for closing the opportunity and achievement gaps. (That is, expanding from focusing primarily on instruction and management/government concerns by establishing a third primary component to improve how schools address barriers to learning and teaching.)

(2) An emphasis on integrating a deep understanding of motivation – We underscore that concerns about engagement, management of behavior, school climate, equity of opportunity, and student outcomes require an up-to-date grasp of motivation, especially intrinsic motivation.

(3) Clarification of the nature and scope of personalized teaching – We define personalization as the process of matching learner motivation and capabilities, and we stress that it is the learner's perception that determines whether the match is a good one.

(4) A reframing of remediation and special education – We formulate these processes as personalized special assistance that is applied in and out of classrooms and practiced in a sequential and hierarchical manner.

(5) A prototype for transforming student and learning supports – We provide a framework for a unified, comprehensive, and equitable system designed to address barriers to learning and teaching and to re-engage disconnected students and families.

(6) A reworking of the leadership structure for whole-school improvement – We outline how the operational infrastructure can and must be realigned in keeping with a three-component school improvement framework.

(7) A systemic approach to enhancing school-community collaboration – We delineate a leadership role for schools in reaching out to communities to work on shared concerns through a formal collaborative operational infrastructure that enables weaving together resources to advance the work.

(8) An expanded framework for school accountability – We reframe school accountability to ensure a balanced approach that accounts for a shift to a three-component school improvement policy.

(9) Guidance for substantive, scalable, and sustainable systemic changes – We frame mechanisms and discuss lessons learned related to facilitating fundamental systemic changes and replicating and sustaining them across a district.

The frameworks and practices presented are based on our many years of work in schools and on efforts to enhance school-community collaboration. We incorporate insights from various theories, from the large body of relevant research, and from lessons learned and shared by many school leaders and staff who strive every day to do their best for children. Our emphasis on new directions is in no way meant to demean current efforts. We know that the demands placed on those working in schools go well beyond what anyone should be asked to do. Given the current working conditions in many schools, our intent is to help make the hard work generate better results. To this end, we highlight new directions and systemic pathways for improving school outcomes. Some of what we propose is difficult to accomplish. Hopefully, the fact that there are schools, districts, and state agencies already trailblazing the way will engender a sense of hope and encouragement in those committed to innovation.

It will be obvious that our work owes much to many. We are especially grateful to those who are pioneering major systemic changes across the country. These leaders and so many in the field have generously offered their insights and wisdom. And, of course, we are indebted to hundreds of scholars whose research and writing is a shared treasure. As always, we take this opportunity to thank Perry Nelson and the host of graduate and undergraduate students at UCLA who contribute so much to our work each day, and the many young people and their families who continue to teach us all.

Respectfully submitted for your consideration,
Howard Adelman & Linda Taylor