Search CORE

98 research outputs found

Near-Optimal Adversarial Policy Switching for Decentralized Asynchronous Multi-Agent Systems

Author: Amato Christopher
Hoang Trong Nghia
How Jonathan
Sivakumar Kavinayan
Xiao Yuchen
Publication venue
Publication date: 17/10/2017
Field of study

A key challenge in multi-robot and multi-agent systems is generating solutions that are robust to other self-interested or even adversarial parties who actively try to prevent the agents from achieving their goals. The practicality of existing works addressing this challenge is limited to only small-scale synchronous decision-making scenarios or a single agent planning its best response against a single adversary with fixed, procedurally characterized strategies. In contrast this paper considers a more realistic class of problems where a team of asynchronous agents with limited observation and communication capabilities need to compete against multiple strategic adversaries with changing strategies. This problem necessitates agents that can coordinate to detect changes in adversary strategies and plan the best response accordingly. Our approach first optimizes a set of stratagems that represent these best responses. These optimized stratagems are then integrated into a unified policy that can detect and respond when the adversaries change their strategies. The near-optimality of the proposed framework is established theoretically as well as demonstrated empirically in simulation and hardware

arXiv.org e-Print Archive

Crossref

DSpace@MIT

Decision shaping and strategy learning in multi-robot interactions

Author: Valtazanos Aris
Publication venue: The University of Edinburgh
Publication date: 28/11/2013
Field of study

Recent developments in robot technology have contributed to the advancement of autonomous behaviours in human-robot systems; for example, in following instructions received from an interacting human partner. Nevertheless, increasingly many systems are moving towards more seamless forms of interaction, where factors such as implicit trust and persuasion between humans and robots are brought to the fore. In this context, the problem of attaining, through suitable computational models and algorithms, more complex strategic behaviours that can influence human decisions and actions during an interaction, remains largely open. To address this issue, this thesis introduces the problem of decision shaping in strategic interactions between humans and robots, where a robot seeks to lead, without however forcing, an interacting human partner to a particular state. Our approach to this problem is based on a combination of statistical modeling and synthesis of demonstrated behaviours, which enables robots to efficiently adapt to novel interacting agents. We primarily focus on interactions between autonomous and teleoperated (i.e. human-controlled) NAO humanoid robots, using the adversarial soccer penalty shooting game as an illustrative example. We begin by describing the various challenges that a robot operating in such complex interactive environments is likely to face. Then, we introduce a procedure through which composable strategy templates can be learned from provided human demonstrations of interactive behaviours. We subsequently present our primary contribution to the shaping problem, a Bayesian learning framework that empirically models and predicts the responses of an interacting agent, and computes action strategies that are likely to influence that agent towards a desired goal. We then address the related issue of factors affecting human decisions in these interactive strategic environments, such as the availability of perceptual information for the human operator. Finally, we describe an information processing algorithm, based on the Orient motion capture platform, which serves to facilitate direct (as opposed to teleoperation-mediated) strategic interactions between humans and robots. Our experiments introduce and evaluate a wide range of novel autonomous behaviours, where robots are shown to (learn to) influence a variety of interacting agents, ranging from other simple autonomous agents, to robots controlled by experienced human subjects. These results demonstrate the benefits of strategic reasoning in human-robot interaction, and constitute an important step towards realistic, practical applications, where robots are expected to be not just passive agents, but active, influencing participants

Edinburgh Research Archive

Approximating Value Equivalence in Interactive Dynamic Influence Diagrams Using Behavioral Coverage

Author: Conroy Ross
Tang Jing
Zeng Yifeng
Publication venue
Publication date: 01/07/2016
Field of study

Interactive dynamic influence diagrams (I-DIDs) provide an explicit way of modeling how a subject agent solves decision making problems in the presence of other agents in a common setting. To optimize its decisions, the subject agent needs to predict the other agents' behavior, that is generally obtained by solving their candidate models. This becomes extremely difficult since the model space may be rather large, and grows when the other agents act and observe over the time. A recent proposal for solving I-DIDs lies in a concept of value equivalence (VE) that shows potential advances on significantly reducing the model space. In this paper, we establish a principled framework to implement the VE techniques and propose an approximate method to compute VE of candidate models. The development offers ample opportunity of exploiting VE to further improve the scalability of I-DID solutions. We theoretically analyze properties of the approximate techniques and show empirical results in multiple problem domains

Northumbria Research Link

Teeside University's Research Repository

An extended study on addressing defender teamwork while accounting for uncertainty in attacker defender games using iterative Dec-MDPs

Author: JIANG Albert Xin
Pradeep VARAKANTHAM
SHIEH Eric
TAMBE Milind
YADAV Amulya
Publication venue: 'IOS Press'
Publication date: 01/01/2016
Field of study

Multi-agent teamwork and defender-attacker security games are two areas that are currently receiving significant attention within multi-agent systems research. Unfortunately, despite the need for effective teamwork among multiple defenders, little has been done to harness the teamwork 1 research in security games. The problem that this paper seeks to solve is the coordination of decentralized defender agents in the presence of uncer-tainty while securing targets against an observing adversary. To address this problem, we offer the following novel contributions in this paper: (i) New model of security games with defender teams that coordinate under uncertainty; (ii) New algorithm based on column generation that uti-lizes Decentralized Markov Decision Processes (Dec-MDPs) to generate defender strategies that incorporate uncertainty; (iii) New techniques to handle global events (when one or more agents may leave the system) during defender execution; (iv) Heuristics that help scale up in the num-ber of targets and agents to handle real-world scenarios; (v) Exploration of the robustness of randomized pure strategies. The paper opens the door to a potentially new area combining computational game theory and multi-agent teamwork.

CiteSeerX

Institutional Knowledge at Singapore Management University

Decision-Theoretic Planning Under Anonymity in Agent Populations

Author: Chen Yingke
Doshi Prashant
Sonu Ekhlas
Publication venue: 'AI Access Foundation'
Publication date: 24/08/2017
Field of study

Crossref

Teeside University's Research Repository

Recommended from our members

A value equivalence approach for solving interactive dynamic influence diagrams

Author: Conroy Ross
Zeng Yifeng
Cavazza Marc
Tang Jing
Pan Yinghui
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 01/01/2003
Field of study

Interactive dynamic influence diagrams (I-DIDs) are recognized graphical models for sequential multiagent decision making under uncertainty. They represent the problem of how a subject agent acts in a common setting shared with other agents who may act in sophisticated ways. The difficulty in solving I-DIDs is mainly due to an exponentially growing space of candidate models ascribed to other agents over time. in order to minimize the model space, the previous I-DID techniques prune behaviorally equivalent models. In this paper, we challenge the minimal set of models and propose a value equivalence approach to further compress the model space. The new method reduces the space by additionally pruning behaviourally distinct models that result in the same expected value of the subject agent’s optimal policy. To achieve this, we propose to learn the value from available data particularly in practical applications of real-time strategy games. We demonstrate the performance of the new technique in two problem domains

Greenwich Academic Literature Archive

Repository of the University of Namur

A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity

Author: Baarslag T. (Tim)
Hernandez-Leal P. (Pablo)
Kaisers M. (Michael)
Munoz de Cote E. (Enrique)
Publication venue
Publication date: 28/07/2018
Field of study

The key challenge in multiagent learning is learning a best response to the behaviour of other agents, which may be non-stationary: if the other agents adapt their strategy as well, the learning target moves. Disparate streams of research have approached non-stationarity from several angles, which make a variety of implicit assumptions that make it hard to keep an overview of the state of the art and to validate the innovation and significance of new works. This survey presents a coherent overview of work that addresses opponent-induced non-stationarity with tools from game theory, reinforcement learning and multi-armed bandits. Further, we reflect on the principle approaches how algorithms model and cope with this non-stationarity, arriving at a new framework and five categories (in increasing order of sophistication): ignore, forget, respond to target models, learn models, and theory of mind. A wide range of state-of-the-art algorithms is classified into a taxonomy, using these categories and key characteristics of the environment (e.g., observability) and adaptation behaviour of the opponents (e.g., smooth, abrupt). To clarify even further we present illustrative variations of one domain, contrasting the strengths and limitations of each category. Finally, we discuss in which environments the different approaches yield most merit, and point to promising avenues of future research

CWI's Institutional Repository