Learning from induced changes in opponent (re)actions in multi-agent games
Multi-agent learning is a growing area of research. An important topic is to formulate how an agent can learn a good policy in the face of adaptive, competitive opponents. Most research has focused on extensions of single-agent learning techniques originally designed for agents in more static environments. These techniques, however, fail to incorporate a notion of the effect of the agent's own previous actions on the development of the policies of the other agents in the system. We argue that incorporating this property is beneficial in competitive settings. In this paper, we present a novel algorithm to capture this notion, and present experimental results to validate our claim.
Multiparty Dynamics and Failure Modes for Machine Learning and Artificial Intelligence
An important challenge for safety in machine learning and artificial
intelligence systems is a set of related failures involving specification
gaming, reward hacking, fragility to distributional shifts, and Goodhart's or
Campbell's law. This paper presents additional failure modes for interactions
within multi-agent systems that are closely related. These multi-agent failure
modes are more complex, more problematic, and less well understood than the
single-agent case, and are also already occurring, largely unnoticed. After
motivating the discussion with examples from poker-playing artificial
intelligence (AI), the paper explains why these failure modes are in some
senses unavoidable. Following this, the paper categorizes failure modes,
provides definitions, and cites examples for each of the modes: accidental
steering, coordination failures, adversarial misalignment, input spoofing and
filtering, and goal co-option or direct hacking. The paper then discusses how
extant literature on multi-agent AI fails to address these failure modes, and
identifies work which may be useful for the mitigation of these failure modes.
Comment: 12 Pages, This version re-submitted to Big Data and Cognitive
Computing, Special Issue "Artificial Superintelligence: Coordination &
Strategy
Analysing the behaviour of robot teams through relational sequential pattern mining
This report outlines the use of a relational representation in a Multi-Agent
domain to model the behaviour of the whole system. A desired property in such
systems is the ability of the team members to work together to achieve a common
goal in a cooperative manner. The aim is to define a systematic method to
verify effective collaboration among the members of a team and to compare
different multi-agent behaviours. Using external observations of a
Multi-Agent System to analyse, model, and recognize agent behaviour could be very
useful to direct team actions. In particular, this report focuses on the
challenge of autonomous unsupervised sequential learning of the team's
behaviour from observations. Our approach learns a symbolic sequence
(a relational representation) that translates raw multi-agent, multi-variate
observations of a dynamic, complex environment, into a set of sequential
behaviours that are characteristic of the team in question, represented by a
set of sequences expressed in first-order logic atoms. We propose to use a
relational learning algorithm to mine meaningful frequent patterns among the
relational sequences to characterise team behaviours. We compared the
performance of two teams in the RoboCup four-legged league environment that
take very different approaches to the game. One uses a Case-Based Reasoning
approach; the other uses purely reactive behaviour.
Comment: 25 page
Beyond Monte Carlo Tree Search: Playing Go with Deep Alternative Neural Network and Long-Term Evaluation
Monte Carlo tree search (MCTS), which determines each action through an enormous
number of simulations in a broad and deep search tree, is extremely popular in computer Go.
However, human experts select most actions by pattern analysis and careful
evaluation rather than brute-force search of millions of future interactions. In this
paper, we propose a computer Go system that follows experts' way of thinking and
playing. Our system consists of two parts. The first part is a novel deep
alternative neural network (DANN) used to generate candidates for the next move.
Compared with existing deep convolutional neural networks (DCNN), DANN inserts
a recurrent layer after each convolutional layer and stacks them in an
alternating manner. We show that this setting can preserve more context of local
features and their evolution, which is beneficial for move prediction. The
second part is a long-term evaluation (LTE) module used to provide a reliable
evaluation of candidates rather than a single probability from the move predictor.
This is consistent with human experts' nature of playing, since they can foresee
tens of steps to give an accurate estimation of candidates. In our system, for
each candidate, LTE calculates a cumulative reward after several future
interactions when local variations are settled. Combining criteria from the two
parts, our system determines the optimal choice of the next move. For more
comprehensive experiments, we introduce a new professional Go dataset (PGD),
consisting of 253233 professional records. Experiments on GoGoD and PGD
datasets show that DANN can substantially improve move prediction performance
over pure DCNN. When combined with LTE, our system outperforms most relevant
approaches and open engines based on MCTS.
Comment: AAAI 201
Near-Optimal Adversarial Policy Switching for Decentralized Asynchronous Multi-Agent Systems
A key challenge in multi-robot and multi-agent systems is generating
solutions that are robust to other self-interested or even adversarial parties
who actively try to prevent the agents from achieving their goals. The
practicality of existing works addressing this challenge is limited to only
small-scale synchronous decision-making scenarios or a single agent planning
its best response against a single adversary with fixed, procedurally
characterized strategies. In contrast, this paper considers a more realistic
class of problems where a team of asynchronous agents with limited observation
and communication capabilities needs to compete against multiple strategic
adversaries with changing strategies. This problem necessitates agents that can
coordinate to detect changes in adversary strategies and plan the best response
accordingly. Our approach first optimizes a set of stratagems that represent
these best responses. These optimized stratagems are then integrated into a
unified policy that can detect and respond when the adversaries change their
strategies. The near-optimality of the proposed framework is established
theoretically as well as demonstrated empirically in simulation and hardware
- …