Solving Games with Functional Regret Estimation
We propose a novel online learning method for minimizing regret in large
extensive-form games. The approach learns a function approximator online to
estimate the regret for choosing a particular action. A no-regret algorithm
uses these estimates in place of the true regrets to define a sequence of
policies.
We prove the approach sound by providing a bound relating the quality of the
function approximation to the regret of the algorithm. A corollary is that the
method is guaranteed to converge to a Nash equilibrium in self-play so long as
the regrets are ultimately realizable by the function approximator. Our
technique can be understood as a principled generalization of existing work on
abstraction in large games; in our work, both the abstraction as well as the
equilibrium are learned during self-play. We demonstrate empirically that the
method achieves higher-quality strategies than state-of-the-art abstraction
techniques given the same resources.
Comment: AAAI Conference on Artificial Intelligence 201
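The policy-update step the abstract describes can be pictured with regret matching run on estimated rather than true regrets. This is a minimal sketch, not the paper's implementation: `estimated_regrets` stands in for the output of the learned function approximator at one decision point, and the values used below are invented for illustration.

```python
import numpy as np

def regret_matching_policy(estimated_regrets):
    """Turn a vector of (estimated) regrets into a policy via regret matching:
    play each action in proportion to its positive regret."""
    positive = np.maximum(estimated_regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    # No action has positive regret: fall back to uniform play.
    return np.full(len(estimated_regrets), 1.0 / len(estimated_regrets))

# Hypothetical regret estimates for three actions at one information set.
policy = regret_matching_policy(np.array([2.0, -1.0, 1.0]))
```

Substituting estimates for true regrets is exactly where the paper's bound enters: the closer the approximator's outputs are to the realized regrets, the closer this sequence of policies is to the no-regret guarantee.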
Risk-aware navigation for UAV digital data collection
This thesis studies the navigation task for autonomous UAVs collecting digital data in a risky environment. Three problem formulations are proposed according to different real-world situations. First, we focus on uniform probabilistic risk and assume the UAV has an unlimited amount of energy. With these assumptions, we provide the graph-based Data-collecting Robot Problem (DRP) model, and propose heuristic planning solutions that consist of a clustering step and a tour-building step. Experiments show our methods provide high-quality solutions with high expected reward. Second, we investigate non-uniform probabilistic risk and a limited energy capacity for the UAV. We present the Data-collection Problem (DCP) to model the task. DCP is a grid-based Markov decision process, and we utilize reinforcement learning with a deep Ensemble Navigation Network (ENN) to tackle the problem. Given four simple navigation algorithms and some additional heuristic information, ENN is able to find improved solutions. Finally, we consider risk in the form of an opponent together with a limited energy capacity for the UAV, for which we resort to the Data-collection Game (DCG) model. DCG is a grid-based two-player stochastic game where the opponent may have different strategies. We propose opponent modeling to improve data-collection efficiency, design four deep neural networks that model the opponent's behavior at different levels, and empirically show that explicit opponent modeling with a dedicated network provides superior performance.
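As a rough picture of the tour-building step in the DRP heuristic (the clustering step and the probabilistic-risk model are omitted), a greedy nearest-neighbor tour over data sites might look like the following. The coordinates and function name are illustrative assumptions, not taken from the thesis.

```python
import math

def nearest_neighbor_tour(start, sites):
    """Tour-building sketch: from the current position, repeatedly fly to
    the closest unvisited data site."""
    tour, current, remaining = [start], start, list(sites)
    while remaining:
        nxt = min(remaining, key=lambda s: math.dist(current, s))
        remaining.remove(nxt)
        tour.append(nxt)
        current = nxt
    return tour

# Invented site coordinates; the UAV starts at the origin.
tour = nearest_neighbor_tour((0, 0), [(5, 5), (1, 0), (2, 2)])
```

A full DRP solver would also weigh each leg's survival probability and expected reward, not just distance; this sketch only conveys the tour-construction skeleton.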
Autonomous Agents Modelling Other Agents: A Comprehensive Survey and Open Problems
Much research in artificial intelligence is concerned with the development of
autonomous agents that can interact effectively with other agents. An important
aspect of such agents is the ability to reason about the behaviours of other
agents, by constructing models which make predictions about various properties
of interest (such as actions, goals, beliefs) of the modelled agents. A variety
of modelling approaches now exist which vary widely in their methodology and
underlying assumptions, catering to the needs of the different sub-communities
within which they were developed and reflecting the different practical uses
for which they are intended. The purpose of the present article is to provide a
comprehensive survey of the salient modelling methods which can be found in the
literature. The article concludes with a discussion of open problems which may
form the basis for fruitful future research.
Comment: Final manuscript (46 pages), published in the Artificial Intelligence
Journal. The arXiv version also contains a table of contents after the
abstract, but is otherwise identical to the AIJ version. Keywords: autonomous
agents, multiagent systems, modelling other agents, opponent modelling
Cooperation in Games
University of Minnesota Ph.D. dissertation. 2019. Major: Computer Science. Advisor: Maria Gini. 1 computer file (PDF); 159 pages. This dissertation explores several problems related to social behavior, a complex and difficult domain. We describe ways to solve problems for agents interacting with opponents, specifically (1) identifying cooperative strategies, (2) acting on fallible predictions, and (3) determining how much to compromise with the opponent. In a multi-agent environment an agent's interactions with its opponent can significantly affect its performance. However, it is not always possible for the agent to fully model the behavior of the opponent and compute a best response. We present three algorithms for agents to use when interacting with an opponent too complex to be modelled. An agent which wishes to cooperate with its opponent must first identify what strategy constitutes a cooperative action. We address the problem of identifying cooperative strategies in repeated randomly generated games by modelling an agent's intentions with a real number, its attitude, which is used to produce a modified game; the Nash equilibria of the modified game implement the strategies described by the intentions used to generate it. We demonstrate how these values can be learned, and show how they can be used to achieve cooperation through reciprocation in repeated randomly generated normal-form games. Next, an agent which has formed a prediction of opponent behavior which may be incorrect needs to be able to take advantage of that prediction without adopting a strategy which is overly vulnerable to exploitation. We have developed Restricted Stackelberg Response with Safety (RSRS), an algorithm which produces a strategy that responds to a prediction while balancing performance against the prediction, worst-case performance, and performance against a best-responding opponent. By balancing those concerns appropriately, the agent can perform well against an opponent it cannot reliably predict. Finally, we look at how an agent can manipulate an opponent into choosing actions which benefit the agent. This problem is often complicated by the difficulty of analyzing the game the agent is playing. To address this issue, we begin by developing a new game, the Gift Exchange game, which is trivial to analyze; the only question is how the opponent will react. We develop a variety of strategies the agent can use when playing the game, and explore how the best strategy is affected by the agent's discount factor and prior over opponents.
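One way to picture the attitude mechanism described above: a common formulation (assumed here as an illustration, not quoted from the dissertation) adds each agent's attitude, a real number, times the opponent's payoff to its own, yielding the modified game whose Nash equilibria encode the intended level of cooperation.

```python
import numpy as np

def attitude_modified_payoffs(u1, u2, a1, a2):
    """Build the modified game: each agent's payoff is its own payoff plus
    its attitude times the opponent's payoff."""
    return u1 + a1 * u2, u2 + a2 * u1

# A prisoner's-dilemma payoff matrix (rows: cooperate/defect for player 1,
# columns: cooperate/defect for player 2). Values are illustrative.
u1 = np.array([[3, 0], [5, 1]])
u2 = np.array([[3, 5], [0, 1]])

# With fully cooperative attitudes (a = 1), each agent maximizes joint welfare.
m1, m2 = attitude_modified_payoffs(u1, u2, 1.0, 1.0)
# In the modified game, mutual cooperation (payoff 6 each) is a Nash
# equilibrium, since deviating to defection yields only 5.
```

Learning the attitude values, as the abstract describes, then amounts to adjusting `a1` and `a2` through repeated play so that reciprocated cooperation emerges.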