8 research outputs found

    Solving Games with Functional Regret Estimation

    Full text link
    We propose a novel online learning method for minimizing regret in large extensive-form games. The approach learns a function approximator online to estimate the regret for choosing a particular action. A no-regret algorithm uses these estimates in place of the true regrets to define a sequence of policies. We prove the approach sound by providing a bound relating the quality of the function approximation and regret of the algorithm. A corollary being that the method is guaranteed to converge to a Nash equilibrium in self-play so long as the regrets are ultimately realizable by the function approximator. Our technique can be understood as a principled generalization of existing work on abstraction in large games; in our work, both the abstraction as well as the equilibrium are learned during self-play. We demonstrate empirically the method achieves higher quality strategies than state-of-the-art abstraction techniques given the same resources.Comment: AAAI Conference on Artificial Intelligence 201

    Risk-aware navigation for UAV digital data collection

    Get PDF
    This thesis studies the navigation task for autonomous UAVs to collect digital data in a risky environment. Three problem formulations are proposed according to different real-world situations. First, we focus on uniform probabilistic risk and assume UAV has unlimited amount of energy. With these assumptions, we provide the graph-based Data-collecting Robot Problem (DRP) model, and propose heuristic planning solutions that consist of a clustering step and a tour building step. Experiments show our methods provide high-quality solutions with high expected reward. Second, we investigate non-uniform probabilistic risk and limited energy capacity of UAV. We present the Data-collection Problem (DCP) to model the task. DCP is a grid-based Markov decision process, and we utilize reinforcement learning with a deep Ensemble Navigation Network (ENN) to tackle the problem. Given four simple navigation algorithms and some additional heuristic information, ENN is able to find improved solutions. Finally, we consider the risk in the form of an opponent and limited energy capacity of UAV, for which we resort to the Data-collection Game (DCG) model. DCG is a grid-based two-player stochastic game where the opponent may have different strategies. We propose opponent modeling to improve data-collection efficiency, design four deep neural networks that model the opponent\u27s behavior at different levels, and empirically prove that explicit opponent modeling with a dedicated network provides superior performance

    Autonomous Agents Modelling Other Agents: A Comprehensive Survey and Open Problems

    Get PDF
    Much research in artificial intelligence is concerned with the development of autonomous agents that can interact effectively with other agents. An important aspect of such agents is the ability to reason about the behaviours of other agents, by constructing models which make predictions about various properties of interest (such as actions, goals, beliefs) of the modelled agents. A variety of modelling approaches now exist which vary widely in their methodology and underlying assumptions, catering to the needs of the different sub-communities within which they were developed and reflecting the different practical uses for which they are intended. The purpose of the present article is to provide a comprehensive survey of the salient modelling methods which can be found in the literature. The article concludes with a discussion of open problems which may form the basis for fruitful future research.Comment: Final manuscript (46 pages), published in Artificial Intelligence Journal. The arXiv version also contains a table of contents after the abstract, but is otherwise identical to the AIJ version. Keywords: autonomous agents, multiagent systems, modelling other agents, opponent modellin

    Cooperation in Games

    Get PDF
    University of Minnesota Ph.D. dissertation. 2019. Major: Computer Science. Advisor: Maria Gini. 1 computer file (PDF); 159 pages.This dissertation explores several problems related to social behavior, which is a complex and difficult problem. In this dissertation we describe ways to solve problems for agents interacting with opponents, specifically (1) identifying cooperative strategies,(2) acting on fallible predictions, and (3) determining how much to compromise with the opponent. In a multi-agent environment an agent’s interactions with its opponent can significantly affect its performance. However, it is not always possible for the agent to fully model the behavior of the opponent and compute a best response. We present three algorithms for agents to use when interacting with an opponent too complex to be modelled. An agent which wishes to cooperate with its opponent must first identify what strategy constitutes a cooperative action. We address the problem of identifying cooperative strategies in repeated randomly generated games by modelling an agent’s intentions with a real number, its attitude, which is used to produce a modified game; the Nash equilibria of the modified game implement the strategies described by the intentions used to generate the modified game. We demonstrate how these values can be learned, and show how they can be used to achieve cooperation through reciprocation in repeated randomly generated normal form games. Next, an agent which has formed a prediction of opponent behavior which maybe incorrect needs to be able to take advantage of that prediction without adopting a strategy which is overly vulnerable to exploitation. We have developed Restricted Stackelberg Response with Safety (RSRS), an algorithm which can produce a strategy to respond to a prediction while balancing the priorities of performance against the prediction, worst-case performance, and performance against a best-responding opponent. By balancing those concerns appropriately the agent can perform well against an opponent which it cannot reliably predict. Finally we look at how an agent can manipulate an opponent to choose actions which benefit the agent. This problem is often complicated by the difficulty of analyzing the game the agent is playing. To address this issue, we begin by developing a new game, the Gift Exchange game, which is trivial to analyze; the only question is how the opponent will react. We develop a variety of strategies the agent can use when playing the game, and explore how the best strategy is affected by the agent’s discount factor and prior over opponents
    corecore