    Opponent Modelling in Multi-Agent Systems

    Reinforcement learning (RL) formalises the problem of an intelligent agent learning to achieve goals by maximising a long-term return in an environment. Multi-agent reinforcement learning (MARL) extends traditional RL to multiple agents. Many RL algorithms lose their convergence guarantees in non-stationary environments created by adaptive opponents. Partial observation, caused by agents' differing private observations, introduces high variance during training, which exacerbates data inefficiency. Moreover, training an agent to perform well against one set of opponents often leads to poor performance against another. Non-stationarity, partial observation and an unclear learning objective are thus three critical problems in MARL that hinder agents' learning, and they share a common cause: a lack of knowledge about the other agents. In this thesis, we therefore propose to address these problems with opponent modelling, tailoring our solutions by combining opponent modelling with other techniques according to the characteristics of each problem. Specifically, we first propose ROMMEO, an algorithm inspired by Bayesian inference, to alleviate non-stationarity in cooperative games. We then study the partial observation problem caused by agents' private observations and design an implicit communication training method named PBL. Lastly, we investigate the non-stationarity and unclear-learning-objective problems in zero-sum games and propose EPSOM, which aims to find safe exploitation strategies for playing against non-stationary opponents. We verify the proposed methods in varied experiments and show that they achieve the desired performance. Limitations and future work are discussed in the last chapter of this thesis.
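
    In standard notation (a generic formulation, not the thesis's own), each agent i in MARL maximises an expected discounted return that depends on every other agent's policy:

        J_i(\pi_i, \pi_{-i}) = \mathbb{E}_{\tau \sim (\pi_i, \pi_{-i})} \Big[ \sum_{t=0}^{\infty} \gamma^t r_t^i \Big]

    Because J_i depends on the opponents' policies \pi_{-i}, any learning by the opponents shifts the effective environment from agent i's point of view, which is exactly the non-stationarity that breaks single-agent convergence guarantees.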

    A Survey of Monte Carlo Tree Search Methods

    Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impose some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and non-game domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
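
    A minimal sketch of the selection and backpropagation phases built on the UCB1 rule (the basis of the UCT variant the survey covers); the Node structure and constants here are illustrative assumptions, not code from the paper:

        import math

        class Node:
            """One state in the search tree."""
            def __init__(self, parent=None):
                self.parent = parent
                self.children = {}    # action -> child Node
                self.visits = 0
                self.value_sum = 0.0  # total reward backed up through here

            def ucb1(self, c=1.414):
                # Unvisited children get an infinite bonus, so each child
                # is tried at least once before any is revisited.
                if self.visits == 0:
                    return float("inf")
                exploit = self.value_sum / self.visits
                explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
                return exploit + explore

        def select(root):
            # Selection: descend by maximising UCB1 until reaching a leaf.
            node = root
            while node.children:
                node = max(node.children.values(), key=Node.ucb1)
            return node

        def backpropagate(node, reward):
            # Backpropagation: update statistics from the leaf to the root.
            while node is not None:
                node.visits += 1
                node.value_sum += reward
                node = node.parent

    Expansion and simulation (a random rollout from the newly added node) complete the four-phase loop; the tension between the exploit and explore terms above is what combines 'the precision of tree search with the generality of random sampling'.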

    Negotiating with a logical-linguistic protocol in a dialogical framework

    This book is the result of years of reflection. Some time ago, while working in commodities, the author felt how difficult it was to decide the order in which to use arguments during a negotiation process. What would happen if we translated the arguments into cards and played them according to the rules of Bridge? The results were impressive: there was potential for improvement in the negotiation process. The investigation went deeper, exploring players, cards, deals and the information concealed in the players' announcements, in the cards and in the deals. This new angle brought the research to NeuroLinguistic Patterns and cryptic languages such as Russian Cards. In the following pages, the author shares her discovery of a new application for Logical Dialogues: Negotiations, tackled from basic linguistic structures placed in dialogue form as a cognitive system that 'understands' natural language, with the aim of resolving conflicts and even serving peace.

    Learning to Represent Haptic Feedback for Partially-Observable Tasks

    The sense of touch, the earliest sensory system to develop in the human body [1], plays a critical role in our daily interaction with the environment. Many manipulation interactions require incorporating haptic feedback in order to complete a task successfully. However, manually designing a feedback mechanism can be extremely challenging. In this work, we consider manipulation tasks that need to incorporate tactile sensor feedback in order to modify a provided nominal plan. To handle partial observability, we present a new framework that models the task as a partially observable Markov decision process (POMDP) and learns an appropriate representation of haptic feedback that can serve as the state of the POMDP model. The model, parametrized by deep recurrent neural networks, uses variational Bayes methods to optimize the approximate posterior. Finally, we build on deep Q-learning to select the optimal action in each state without access to a simulator. We test our model on a PR2 robot on multiple tasks of turning a knob until it clicks.
    Comment: IEEE International Conference on Robotics and Automation (ICRA), 201
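
    As a rough sketch of the shape of such a model (layer sizes, names, and the use of a plain GRU are illustrative assumptions, and the paper's variational-Bayes training of the approximate posterior is omitted): a recurrent encoder compresses the haptic observation history into a latent state, and a Q-head scores discrete actions from that state.

        import torch
        import torch.nn as nn

        class HapticQNet(nn.Module):
            """Recurrent state encoder plus Q-head over discrete actions."""
            def __init__(self, obs_dim=6, hidden_dim=64, n_actions=4):
                super().__init__()
                # The GRU summarises the history of haptic readings; its final
                # hidden state plays the role of the learned POMDP state.
                self.encoder = nn.GRU(obs_dim, hidden_dim, batch_first=True)
                self.q_head = nn.Linear(hidden_dim, n_actions)

            def forward(self, obs_seq):
                # obs_seq: (batch, time, obs_dim) sequence of haptic readings
                _, h = self.encoder(obs_seq)   # h: (1, batch, hidden_dim)
                state = h.squeeze(0)           # learned state representation
                return self.q_head(state)      # one Q-value per action

        net = HapticQNet()
        history = torch.randn(1, 50, 6)        # 50 timesteps of sensor data
        action = net(history).argmax(dim=-1).item()  # greedy action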

    Solving Common-Payoff Games with Approximate Policy Iteration

    For artificially intelligent learning systems to have widespread applicability in real-world settings, it is important that they be able to operate decentrally. Unfortunately, decentralized control is difficult: computing even an epsilon-optimal joint policy is NEXP-complete. Nevertheless, a recently rediscovered insight, that a team of agents can coordinate via common knowledge, has given rise to algorithms capable of finding optimal joint policies in small common-payoff games. The Bayesian action decoder (BAD) leverages this insight and deep reinforcement learning to scale to games as large as two-player Hanabi. However, the approximations it uses prevent it from discovering optimal joint policies even in games small enough to brute-force optimal solutions. This work proposes CAPI, a novel algorithm which, like BAD, combines common knowledge with deep reinforcement learning; unlike BAD, CAPI prioritizes the propensity to discover optimal joint policies over scalability. While this choice precludes CAPI from scaling to games as large as Hanabi, empirical results demonstrate that, on the games to which CAPI does scale, it discovers optimal joint policies even when other modern multi-agent reinforcement learning algorithms cannot. Code is available at https://github.com/ssokota/capi
    Comment: AAAI 202
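
    As a toy illustration of the common-payoff setting (the payoff matrix below is invented for this example, not taken from the paper): both players receive the same reward, so in a one-shot game an optimal joint policy can be found by brute force over joint actions. Sequential play under decentralized partial observability, which this sketch leaves out, is what makes the general problem NEXP-complete.

        import itertools

        # Shared payoff matrix of a 2x2 coordination game: both players are
        # rewarded only when their actions match.
        PAYOFF = [
            [1.0, 0.0],
            [0.0, 1.0],
        ]

        def brute_force_joint_policy(payoff):
            # Enumerate every deterministic joint action and keep the one
            # with the highest shared payoff.
            actions = range(len(payoff))
            return max(itertools.product(actions, actions),
                       key=lambda joint: payoff[joint[0]][joint[1]])

        print(brute_force_joint_policy(PAYOFF))  # -> (0, 0)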

    Logical dynamics meets logical pluralism?

    Where is logic heading today? There is a general feeling that the discipline is broadening its scope and agenda beyond classical foundational issues, and maybe even a concern that, like Stephen Leacock’s famous horseman, it is ‘riding off madly in all directions’. So, what is the resultant vector? There seem to be two broad answers in circulation today. One is logical pluralism, locating the new scope of logic in charting a wide variety of reasoning styles, often marked by non-classical structural rules of inference. This is the new program that I subscribed to in my work on substructural logics around 1990, and it is a powerful movement today. But gradually, I have changed my mind about the crux of what logic should become. I would now say that the main issue is not variety of reasoning styles and notions of consequence, but the variety of informational tasks performed by intelligent interacting agents, of which inference is only one among many, involving observation, memory, questions and answers, dialogue, or general communication. And logical systems should deal with a wide variety of these, making information-carrying events first-class citizens in their set-up. The purpose of this brief paper is to contrast and compare the two approaches, drawing freely on some insights from earlier published papers. In particular, I will argue that logical dynamics sets itself the more ambitious diagnostic goal of explaining why substructural phenomena occur, by ‘deconstructing’ them into classical logic plus an explicit account of the relevant informational events.