
    Connectivity in the presence of an opponent

    The paper introduces two-player connectivity games played on finite bipartite graphs. Algorithms that solve these connectivity games can be used as subroutines for solving Müller games, which constitute a well-established class of games in model checking and verification. In connectivity games, the objective of one of the players is to visit every node of the game graph infinitely often. The first contribution of this paper is our proof that solving connectivity games can be reduced to the incremental strongly connected component maintenance (ISCCM) problem, an important problem in graph algorithms and data structures. The second contribution is that we non-trivially adapt two known algorithms for the ISCCM problem to obtain two efficient algorithms for solving connectivity games. Finally, based on the techniques developed, we recast Horn's polynomial-time algorithm for explicitly given Müller games and provide an alternative proof of its correctness. Our algorithms are more efficient than Horn's algorithm, and our solution for connectivity games is used as a subroutine in the algorithm for Müller games.
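    As a rough illustration of the ISCCM interface that such a reduction relies on, the sketch below (with names that are ours rather than the paper's) maintains a growing edge set and answers strong-connectivity queries by naive re-exploration after every insertion; the paper's adapted algorithms answer the same kinds of queries far more efficiently.

    # Minimal sketch of an incremental strongly connected component maintainer.
    # Edges arrive one at a time; a connectivity-game solver would repeatedly
    # query whether parts of the arena have collapsed into a single SCC.
    from collections import defaultdict, deque

    class NaiveISCCM:
        def __init__(self, nodes):
            self.nodes = set(nodes)
            self.fwd = defaultdict(set)   # adjacency lists
            self.rev = defaultdict(set)   # reversed adjacency lists

        def add_edge(self, u, v):
            self.fwd[u].add(v)
            self.rev[v].add(u)

        def _reachable(self, start, adj):
            seen, queue = {start}, deque([start])
            while queue:
                x = queue.popleft()
                for y in adj[x]:
                    if y not in seen:
                        seen.add(y)
                        queue.append(y)
            return seen

        def same_scc(self, u, v):
            """True iff u and v currently lie in one strongly connected component."""
            return v in self._reachable(u, self.fwd) and v in self._reachable(u, self.rev)

        def single_scc(self):
            """True iff the whole arena is one SCC, i.e. every node reaches every other."""
            anchor = next(iter(self.nodes))
            return (self._reachable(anchor, self.fwd) >= self.nodes and
                    self._reachable(anchor, self.rev) >= self.nodes)

    # Usage: feed in arena edges one by one and query connectivity as they arrive.
    g = NaiveISCCM(nodes=[0, 1, 2])
    for u, v in [(0, 1), (1, 2), (2, 0)]:
        g.add_edge(u, v)
    print(g.single_scc())  # True once the cycle closes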

    Alternative Automata-based Approaches to Probabilistic Model Checking

    In this thesis we focus on new methods for probabilistic model checking (PMC) with linear temporal logic (LTL). The standard approach translates an LTL formula into a deterministic ω-automaton with a double-exponential blow-up. There are approaches for Markov chain analysis against LTL with exponential runtime, which motivates the search for non-deterministic automata with restricted forms of non-determinism that make them suitable for PMC. For MDPs, the approach via deterministic automata matches the double-exponential lower bound, but practical applications might benefit from approaches via non-deterministic automata. We first investigate good-for-games (GFG) automata, in which the non-determinism can be resolved on a finite prefix, without knowing the infinite suffix, while still yielding an accepting run for every accepted word. We explain that GFG automata are well-suited for MDP analysis at a theoretical level, but our experiments show that they cannot compete with deterministic automata. We have also studied another form of pseudo-determinism, namely unambiguity, where every accepted word has exactly one accepting run. We present a polynomial-time approach for PMC of Markov chains against specifications given by an unambiguous Büchi automaton (UBA). Its two key elements are deciding whether the induced probability is positive and, if so, identifying a state set that induces probability 1. Additionally, we examine the new symbolic Muller acceptance described in the Hanoi Omega-Automata format, which we call Emerson-Lei acceptance: a positive Boolean formula over unconditional fairness constraints. We present a construction of small deterministic automata using Emerson-Lei acceptance. Deciding whether an MDP has a positive maximal probability of satisfying an Emerson-Lei acceptance condition is NP-complete, a fact that has motivated a DPLL-based algorithm for deciding positivity.
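    As a concrete illustration of Emerson-Lei acceptance, the sketch below evaluates a positive Boolean formula over Inf/Fin colour atoms, in the style of the HOA format, against the set of colours a run visits infinitely often; the class names and the example formula are illustrative assumptions, not code from the thesis.

    # Emerson-Lei acceptance as a positive Boolean formula over colour atoms,
    # evaluated against the colours seen infinitely often by a run (a sketch).
    from dataclasses import dataclass
    from typing import FrozenSet

    @dataclass(frozen=True)
    class Inf:            # colour must be visited infinitely often
        colour: int
        def holds(self, inf_colours: FrozenSet[int]) -> bool:
            return self.colour in inf_colours

    @dataclass(frozen=True)
    class Fin:            # colour must be visited only finitely often
        colour: int
        def holds(self, inf_colours: FrozenSet[int]) -> bool:
            return self.colour not in inf_colours

    @dataclass(frozen=True)
    class And:
        left: object
        right: object
        def holds(self, inf_colours):
            return self.left.holds(inf_colours) and self.right.holds(inf_colours)

    @dataclass(frozen=True)
    class Or:
        left: object
        right: object
        def holds(self, inf_colours):
            return self.left.holds(inf_colours) or self.right.holds(inf_colours)

    # Example formula: (Inf(0) & Fin(1)) | Inf(2).
    acc = Or(And(Inf(0), Fin(1)), Inf(2))
    print(acc.holds(frozenset({0})))     # True: colour 0 infinitely often, 1 finitely often
    print(acc.holds(frozenset({0, 1})))  # False: colour 1 recurs and colour 2 does not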

    TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning

    Combining deep model-free reinforcement learning with on-line planning is a promising approach to building on the successes of deep RL. On-line planning with look-ahead trees has proven successful in environments where transition models are known a priori. However, in complex environments where transition models need to be learned from data, the deficiencies of learned models have limited their utility for planning. To address these challenges, we propose TreeQN, a differentiable, recursive, tree-structured model that serves as a drop-in replacement for any value function network in deep RL with discrete actions. TreeQN dynamically constructs a tree by recursively applying a transition model in a learned abstract state space and then aggregating predicted rewards and state-values using a tree backup to estimate Q-values. We also propose ATreeC, an actor-critic variant that augments TreeQN with a softmax layer to form a stochastic policy network. Both approaches are trained end-to-end, such that the learned model is optimised for its actual use in the tree. We show that TreeQN and ATreeC outperform n-step DQN and A2C on a box-pushing task, as well as n-step DQN and value prediction networks (Oh et al. 2017) on multiple Atari games. Furthermore, we present ablation studies that demonstrate the effect of different auxiliary losses on learning transition models.
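    The following toy sketch illustrates the core tree-backup computation with stand-in linear transition, reward, and value functions and a plain max backup over children; it conveys the idea only and is not the authors' architecture, and all names and hyperparameters are assumptions.

    # Recursively expand a tree in an abstract state space and back up values
    # to form Q-estimates (simplified sketch with random linear "model" parts).
    import numpy as np

    rng = np.random.default_rng(0)
    N_ACTIONS, D = 3, 8          # discrete actions, abstract-state dimension
    W_trans = rng.normal(size=(N_ACTIONS, D, D)) * 0.1   # per-action transition
    w_reward = rng.normal(size=(N_ACTIONS, D)) * 0.1     # per-action reward head
    w_value = rng.normal(size=D) * 0.1                   # state-value head
    GAMMA = 0.99

    def value(z):
        return float(w_value @ z)

    def q_values(z, depth):
        """Return Q(z, a) for all actions by expanding the tree to `depth` levels."""
        qs = np.empty(N_ACTIONS)
        for a in range(N_ACTIONS):
            r = float(w_reward[a] @ z)            # predicted immediate reward
            z_next = np.tanh(W_trans[a] @ z)      # predicted next abstract state
            if depth == 1:
                backup = value(z_next)            # leaf: bootstrap with V(z')
            else:
                backup = np.max(q_values(z_next, depth - 1))  # inner node: max over children
            qs[a] = r + GAMMA * backup
        return qs

    z0 = rng.normal(size=D)        # abstract state, e.g. from an encoder (assumed)
    print(q_values(z0, depth=2))   # depth-2 tree-backup Q-estimates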

    Acta Cybernetica: Volume 16, Number 3.


    Deep Reinforcement Learning for Swarm Systems

    Recently, deep reinforcement learning (RL) methods have been applied successfully to multi-agent scenarios. Typically, these methods rely on a concatenation of agent states to represent the information content required for decentralized decision making. However, concatenation scales poorly to swarm systems with a large number of homogeneous agents, as it does not exploit two fundamental properties inherent to these systems: (i) the agents in the swarm are interchangeable and (ii) the exact number of agents in the swarm is irrelevant. Therefore, we propose a new state representation for deep multi-agent RL based on mean embeddings of distributions. We treat the agents as samples of a distribution and use the empirical mean embedding as input for a decentralized policy. We define different feature spaces for the mean embedding using histograms, radial basis functions, and a neural network learned end-to-end. We evaluate the representation on two well-known problems from the swarm literature (rendezvous and pursuit evasion), in both a globally and a locally observable setup. For the local setup we furthermore introduce simple communication protocols. Of all approaches, the mean embedding representation using neural network features enables the richest information exchange between neighboring agents, facilitating the development of more complex collective strategies.
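    A minimal sketch of such a mean-embedding input, using assumed RBF features and parameter choices rather than the authors' implementation, is given below; it shows how the representation stays fixed-size and order-invariant regardless of the number of neighbouring agents.

    # Treat neighbours' observations as samples of a distribution, featurise each
    # with radial basis functions, and average the features to get a policy input.
    import numpy as np

    def rbf_features(x, centres, bandwidth=1.0):
        """Radial-basis-function featurisation of one neighbour observation."""
        d2 = np.sum((centres - x) ** 2, axis=1)
        return np.exp(-d2 / (2.0 * bandwidth ** 2))

    def mean_embedding(neighbour_obs, centres):
        """Average features over neighbours; invariant to their order and count."""
        feats = np.stack([rbf_features(x, centres) for x in neighbour_obs])
        return feats.mean(axis=0)

    rng = np.random.default_rng(1)
    centres = rng.uniform(-1, 1, size=(16, 2))       # fixed RBF centres (assumed)
    neighbours = rng.uniform(-1, 1, size=(40, 2))    # 40 neighbours' relative positions
    phi = mean_embedding(neighbours, centres)        # 16-dim input to the policy net
    print(phi.shape)   # (16,), regardless of how many neighbours there are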

    Coalitions, tipping points and the speed of evolution

    This study considers pure coordination games on networks and the waiting time for an adaptive process of strategic change to achieve efficient coordination. Although it is in the interest of every player to coordinate on a single globally efficient norm, coalitional behavior at a local level can greatly slow, as well as hasten, convergence to efficiency. For some networks, when one action becomes efficient enough relative to the other, the effect of coalitional behavior changes abruptly from conservative to reforming. These effects are confirmed for a variety of stylized and empirical social networks from the literature. For coordination games in which the Pareto-efficient and risk-dominant equilibria differ, polymorphic states can be the only stochastically stable states.
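    As a toy illustration of such an adaptive process (deliberately omitting the coalitional moves that are the paper's focus), the sketch below runs noisy best-response dynamics for a two-action coordination game on a small ring network and reports the waiting time until every player has adopted the efficient action; all payoffs and parameters are assumptions.

    # Noisy best-response dynamics on a network coordination game: count the
    # number of revision steps until all players use the efficient action "B".
    import random

    def payoff(action, neighbour_actions, a_payoff=1.0, b_payoff=1.5):
        """Pure coordination: earn the action's value for each matching neighbour."""
        value = a_payoff if action == "A" else b_payoff
        return value * sum(1 for n in neighbour_actions if n == action)

    def waiting_time(adj, epsilon=0.05, max_steps=100_000, seed=0):
        rng = random.Random(seed)
        state = {v: "A" for v in adj}              # start in the inefficient norm
        for t in range(1, max_steps + 1):
            v = rng.choice(list(adj))              # one player revises per step
            neigh = [state[u] for u in adj[v]]
            best = max(("A", "B"), key=lambda a: payoff(a, neigh))
            state[v] = rng.choice(["A", "B"]) if rng.random() < epsilon else best
            if all(a == "B" for a in state.values()):
                return t
        return max_steps

    ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}   # 8-player ring network
    print(waiting_time(ring))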