Connectivity in the presence of an opponent
The paper introduces two-player connectivity games played on finite bipartite
graphs. Algorithms that solve these connectivity games can be used as
subroutines for solving Müller games. Müller games constitute a well-established
class of games in model checking and verification. In connectivity
games, the objective of one of the players is to visit every node of the game
graph infinitely often. The first contribution of this paper is our proof that
solving connectivity games can be reduced to the incremental strongly connected
component maintenance (ISCCM) problem, an important problem in graph algorithms
and data structures. The second contribution is that we non-trivially adapt two
known algorithms for the ISCCM problem to provide two efficient algorithms that
solve the connectivity games problem. Finally, based on the techniques
developed, we recast Horn's polynomial-time algorithm that solves explicitly
given Müller games and provide an alternative proof of its correctness. Our
algorithms are more efficient than Horn's. Our solution for
connectivity games is used as a subroutine in the algorithm
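The reduction above hinges on maintaining strongly connected components as edges arrive. As a rough illustration of the ISCCM interface only (not the paper's adapted incremental algorithms), the naive sketch below recomputes connectivity from scratch after every insertion; all names are hypothetical:

```python
from collections import defaultdict

def _reaches_all(adj, src, nodes):
    # DFS from src; True iff every node in `nodes` is reached
    seen, stack = {src}, [src]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == set(nodes)

class NaiveISCCM:
    """Naive stand-in for incremental SCC maintenance: after each edge
    insertion, test from scratch whether the whole graph has collapsed
    into a single SCC. The paper adapts genuinely incremental
    algorithms; this merely illustrates the interface."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.adj = defaultdict(list)   # forward edges
        self.radj = defaultdict(list)  # reversed edges

    def insert_edge(self, u, v):
        self.adj[u].append(v)
        self.radj[v].append(u)
        s = self.nodes[0]
        # one SCC iff s reaches every node and every node reaches s
        return (_reaches_all(self.adj, s, self.nodes)
                and _reaches_all(self.radj, s, self.nodes))
```

Inserting the edges of a cycle one by one, only the final insertion reports a single SCC.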
Alternative Automata-based Approaches to Probabilistic Model Checking
In this thesis we focus on new methods for probabilistic model checking (PMC) with linear temporal logic (LTL). The standard approach translates an LTL formula into a deterministic ω-automaton with a double-exponential blow-up.
There are approaches for Markov chain analysis against LTL with exponential runtime, which motivates the search for non-deterministic automata with restricted forms of non-determinism that make them suitable for PMC. For MDPs, the approach via deterministic automata matches the double-exponential lower bound, but a practical application might benefit from approaches via non-deterministic automata.
We first investigate good-for-games (GFG) automata. In GFG automata one can resolve the non-determinism for a finite prefix without knowing the infinite suffix and still obtain an accepting run for an accepted word. We explain that GFG automata are well-suited for MDP analysis at the theoretical level, but our experiments show that GFG automata cannot compete with deterministic automata.
We have also researched another form of pseudo-determinism, namely unambiguity, where every accepted word has exactly one accepting run. We present a polynomial-time approach for PMC of Markov chains against specifications given by an unambiguous Büchi automaton (UBA). Its two key elements are deciding whether the induced probability is positive and, if so, identifying a state set inducing probability 1.
Additionally, we examine the new symbolic Muller acceptance described in the Hanoi Omega-Automata Format, which we call Emerson-Lei acceptance. It is a positive Boolean formula over unconditional fairness constraints. We present a construction of small deterministic automata using Emerson-Lei acceptance. Deciding whether an MDP has a positive maximal probability of satisfying an Emerson-Lei acceptance condition is NP-complete. This fact has motivated a DPLL-based algorithm for deciding positivity.
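The Emerson-Lei condition described above can be made concrete with a small evaluator. The nested-tuple encoding below (Inf/Fin atoms combined by positive Boolean connectives) is an illustrative representation of my own, not the HOA format's concrete syntax:

```python
# Evaluate a positive Boolean Emerson-Lei acceptance condition.
# A formula is a nested tuple: ("inf", i), ("fin", i),
# ("and", f, g), or ("or", f, g); `inf_sets` is the set of
# acceptance-set indices that a run visits infinitely often.
def satisfies(formula, inf_sets):
    op = formula[0]
    if op == "inf":
        return formula[1] in inf_sets       # set i seen infinitely often
    if op == "fin":
        return formula[1] not in inf_sets   # set i seen finitely often
    if op == "and":
        return satisfies(formula[1], inf_sets) and satisfies(formula[2], inf_sets)
    if op == "or":
        return satisfies(formula[1], inf_sets) or satisfies(formula[2], inf_sets)
    raise ValueError(f"unknown operator: {op}")

# Example: a Streett-like pair, Fin(0) | Inf(1)
phi = ("or", ("fin", 0), ("inf", 1))
```

Because the formula is positive (no negation on subformulas), monotonicity arguments over `inf_sets` are available to analysis algorithms.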
TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning
Combining deep model-free reinforcement learning with on-line planning is a
promising approach to building on the successes of deep RL. On-line planning
with look-ahead trees has proven successful in environments where transition
models are known a priori. However, in complex environments where transition
models need to be learned from data, the deficiencies of learned models have
limited their utility for planning. To address these challenges, we propose
TreeQN, a differentiable, recursive, tree-structured model that serves as a
drop-in replacement for any value function network in deep RL with discrete
actions. TreeQN dynamically constructs a tree by recursively applying a
transition model in a learned abstract state space and then aggregating
predicted rewards and state-values using a tree backup to estimate Q-values. We
also propose ATreeC, an actor-critic variant that augments TreeQN with a
softmax layer to form a stochastic policy network. Both approaches are trained
end-to-end, such that the learned model is optimised for its actual use in the
tree. We show that TreeQN and ATreeC outperform n-step DQN and A2C on a
box-pushing task, as well as n-step DQN and value prediction networks (Oh et
al. 2017) on multiple Atari games. Furthermore, we present ablation studies
that demonstrate the effect of different auxiliary losses on learning
transition models.
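The tree construction and backup can be sketched abstractly. The toy sketch below uses random weights as stand-ins for the learned transition, reward, and value networks, and mirrors a backup that mixes leaf value estimates with backed-up Q-values via a parameter (here `LAM`); all names and weights are hypothetical, not TreeQN's published architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
D, A, GAMMA, LAM = 4, 3, 0.99, 0.8  # abstract-state dim, actions, discount, mix

# Hypothetical stand-ins for trained networks:
W_trans = rng.standard_normal((A, D, D)) * 0.1   # per-action transition
w_rew   = rng.standard_normal((A, D)) * 0.1      # per-action reward head
w_val   = rng.standard_normal(D) * 0.1           # state-value head

def transition(z, a):
    return np.tanh(W_trans[a] @ z)   # next abstract state

def reward(z, a):
    return float(w_rew[a] @ z)

def value(z):
    return float(w_val @ z)

def tree_q(z, depth):
    """Depth-limited tree backup:
    Q(z,a) = r(z,a) + gamma * ((1-LAM)*V(z') + LAM*max_a' Q(z',a')),
    falling back to V(z') at the leaves."""
    q = np.empty(A)
    for a in range(A):
        z2 = transition(z, a)
        if depth == 0:
            backup = value(z2)
        else:
            backup = (1 - LAM) * value(z2) + LAM * tree_q(z2, depth - 1).max()
        q[a] = reward(z, a) + GAMMA * backup
    return q

z0 = rng.standard_normal(D)
q_vals = tree_q(z0, depth=2)   # one Q-estimate per action from a depth-2 tree
```

Because every operation above is differentiable, gradients from a Q-learning loss flow through the backup into the transition, reward, and value parameters, which is what lets the model be optimised for its actual use in the tree.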
Deep Reinforcement Learning for Swarm Systems
Recently, deep reinforcement learning (RL) methods have been applied
successfully to multi-agent scenarios. Typically, these methods rely on a
concatenation of agent states to represent the information content required for
decentralized decision making. However, concatenation scales poorly to swarm
systems with a large number of homogeneous agents as it does not exploit the
fundamental properties inherent to these systems: (i) the agents in the swarm
are interchangeable and (ii) the exact number of agents in the swarm is
irrelevant. Therefore, we propose a new state representation for deep
multi-agent RL based on mean embeddings of distributions. We treat the agents
as samples of a distribution and use the empirical mean embedding as input for
a decentralized policy. We define different feature spaces of the mean
embedding using histograms, radial basis functions and a neural network learned
end-to-end. We evaluate the representation on two well-known problems from the
swarm literature (rendezvous and pursuit evasion), in a globally and locally
observable setup. For the local setup we furthermore introduce simple
communication protocols. Of all approaches, the mean embedding representation
using neural network features enables the richest information exchange between
neighboring agents, facilitating the development of more complex collective
strategies.
Comment: 31 pages, 12 figures, version 3 (published in JMLR Volume 20)
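The mean-embedding idea can be illustrated with the histogram/RBF variant: treat each neighbor's state as a sample, map it through a fixed feature function, and average. The sketch below (all names hypothetical, RBF centers chosen arbitrarily) shows why the resulting input is invariant to agent ordering and independent of the exact neighbor count:

```python
import numpy as np

def rbf_features(x, centers, bandwidth=0.5):
    # phi(x): radial-basis responses of one agent state x (here 2-D)
    d2 = ((centers - x) ** 2).sum(axis=1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def mean_embedding(neighbor_states, centers):
    """Permutation-invariant summary of a variable-size neighbor set:
    the empirical mean of per-agent feature vectors."""
    feats = np.stack([rbf_features(s, centers) for s in neighbor_states])
    return feats.mean(axis=0)

# Hypothetical 3x3 grid of RBF centers on the unit square
g = np.linspace(0.0, 1.0, 3)
centers = np.array([[x, y] for x in g for y in g])

# Same two neighbors in different orders yield the same embedding
a = mean_embedding([np.array([0.2, 0.3]), np.array([0.8, 0.1])], centers)
b = mean_embedding([np.array([0.8, 0.1]), np.array([0.2, 0.3])], centers)
```

Replacing `rbf_features` with a small network trained end-to-end gives the neural variant; the averaging step, which supplies the interchangeability and count-invariance properties, is unchanged.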
Coalitions, tipping points and the speed of evolution
This study considers pure coordination games on networks and the waiting time for an adaptive process of strategic change to achieve efficient coordination. Although it is in the interest of every player to coordinate on a single globally efficient norm, coalitional behavior at a local level can greatly slow, as well as hasten, convergence to efficiency. For some networks, when one action becomes efficient enough relative to the other, the effect of coalitional behavior changes abruptly from conservative to reforming. These effects are confirmed for a variety of stylized and empirical social networks found in the literature. For coordination games in which the Pareto-efficient and risk-dominant equilibria differ, polymorphic states can be the only stochastically stable states.
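As a baseline for the adaptive process described above, with individual revisions only and none of the coalitional behavior the study analyzes, a deterministic best-response sweep on a small ring network might look like the following sketch (all names and payoffs hypothetical):

```python
def best_response_dynamics(neighbors, payoff, init, max_sweeps=100):
    """Round-robin best-response revisions in a 2-action network
    coordination game. `payoff[a][b]` is the payoff for playing a
    against a neighbor playing b. Returns (final profile, number of
    sweeps until no player wants to change)."""
    state = list(init)
    n = len(state)
    for sweep in range(1, max_sweeps + 1):
        changed = False
        for i in range(n):
            # total payoff of each action against i's current neighbors
            u = [sum(payoff[a][state[j]] for j in neighbors[i]) for a in (0, 1)]
            best = 0 if u[0] > u[1] else 1   # ties resolved toward action 1
            if best != state[i]:
                state[i] = best
                changed = True
        if not changed:
            return state, sweep
    return state, max_sweeps

# Ring of 6 players; action 1 is payoff-dominant in this example
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
payoff = [[1, 0], [0, 2]]                 # coordinating on 1 pays more
final, t = best_response_dynamics(ring, payoff, init=[0, 1, 0, 1, 0, 1])
```

From the alternating start, every player's best response is the efficient action, so the profile reaches the all-1 norm in one sweep; the study's point is that allowing local coalitions to revise jointly can sharply change such waiting times in either direction.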