28 research outputs found
Finite-state Strategies in Delay Games (full version)
What is a finite-state strategy in a delay game? We answer this surprisingly
non-trivial question by presenting a very general framework that allows us to
remove delay: finite-state strategies exist for all winning conditions where
the resulting delay-free game admits a finite-state strategy. The framework is
applicable to games whose winning condition is recognized by an automaton with
an acceptance condition that satisfies a certain aggregation property. Our
framework also yields upper bounds on the complexity of determining the winner
of such delay games and upper bounds on the necessary lookahead to win the
game. In particular, we cover all previous results of that kind as special
cases of our uniform approach.
Finite-state Strategies in Delay Games
What is a finite-state strategy in a delay game? We answer this surprisingly
non-trivial question and present a very general framework for computing such
strategies: they exist for all winning conditions that are recognized by
automata with acceptance conditions that satisfy a certain aggregation
property. Our framework also yields upper bounds on the complexity of
determining the winner of such delay games and upper bounds on the necessary
lookahead to win the game. In particular, we cover all previous results of that
kind as special cases of our uniform approach.
Comment: In Proceedings GandALF 2017, arXiv:1709.01761. Full version at
arXiv:1704.0888
Finite-state Strategies in Delay Games
What is a finite-state strategy in a delay game? We answer this surprisingly non-trivial question by presenting a very general framework that allows us to remove delay: finite-state strategies exist for all winning conditions where the resulting delay-free game admits a finite-state strategy. The framework is applicable to games whose winning condition is recognized by an automaton with an acceptance condition that satisfies a certain aggregation property. Our framework also yields upper bounds on the complexity of determining the winner of such delay games and upper bounds on the necessary lookahead to win the game. In particular, we cover all previous results of that kind as special cases of our uniform approach.
Temporal Difference Learning in Complex Domains
Submitted to the University of London for the Degree of Doctor of Philosophy in Computer Science.
An Exponential Lower Bound for the Latest Deterministic Strategy Iteration Algorithms
This paper presents a new exponential lower bound for the two most popular
deterministic variants of the strategy improvement algorithms for solving
parity, mean payoff, discounted payoff and simple stochastic games. The first
variant improves every node in each step maximizing the current valuation
locally, whereas the second variant computes the globally optimal improvement
in each step. We outline families of games on which both variants require
exponentially many strategy iterations.
Opponent Modelling in Multi-Agent Systems
Reinforcement Learning (RL) formalises a problem where an intelligent agent needs to learn and achieve certain goals by maximising a long-term return in an environment. Multi-agent reinforcement learning (MARL) extends traditional RL to multiple agents. Many RL algorithms lose their convergence guarantees in non-stationary environments due to adaptive opponents. Partial observation caused by agents’ different private observations introduces high variance during training, which exacerbates data inefficiency. In MARL, training an agent to perform well against one set of opponents often leads to bad performance against another set of opponents. Non-stationarity, partial observation and an unclear learning objective are three critical problems in MARL that hinder agents’ learning, and they all share a cause: the lack of knowledge of the other agents. Therefore, in this thesis, we propose to solve these problems with opponent modelling methods. We tailor our solutions by combining opponent modelling with other techniques according to the characteristics of the problems we face. Specifically, we first propose ROMMEO, an algorithm inspired by Bayesian inference, as a solution to alleviate non-stationarity in cooperative games. Then we study the partial observation problem caused by agents’ private observations and design an implicit communication training method named PBL. Lastly, we investigate solutions to the non-stationarity and unclear learning objective problems in zero-sum games. We propose a solution named EPSOM, which aims to find safe exploitation strategies to play against non-stationary opponents. We verify our proposed methods with varied experiments and show that they can achieve the desired performance. Limitations and future work are discussed in the last chapter of this thesis.