    Minimizing Regret in Discounted-Sum Games

    In this paper, we study the problem of minimizing regret in discounted-sum games played on weighted game graphs. We give algorithms for the general problem of computing the minimal regret of the controller (Eve), as well as for several variants depending on which strategies the environment (Adam) is permitted to use. We also consider the problem of synthesizing regret-free strategies for Eve in each of these scenarios.
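    For orientation, a sketch of the standard definitions at play (the paper's exact formalization may differ in details): a play π = e_0 e_1 … through the weighted graph is scored by its discounted sum, and the regret of a strategy σ for Eve measures how much better she could have done against Adam's actual behavior:

        \mathrm{Val}_\lambda(\pi) = \sum_{i \ge 0} \lambda^i \, w(e_i), \qquad \lambda \in (0, 1)

        \mathrm{Reg}(\sigma) = \sup_{\tau} \Big( \sup_{\sigma'} \mathrm{Val}_\lambda(\mathrm{out}(\sigma', \tau)) - \mathrm{Val}_\lambda(\mathrm{out}(\sigma, \tau)) \Big), \qquad \mathrm{Reg} = \inf_{\sigma} \mathrm{Reg}(\sigma)

    Here out(σ, τ) denotes the play arising when Eve follows σ and Adam follows τ; restricting the set of strategies τ ranges over yields the variants mentioned above.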

    The Impatient May Use Limited Optimism to Minimize Regret

    Discounted-sum games provide a formal model for the study of reinforcement learning, where the agent is enticed to get rewards early since later rewards are discounted. When the agent interacts with the environment, she may regret her actions, realizing that a previous choice was suboptimal given the behavior of the environment. The main contribution of this paper is a PSPACE algorithm for computing the minimum possible regret of a given game. To this end, several results of independent interest are shown. (1) We identify a class of regret-minimizing and admissible strategies that first assume that the environment is collaborating, then assume it is adversarial; the precise timing of the switch is key here. (2) Disregarding the computational cost of numerical analysis, we provide an NP algorithm that checks whether the regret entailed by a given time-switching strategy exceeds a given value. (3) We show that determining whether a strategy minimizes regret is decidable in PSPACE.
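    To make the two assumptions concrete, below is a minimal, self-contained value-iteration sketch (not the paper's PSPACE algorithm; the graph, weights, and discount factor are invented for illustration). It computes the cooperative value of each state, where the environment helps the agent, and the antagonistic value, where it is adversarial; the time-switching strategies above interpolate between exactly these two assumptions.

        # A hypothetical example, not the paper's algorithm: value iteration
        # for discounted-sum values on a small weighted game graph.
        LAMBDA = 0.5  # discount factor, assumed fixed

        # Hypothetical game graph: state -> list of (successor, edge weight).
        edges = {
            "a": [("b", 2.0), ("c", 1.0)],  # Eve chooses here
            "b": [("a", 4.0), ("a", 0.0)],  # Adam chooses here
            "c": [("a", 2.0)],
        }
        owner = {"a": "Eve", "b": "Adam", "c": "Adam"}  # who moves where

        def value_iteration(adam_adversarial: bool, iters: int = 200) -> dict:
            """Fixed-point iteration for discounted-sum values.

            If adam_adversarial, Adam minimizes (antagonistic value);
            otherwise both players maximize (cooperative value).
            """
            val = {s: 0.0 for s in edges}
            for _ in range(iters):
                # Bellman update: val(s) = opt over edges of w + LAMBDA * val(t)
                val = {
                    s: (max if owner[s] == "Eve" or not adam_adversarial else min)(
                        w + LAMBDA * val[t] for t, w in succs
                    )
                    for s, succs in edges.items()
                }
            return val

        print("cooperative: ", value_iteration(adam_adversarial=False))
        print("antagonistic:", value_iteration(adam_adversarial=True))

    The two values bound what Eve can hope for and what she can guarantee, respectively; the gap between them, here created by Adam's choice at state "b", is what optimism is about.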

    Synthesis from Weighted Specifications with Partial Domains over Finite Words


    History Determinism vs. Good for Gameness in Quantitative Automata

    Automata models between determinism and nondeterminism or alternation can retain some of the algorithmic properties of deterministic automata while enjoying some of the expressiveness and succinctness of nondeterminism. We study three closely related such models on quantitative automata: history determinism, good-for-gameness, and determinisability by pruning. While history determinism and good-for-gameness coincide in the Boolean setting, we show that this is no longer the case in the quantitative setting: good-for-gameness is broader than history determinism, and coincides with a relaxed version of it, defined with respect to thresholds. We further identify criteria under which history determinism, which is in general broader than determinisability by pruning, coincides with it, and apply these criteria to typical quantitative automaton types. Since synthesis is a key application of good-for-games and history-deterministic automata, we clarify the relationship between the two notions and various quantitative synthesis problems. We show that good-for-games automata are central for "global" (classical) synthesis, while "local" (good-enough) synthesis reduces to deciding whether a nondeterministic automaton is history deterministic.
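    For reference, a sketch of the definitions involved under the usual conventions (the paper's formal setup may differ in details): a nondeterministic quantitative automaton A assigns to each word the best value over its runs, history determinism asks for a resolver that builds an optimal run letter by letter, and the threshold relaxation only requires the resolver to witness each threshold separately:

        A(w) = \sup_{\rho \in \mathrm{Runs}(A,\, w)} \mathrm{Val}(\rho)

        \text{history determinism:} \quad \exists r.\ \forall w.\ \mathrm{Val}(\mathrm{run}_r(w)) = A(w)

        \text{threshold variant:} \quad \forall t.\ \exists r_t.\ \forall w.\ A(w) \ge t \implies \mathrm{Val}(\mathrm{run}_{r_t}(w)) \ge t

    Here a resolver r maps each prefix of the input read so far to a transition, and run_r(w) is the run it induces on w.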