6 research outputs found
Minimizing Regret in Discounted-Sum Games
In this paper, we study the problem of minimizing regret in discounted-sum games played on weighted game graphs. We give algorithms for the general problem of computing the minimal regret of the controller (Eve) as well as several variants depending on which strategies the environment (Adam) is permitted to use. We also consider the problem of synthesizing regret-free strategies for Eve in each of these scenarios.
The Impatient May Use Limited Optimism to Minimize Regret
Discounted-sum games provide a formal model for the study of reinforcement
learning, where the agent is enticed to get rewards early since later rewards
are discounted. When the agent interacts with the environment, she may regret
her actions, realizing that a previous choice was suboptimal given the behavior
of the environment. The main contribution of this paper is a PSPACE algorithm
for computing the minimum possible regret of a given game. To this end, several
results of independent interest are shown. (1) We identify a class of
regret-minimizing and admissible strategies that first assume that the
environment is collaborating, then assume it is adversarial; the precise
timing of the switch is key here. (2) Disregarding the computational cost of
numerical analysis, we provide an NP algorithm that checks that the regret
entailed by a given time-switching strategy exceeds a given value. (3) We show
that determining whether a strategy minimizes regret is decidable in PSPACE.
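To make the notion of regret in the abstract above concrete, here is a minimal sketch in Python. It is not the paper's PSPACE algorithm: it brute-forces a toy setting in which each pair of strategies yields a short, finite weight sequence, whereas the games in the paper have infinite plays. The strategy names (`safe`, `risky`, `help`, `hurt`), the weights, and the discount factor are all hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def discounted_sum(weights, discount):
    """Sum of discount**i * w_i over a finite prefix of a play."""
    return sum(w * discount**i for i, w in enumerate(weights))


def regret(strategy, outcomes, discount):
    """Regret of one Eve strategy.

    For each Adam strategy, compare what Eve obtained with the best she
    could have obtained against that same Adam strategy, then take the
    worst such gap.
    """
    eve_strategies = {eve for (eve, _) in outcomes}
    adam_strategies = {adam for (_, adam) in outcomes}
    gaps = []
    for adam in adam_strategies:
        best = max(
            discounted_sum(outcomes[(eve, adam)], discount)
            for eve in eve_strategies
        )
        got = discounted_sum(outcomes[(strategy, adam)], discount)
        gaps.append(best - got)
    return max(gaps)


def minimal_regret(outcomes, discount):
    """Return (regret, strategy) for a regret-minimizing Eve strategy."""
    eve_strategies = sorted({eve for (eve, _) in outcomes})
    return min((regret(eve, outcomes, discount), eve) for eve in eve_strategies)


# Hypothetical toy data: weight sequences for each (Eve, Adam) strategy pair.
DISCOUNT = Fraction(1, 2)
OUTCOMES = {
    ("safe", "help"): [1, 1, 1],
    ("safe", "hurt"): [1, 1, 1],
    ("risky", "help"): [0, 4, 4],
    ("risky", "hurt"): [0, 0, 0],
}

if __name__ == "__main__":
    print(minimal_regret(OUTCOMES, DISCOUNT))
```

In this toy instance, `safe` has regret 5/4 (it forgoes the payoff of 3 available when Adam helps) and `risky` has regret 7/4 (it forgoes 7/4 when Adam hurts), so `safe` minimizes regret. Exact `Fraction` arithmetic is used so that comparisons between discounted sums are not affected by floating-point rounding.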
Synthesis from Weighted Specifications with Partial Domains over Finite Words
History Determinism vs. Good for Gameness in Quantitative Automata
Automata models between determinism and nondeterminism/alternations can retain some of the algorithmic properties of deterministic automata while enjoying some of the expressiveness and succinctness of nondeterminism. We study three closely related such models - history determinism, good for gameness and determinisability by pruning - on quantitative automata.
While in the Boolean setting, history determinism and good for gameness coincide, we show that this is no longer the case in the quantitative setting: good for gameness is broader than history determinism, and coincides with a relaxed version of it, defined with respect to thresholds. We further identify criteria in which history determinism, which is generally broader than determinisability by pruning, coincides with it, which we then apply to typical quantitative automata types.
As a key application of good for games and history deterministic automata is synthesis, we clarify the relationship between the two notions and various quantitative synthesis problems. We show that good-for-games automata are central for "global" (classical) synthesis, while "local" (good-enough) synthesis reduces to deciding whether a nondeterministic automaton is history deterministic.
Human-aware Strategy Synthesis for Robotic Manipulators Using Regret Games
From autonomous cars and factories to households, we envision a future where robots work safely and efficiently alongside humans. While robots today thrive in industrial and lab settings, we must develop theories and algorithms that will enable the transition of these systems from robot-centric workspaces to the real world. Hence, we want robotic systems not only to observe and react to changes in the environment but also to guarantee completion of the given task while ensuring safe execution. For robots operating in the presence of humans with non-contradicting goals, it is crucial to work together to save resources and time and to ensure that everyone achieves their goals. Combining these objectives is challenging given the complexity of the tasks and the various ways the system could interact with the environment. This thesis blends different theories developed by the Formal Methods, Game Theory, and Human-Robot Interaction communities to develop a Regret Minimizing Framework (RMF) that addresses the above problems. While previous work using these approaches has assumed the human to be either purely adversarial or probabilistic, these assumptions are very conservative and eliminate the possibility of cooperation. This thesis proposes a notion of regret to synthesize high-level strategies for the robot that explore possible cooperation with the human while ensuring completion of the given task within some user-defined bound on the total energy spent by the robot. This work analyzes and implements various notions of regret and reasons about their emergent behaviors. An end-to-end synthesis toolbox is developed to compute regret-minimizing strategies, which implements various optimization techniques to help scale the algorithms to the robotics domain and illustrate the efficacy of the optimal strategies through various case studies.
Minimizing Regret in Discounted-Sum Games