Algorithms for Game Metrics
Simulation and bisimulation metrics for stochastic systems provide a
quantitative generalization of the classical simulation and bisimulation
relations. These metrics capture the similarity of states with respect to
quantitative specifications written in the quantitative μ-calculus and
related probabilistic logics. We first show that the metrics provide a bound
for the difference in long-run average and discounted average behavior across
states, indicating that the metrics can be used both in system verification
and in performance evaluation. For turn-based games and MDPs, we provide a
polynomial-time algorithm for the computation of the one-step metric distance
between states. The algorithm is based on linear programming; it improves on
the previously known exponential-time algorithm based on a reduction to the
theory of reals. We then present PSPACE algorithms for both the decision
problem and the problem of approximating the metric distance between two
states, matching the best known algorithms for Markov chains. For the
bisimulation kernel of the metric, our algorithm works in time O(n^4) for both
turn-based games and MDPs, improving on the previously best known
O(n^9 · log n)-time algorithm for MDPs. For a concurrent game G, we show that
computing the exact distance between states is at least as hard as computing
the value of concurrent reachability games and the square-root-sum problem in
computational geometry. We show that checking whether the metric distance is
bounded by a rational r can be done via a reduction to the theory of real
closed fields, involving a formula with three quantifier alternations, yielding
|G|^O(|G|^5) time complexity and improving on the previously known reduction,
which yielded |G|^O(|G|^7) time complexity. These algorithms can be iterated
to approximate the metrics using binary search.
Comment: 27 pages. Full version of the paper accepted at FSTTCS 200
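For MDPs, the one-step distance described above is an optimal-transport (Kantorovich-style) lifting of a state distance to successor distributions, which is exactly the kind of problem a linear program solves. A minimal sketch, assuming SciPy is available; the function name and the toy two-state instance are illustrative, not taken from the paper:

```python
import numpy as np
from scipy.optimize import linprog

def kantorovich(mu, nu, d):
    # Kantorovich (optimal-transport) distance between distributions
    # mu and nu over n states, with ground-distance matrix d: minimize
    # sum_{i,j} d[i, j] * x[i, j] over all couplings x of mu and nu.
    n = len(mu)
    c = d.reshape(-1)                        # row-major flattening of x
    A_rows = np.kron(np.eye(n), np.ones(n))  # sum_j x[i, j] = mu[i]
    A_cols = np.kron(np.ones(n), np.eye(n))  # sum_i x[i, j] = nu[j]
    res = linprog(c,
                  A_eq=np.vstack([A_rows, A_cols]),
                  b_eq=np.concatenate([mu, nu]),
                  bounds=(0, None))
    return res.fun

# Two states at ground distance 1: moving 0.5 of the mass costs 0.5.
mu = np.array([0.5, 0.5])
nu = np.array([0.0, 1.0])
d = np.array([[0.0, 1.0], [1.0, 0.0]])
print(round(kantorovich(mu, nu, d), 6))  # 0.5
```

Each of the n^2 coupling variables is constrained by the marginals, so the LP has polynomial size in the number of states, consistent with the polynomial-time claim above.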
On overfitting and asymptotic bias in batch reinforcement learning with partial observability
This paper provides an analysis of the tradeoff between asymptotic bias
(suboptimality with unlimited data) and overfitting (additional suboptimality
due to limited data) in the context of reinforcement learning with partial
observability. Our theoretical analysis formally shows that a smaller state
representation decreases the risk of overfitting while potentially increasing
the asymptotic bias. This analysis relies on expressing the
quality of a state representation by bounding L1 error terms of the associated
belief states. Theoretical results are empirically illustrated when the state
representation is a truncated history of observations, both on synthetic POMDPs
and on a large-scale POMDP in the context of smartgrids, with real-world data.
Finally, similarly to known results in the fully observable setting, we also
briefly discuss and empirically illustrate how using function approximators and
adapting the discount factor may enhance the tradeoff between asymptotic bias
and overfitting in the partially observable context.
Comment: Accepted at the Journal of Artificial Intelligence Research (JAIR) - 31 pages
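The truncated-history representation used in the experiments can be illustrated in a few lines: the agent's state is the last h observations, and the number of distinct states (and hence the capacity to overfit) grows with h. The helper name and toy trajectory below are illustrative, not from the paper:

```python
def truncated_history(obs_seq, h):
    # State = the last h observations; early in an episode the prefix
    # is simply shorter. Smaller h gives a coarser representation:
    # lower overfitting risk, potentially higher asymptotic bias.
    return tuple(obs_seq[max(0, len(obs_seq) - h):])

# A toy trajectory over a binary observation alphabet: count how many
# distinct truncated-history states appear as h grows.
traj = [0, 1, 1, 0, 1, 0, 0, 1]
for h in (1, 2, 3):
    states = {truncated_history(traj[: t + 1], h) for t in range(len(traj))}
    print(h, len(states))
```

With an observation alphabet O, up to |O|^h states are reachable, so the sample size needed to estimate values at every state grows quickly with the history length.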
Game Refinement Relations and Metrics
We consider two-player games played over finite state spaces for an infinite
number of rounds. At each state, the players simultaneously choose moves; the
moves determine a successor state. It is often advantageous for players to
choose probability distributions over moves, rather than single moves. Given a
goal, for example, reach a target state, the question of winning is thus a
probabilistic one: what is the maximal probability of winning from a given
state?
On these game structures, two fundamental notions are those of equivalences
and metrics. Given a set of winning conditions, two states are equivalent if
the players can win the same games with the same probability from both states.
Metrics provide a bound on the difference in the probabilities of winning
across states, capturing a quantitative notion of state similarity.
We introduce equivalences and metrics for two-player game structures, and we
show that they characterize the difference in probability of winning games
whose goals are expressed in the quantitative mu-calculus. The quantitative
mu-calculus can express a large set of goals, including reachability, safety,
and omega-regular properties. Thus, we claim that our relations and metrics
provide the canonical extensions to games, of the classical notion of
bisimulation for transition systems. We develop our results both for
equivalences and metrics, which generalize bisimulation, and for asymmetrical
versions, which generalize simulation.
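The advantage of randomizing over moves already shows up in one-shot matrix games, whose value is computable by linear programming over mixed strategies. A minimal sketch assuming SciPy; the function and the concrete game are illustrative rather than taken from the paper:

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    # Value of a zero-sum matrix game for the maximizing row player:
    # maximize v subject to x >= 0, sum(x) = 1, and, for every
    # column j, sum_i x[i] * A[i, j] >= v.
    m, n = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                               # minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])  # v - x @ A[:, j] <= 0
    b_ub = np.zeros(n)
    A_eq = np.append(np.ones(m), 0.0).reshape(1, -1)  # sum(x) = 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds)
    return -res.fun

# A game where mixing pays: each pure row guarantees at most 1, but
# mixing the rows equally guarantees 1.5 against any column choice.
A = np.array([[2.0, 0.0], [1.0, 3.0]])
print(round(matrix_game_value(A), 6))  # 1.5
```

This is the one-step building block; the games in the abstract repeat such simultaneous choices over infinitely many rounds, which is what makes the equivalences and metrics nontrivial.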
Linear and Branching System Metrics
We extend the classical system relations of trace inclusion, trace equivalence, simulation, and bisimulation to a quantitative setting in which propositions are interpreted not as boolean values, but as elements of arbitrary metric spaces.

Trace inclusion and equivalence give rise to asymmetrical and symmetrical linear distances, while simulation and bisimulation give rise to asymmetrical and symmetrical branching distances. We study the relationships among these distances, and we provide a full logical characterization of the distances in terms of quantitative versions of LTL and the μ-calculus. We show that, while trace inclusion (resp. equivalence) coincides with simulation (resp. bisimulation) for deterministic boolean transition systems, linear and branching distances do not coincide for deterministic metric transition systems. Finally, we provide algorithms for computing the distances over finite systems, together with a matching lower complexity bound.
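For the deterministic case mentioned above, a branching distance can be computed by a simple fixpoint iteration: the distance between two states is the maximum of their propositional distance and the distance between their successors. A sketch under that assumption; the concrete toy system and names are illustrative, and the paper's algorithms also cover the richer cases:

```python
def branching_distance(succ, pdist, states):
    # d(s, t) = max(pdist(s, t), d(succ[s], succ[t])), computed as the
    # least fixpoint of a monotone iteration started from 0. Values are
    # drawn from the finite set of propositional distances, so the
    # iteration terminates on any finite system.
    d = {(s, t): 0.0 for s in states for t in states}
    changed = True
    while changed:
        changed = False
        for s in states:
            for t in states:
                new = max(pdist(s, t), d[(succ[s], succ[t])])
                if new > d[(s, t)]:
                    d[(s, t)] = new
                    changed = True
    return d

# Toy deterministic system 0 -> 1 -> 2 -> 2 with one quantitative
# proposition; the propositional distance is the gap in its value.
succ = {0: 1, 1: 2, 2: 2}
val = {0: 0.0, 1: 0.3, 2: 0.1}
d = branching_distance(succ, lambda s, t: abs(val[s] - val[t]), [0, 1, 2])
print(d[(0, 1)])  # 0.3
```

On the linear side the analogous quantity compares whole traces, and, as the abstract notes, the two notions genuinely differ for deterministic metric transition systems.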