Computing Probabilistic Bisimilarity Distances via Policy Iteration
A transformation mapping a labelled Markov chain to a simple stochastic game is presented. In the resulting simple stochastic game, each vertex corresponds to a pair of states of the labelled Markov chain. The value of a vertex of the simple stochastic game is shown to be equal to the probabilistic bisimilarity distance, a notion due to Desharnais, Gupta, Jagadeesan and Panangaden, of the corresponding pair of states of the labelled Markov chain. Bacci, Bacci, Larsen and Mardare introduced an algorithm to compute the probabilistic bisimilarity distances for a labelled Markov chain. A modification of a basic version of their algorithm for a labelled Markov chain is shown to be the policy iteration algorithm applied to the corresponding simple stochastic game. Furthermore, it is shown that this algorithm takes exponential time in the worst case.
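To make the fixed-point view behind this result concrete, here is a minimal Python sketch of policy iteration for probabilistic bisimilarity distances, in the spirit of the basic algorithm mentioned above; it is our own illustration, not the authors' implementation. The names `label` (state labels) and `tau` (row-stochastic successor vectors) are assumptions of the sketch, as is the use of `scipy` for the transportation step and the simplification that the only distance-zero pairs are the diagonal pairs (s, s), which keeps the policy-evaluation linear system nonsingular.

```python
# Sketch: policy iteration for probabilistic bisimilarity distances.
# A "policy" assigns each pair of equally labelled states a coupling of
# their successor distributions; evaluation solves a linear system, and
# improvement re-solves a transportation problem per pair.
import itertools
import numpy as np
from scipy.optimize import linprog

def optimal_coupling(mu, nu, d):
    """Transportation LP: coupling of mu and nu minimising E[d]."""
    n = len(mu)
    c = d.flatten()                 # objective: sum omega[u,v] * d[u,v]
    A_eq, b_eq = [], []
    for u in range(n):              # row marginals must equal mu
        row = np.zeros((n, n)); row[u, :] = 1
        A_eq.append(row.flatten()); b_eq.append(mu[u])
    for v in range(n):              # column marginals must equal nu
        col = np.zeros((n, n)); col[:, v] = 1
        A_eq.append(col.flatten()); b_eq.append(nu[v])
    res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None))
    return res.x.reshape(n, n)

def bisimilarity_distances(label, tau):
    n = len(label)
    pairs = [(s, t) for s in range(n) for t in range(n)
             if s != t and label[s] == label[t]]
    # Initial policy: the independent coupling tau[s] x tau[t].
    policy = {(s, t): np.outer(tau[s], tau[t]) for (s, t) in pairs}
    while True:
        # Policy evaluation: d(s,t) = sum omega(u,v) d(u,v) is linear in
        # the matching-label pairs; mismatched pairs contribute 1,
        # diagonal pairs contribute 0 (assumed to be the only bisimilar pairs).
        idx = {p: i for i, p in enumerate(pairs)}
        A = np.zeros((len(pairs), len(pairs)))
        b = np.zeros(len(pairs))
        for (s, t), i in idx.items():
            for u, v in itertools.product(range(n), range(n)):
                w = policy[(s, t)][u, v]
                if u == v:
                    continue
                if label[u] != label[v]:
                    b[i] += w
                else:
                    A[i, idx[(u, v)]] += w
        x = np.linalg.solve(np.eye(len(pairs)) - A, b)
        d = np.zeros((n, n))
        for u in range(n):
            for v in range(n):
                if u != v:
                    d[u, v] = 1.0 if label[u] != label[v] else x[idx[(u, v)]]
        # Policy improvement: switch to a better coupling wherever the
        # transportation problem finds one; stop at a fixed point.
        improved = False
        for (s, t) in pairs:
            omega = optimal_coupling(tau[s], tau[t], d)
            if np.sum(omega * d) < np.sum(policy[(s, t)] * d) - 1e-9:
                policy[(s, t)] = omega
                improved = True
        if not improved:
            return d
```

Each improvement step strictly decreases some pair's expected distance under the coupling, which is what makes the procedure a policy iteration; the exponential worst case stated above concerns how many such switches may be needed.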
Learning to Control in Metric Space with Optimal Regret
We study online reinforcement learning for finite-horizon deterministic
control systems with {\it arbitrary} state and action spaces. Suppose that the
transition dynamics and reward function are unknown, but the state and action
spaces are endowed with a metric that characterizes the proximity between
different states and actions. We provide a surprisingly simple upper-confidence
reinforcement learning algorithm that uses a function approximation oracle to
estimate optimistic Q functions from experiences. We show that the regret of
the algorithm after $K$ episodes is bounded in terms of a smoothness parameter
and the doubling dimension of the state-action space with respect to the given
metric. We also establish a near-matching regret lower bound. The proposed
method can be adapted to work for more structured transition systems, including
the finite-state case and the case where value functions are linear
combinations of features, where the method also achieves the optimal regret.
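The optimistic estimate can be illustrated with a small sketch. The following Python class is our own simplification of the metric-based idea, not the paper's oracle-based algorithm: it maintains the tightest Lipschitz-consistent upper bound on Q values, defaulting to an optimistic ceiling where no data exists. The names `dist`, `smoothness`, and `v_max` are our assumptions.

```python
# Sketch: optimistic Q estimation in a metric state-action space.
# Each observed target caps the estimate inside a metric cone around it;
# the estimate is the tightest upper bound consistent with all data.
class LipschitzOptimisticQ:
    def __init__(self, dist, smoothness, v_max):
        self.dist = dist            # metric on state-action pairs
        self.L = smoothness         # assumed Lipschitz constant of Q*
        self.v_max = v_max          # optimistic default value
        self.data = []              # observed ((s, a), target) pairs

    def update(self, sa, target):
        # target = immediate reward + estimated value of the next state
        self.data.append((sa, target))

    def value(self, sa):
        # Smallest Lipschitz-consistent upper bound; v_max if no data.
        bounds = [t + self.L * self.dist(sa, sa2) for sa2, t in self.data]
        return min(bounds + [self.v_max])

    def act(self, s, actions):
        # Greedy action with respect to the optimistic estimate.
        return max(actions, key=lambda a: self.value((s, a)))
```

Acting greedily on this estimate steers exploration toward regions of the state-action space whose bound is still loose, which is the upper-confidence principle the abstract refers to.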
Distribution-based bisimulation for labelled Markov processes
In this paper we propose a (sub)distribution-based bisimulation for labelled
Markov processes and compare it with earlier definitions of state and event
bisimulation, which both only compare states. In contrast to those state-based
bisimulations, our distribution bisimulation is weaker, but corresponds more
closely to linear properties. We construct a logic and a metric to describe our
distribution bisimulation and discuss linearity, continuity and compositional
properties.
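As a concrete reading of the distribution view, the following Python sketch (ours, restricted to the finite-state case; the paper treats general labelled Markov processes) lifts the transition function from states to distributions: a distribution steps by mixing its states' successor distributions, and two distributions can only be related if they assign equal weight to each label.

```python
# Sketch: lifting a finite labelled Markov chain from states to
# distributions. tau is an n x n row-stochastic matrix, mu a length-n
# distribution over states, label[s] the label of state s.
import numpy as np

def lift(tau, mu):
    """Successor distribution of mu: sum_s mu(s) * tau(s, .)."""
    return mu @ tau

def label_weight(label, mu, a):
    """Probability that mu assigns to states carrying label a."""
    return sum(p for s, p in enumerate(mu) if label[s] == a)
```

A distribution-based bisimulation relates mu and nu only when they agree on every label weight and their lifted successors are again related; this is coarser than relating supports state by state, which is why it aligns more closely with linear properties.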
Computing Probabilistic Bisimilarity Distances for Probabilistic Automata
The probabilistic bisimilarity distance of Deng et al. has been proposed as a
robust quantitative generalization of Segala and Lynch's probabilistic
bisimilarity for probabilistic automata. In this paper, we present a
characterization of the bisimilarity distance as the solution of a simple
stochastic game. The characterization gives us an algorithm to compute the
distances by applying Condon's simple policy iteration on these games. The
correctness of Condon's approach, however, relies on the assumption that the
games are stopping. Our games may be non-stopping in general, yet we are able
to prove termination for this extended class of games. Already other algorithms
have been proposed in the literature to compute these distances, with
complexity in \textbf{UP} $\cap$ \textbf{coUP} and \textbf{PPAD}. Despite their
theoretical relevance, these algorithms are inefficient in practice. To the
best of our knowledge, our algorithm is the first practical solution.
The characterization of the probabilistic bisimilarity distance mentioned
above crucially uses a dual presentation of the Hausdorff distance due to
M\'emoli. As an additional contribution, in this paper we show that M\'emoli's
result can also be used to prove that the bisimilarity distance bounds the
difference in the maximal (or minimal) probability of two states satisfying
arbitrary $\omega$-regular properties, expressed, e.g., as LTL formulas.
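To fix ideas, here is a generic Python sketch of simple policy iteration on a simple stochastic game. It is our own illustration under the textbook assumption that the game is stopping, so that plain value iteration can evaluate a fixed strategy; the paper's contribution is precisely to prove termination without that assumption. The encoding via `kind`, `succ`, and `sink_value` is ours.

```python
# Sketch: simple policy iteration on a simple stochastic game (SSG).
# Vertices are 'MIN', 'MAX', or 'AVG'; succ[u] lists the successors of
# u; sink_value maps sink vertices to their fixed values in [0, 1].
import numpy as np

def evaluate(kind, succ, sink_value, sigma, n, iters=100000, tol=1e-12):
    """Values once MIN fixes strategy sigma: the remaining one-player
    game is solved by value iteration (converges for stopping games)."""
    v = np.array([sink_value.get(u, 0.0) for u in range(n)])
    for _ in range(iters):
        w = v.copy()
        for u in range(n):
            if u in sink_value:
                continue
            if kind[u] == 'MIN':
                w[u] = v[sigma[u]]
            elif kind[u] == 'MAX':
                w[u] = max(v[s] for s in succ[u])
            else:  # 'AVG' vertex: uniform over its successors
                w[u] = sum(v[s] for s in succ[u]) / len(succ[u])
        if np.max(np.abs(w - v)) < tol:
            break
        v = w
    return v

def simple_policy_iteration(kind, succ, sink_value, n):
    # Start from an arbitrary MIN strategy.
    sigma = {u: succ[u][0] for u in range(n)
             if u not in sink_value and kind[u] == 'MIN'}
    while True:
        v = evaluate(kind, succ, sink_value, sigma, n)
        # A MIN vertex is "switchable" if some successor beats its
        # current choice; simple policy iteration switches one of them.
        switchable = [u for u in sigma
                      if min(v[s] for s in succ[u]) < v[sigma[u]] - 1e-9]
        if not switchable:
            return v
        u = switchable[0]
        sigma[u] = min(succ[u], key=lambda s: v[s])
```

In the games arising from the characterization above, vertex values coincide with bisimilarity distances, so running such an iteration to a fixed point yields the distances; handling the non-stopping case is where the paper's termination proof is needed.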