Computing Probabilistic Bisimilarity Distances via Policy Iteration
A transformation mapping a labelled Markov chain to a simple stochastic game is presented. In the resulting simple stochastic game, each vertex corresponds to a pair of states of the labelled Markov chain. The value of a vertex of the simple stochastic game is shown to be equal to the probabilistic bisimilarity distance, a notion due to Desharnais, Gupta, Jagadeesan and Panangaden, of the corresponding pair of states of the labelled Markov chain. Bacci, Bacci, Larsen and Mardare introduced an algorithm to compute the probabilistic bisimilarity distances for a labelled Markov chain. A modification of a basic version of their algorithm for a labelled Markov chain is shown to be the policy iteration algorithm applied to the corresponding simple stochastic game. Furthermore, it is shown that this algorithm takes exponential time in the worst case.
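To make the fixed-point view behind this result concrete, here is a minimal Python sketch of policy iteration for probabilistic bisimilarity distances, in the spirit of the basic algorithm mentioned above; it is our own illustration, not the authors' implementation. The names `label` (state labels) and `tau` (row-stochastic successor vectors) are assumptions of the sketch, as is the use of `scipy` for the transportation step and the simplification that the only distance-zero pairs are the diagonal pairs (s, s), which keeps the policy-evaluation linear system nonsingular.

```python
# Sketch: policy iteration for probabilistic bisimilarity distances.
# A "policy" assigns each pair of equally labelled states a coupling of
# their successor distributions; evaluation solves a linear system, and
# improvement re-solves a transportation problem per pair.
import itertools
import numpy as np
from scipy.optimize import linprog

def optimal_coupling(mu, nu, d):
    """Transportation LP: coupling of mu and nu minimising E[d]."""
    n = len(mu)
    c = d.flatten()                 # objective: sum omega[u,v] * d[u,v]
    A_eq, b_eq = [], []
    for u in range(n):              # row marginals must equal mu
        row = np.zeros((n, n)); row[u, :] = 1
        A_eq.append(row.flatten()); b_eq.append(mu[u])
    for v in range(n):              # column marginals must equal nu
        col = np.zeros((n, n)); col[:, v] = 1
        A_eq.append(col.flatten()); b_eq.append(nu[v])
    res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None))
    return res.x.reshape(n, n)

def bisimilarity_distances(label, tau):
    n = len(label)
    pairs = [(s, t) for s in range(n) for t in range(n)
             if s != t and label[s] == label[t]]
    # Initial policy: the independent coupling tau[s] x tau[t].
    policy = {(s, t): np.outer(tau[s], tau[t]) for (s, t) in pairs}
    while True:
        # Policy evaluation: d(s,t) = sum omega(u,v) d(u,v) is linear in
        # the matching-label pairs; mismatched pairs contribute 1,
        # diagonal pairs contribute 0 (assumed to be the only bisimilar pairs).
        idx = {p: i for i, p in enumerate(pairs)}
        A = np.zeros((len(pairs), len(pairs)))
        b = np.zeros(len(pairs))
        for (s, t), i in idx.items():
            for u, v in itertools.product(range(n), range(n)):
                w = policy[(s, t)][u, v]
                if u == v:
                    continue
                if label[u] != label[v]:
                    b[i] += w
                else:
                    A[i, idx[(u, v)]] += w
        x = np.linalg.solve(np.eye(len(pairs)) - A, b)
        d = np.zeros((n, n))
        for u in range(n):
            for v in range(n):
                if u != v:
                    d[u, v] = 1.0 if label[u] != label[v] else x[idx[(u, v)]]
        # Policy improvement: switch to a better coupling wherever the
        # transportation problem finds one; stop at a fixed point.
        improved = False
        for (s, t) in pairs:
            omega = optimal_coupling(tau[s], tau[t], d)
            if np.sum(omega * d) < np.sum(policy[(s, t)] * d) - 1e-9:
                policy[(s, t)] = omega
                improved = True
        if not improved:
            return d
```

Each improvement step strictly decreases some pair's expected distance under the coupling, which is what makes the procedure a policy iteration; the exponential worst case stated above concerns how many such switches may be needed.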
Learning to Control in Metric Space with Optimal Regret
We study online reinforcement learning for finite-horizon deterministic
control systems with {\it arbitrary} state and action spaces. Suppose that the
transition dynamics and reward function are unknown, but the state and action
spaces are endowed with a metric that characterizes the proximity between
different states and actions. We provide a surprisingly simple upper-confidence
reinforcement learning algorithm that uses a function approximation oracle to
estimate optimistic Q functions from experiences. We show that the regret of
the algorithm after $K$ episodes is bounded in terms of a smoothness parameter
and the doubling dimension of the state-action space with respect to the given
metric. We also establish a near-matching regret lower bound. The proposed
method can be adapted to work for more structured transition systems, including
the finite-state case and the case where value functions are linear
combinations of features, where the method also achieves the optimal regret.
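The optimistic estimate can be illustrated with a small sketch. The following Python class is our own simplification of the metric-based idea, not the paper's oracle-based algorithm: it maintains the tightest Lipschitz-consistent upper bound on Q values, defaulting to an optimistic ceiling where no data exists. The names `dist`, `smoothness`, and `v_max` are our assumptions.

```python
# Sketch: optimistic Q estimation in a metric state-action space.
# Each observed target caps the estimate inside a metric cone around it;
# the estimate is the tightest upper bound consistent with all data.
class LipschitzOptimisticQ:
    def __init__(self, dist, smoothness, v_max):
        self.dist = dist            # metric on state-action pairs
        self.L = smoothness         # assumed Lipschitz constant of Q*
        self.v_max = v_max          # optimistic default value
        self.data = []              # observed ((s, a), target) pairs

    def update(self, sa, target):
        # target = immediate reward + estimated value of the next state
        self.data.append((sa, target))

    def value(self, sa):
        # Smallest Lipschitz-consistent upper bound; v_max if no data.
        bounds = [t + self.L * self.dist(sa, sa2) for sa2, t in self.data]
        return min(bounds + [self.v_max])

    def act(self, s, actions):
        # Greedy action with respect to the optimistic estimate.
        return max(actions, key=lambda a: self.value((s, a)))
```

Acting greedily on this estimate steers exploration toward regions of the state-action space whose bound is still loose, which is the upper-confidence principle the abstract refers to.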
Distribution-based bisimulation for labelled Markov processes
In this paper we propose a (sub)distribution-based bisimulation for labelled
Markov processes and compare it with earlier definitions of state and event
bisimulation, which both only compare states. In contrast to those state-based
bisimulations, our distribution bisimulation is weaker, but corresponds more
closely to linear properties. We construct a logic and a metric to describe our
distribution bisimulation and discuss linearity, continuity and compositional
properties.
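As a concrete reading of the distribution view, the following Python sketch (ours, restricted to the finite-state case; the paper treats general labelled Markov processes) lifts the transition function from states to distributions: a distribution steps by mixing its states' successor distributions, and two distributions can only be related if they assign equal weight to each label.

```python
# Sketch: lifting a finite labelled Markov chain from states to
# distributions. tau is an n x n row-stochastic matrix, mu a length-n
# distribution over states, label[s] the label of state s.
import numpy as np

def lift(tau, mu):
    """Successor distribution of mu: sum_s mu(s) * tau(s, .)."""
    return mu @ tau

def label_weight(label, mu, a):
    """Probability that mu assigns to states carrying label a."""
    return sum(p for s, p in enumerate(mu) if label[s] == a)
```

A distribution-based bisimulation relates mu and nu only when they agree on every label weight and their lifted successors are again related; this is coarser than relating supports state by state, which is why it aligns more closely with linear properties.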
Computing Probabilistic Bisimilarity Distances for Probabilistic Automata
The probabilistic bisimilarity distance of Deng et al. has been proposed as a
robust quantitative generalization of Segala and Lynch's probabilistic
bisimilarity for probabilistic automata. In this paper, we present a
characterization of the bisimilarity distance as the solution of a simple
stochastic game. The characterization gives us an algorithm to compute the
distances by applying Condon's simple policy iteration on these games. The
correctness of Condon's approach, however, relies on the assumption that the
games are stopping. Our games may be non-stopping in general, yet we are able
to prove termination for this extended class of games. Already other algorithms
have been proposed in the literature to compute these distances, with
complexity in \textbf{UP} $\cap$ \textbf{coUP} and \textbf{PPAD}. Despite their
theoretical relevance, these algorithms are inefficient in practice. To the
best of our knowledge, our algorithm is the first practical solution.
The characterization of the probabilistic bisimilarity distance mentioned
above crucially uses a dual presentation of the Hausdorff distance due to
M\'emoli. As an additional contribution, in this paper we show that M\'emoli's
result can also be used to prove that the bisimilarity distance bounds the
difference in the maximal (or minimal) probability of two states satisfying
arbitrary $\omega$-regular properties, expressed, e.g., as LTL formulas.
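To fix ideas, here is a generic Python sketch of simple policy iteration on a simple stochastic game. It is our own illustration under the textbook assumption that the game is stopping, so that plain value iteration can evaluate a fixed strategy; the paper's contribution is precisely to prove termination without that assumption. The encoding via `kind`, `succ`, and `sink_value` is ours.

```python
# Sketch: simple policy iteration on a simple stochastic game (SSG).
# Vertices are 'MIN', 'MAX', or 'AVG'; succ[u] lists the successors of
# u; sink_value maps sink vertices to their fixed values in [0, 1].
import numpy as np

def evaluate(kind, succ, sink_value, sigma, n, iters=100000, tol=1e-12):
    """Values once MIN fixes strategy sigma: the remaining one-player
    game is solved by value iteration (converges for stopping games)."""
    v = np.array([sink_value.get(u, 0.0) for u in range(n)])
    for _ in range(iters):
        w = v.copy()
        for u in range(n):
            if u in sink_value:
                continue
            if kind[u] == 'MIN':
                w[u] = v[sigma[u]]
            elif kind[u] == 'MAX':
                w[u] = max(v[s] for s in succ[u])
            else:  # 'AVG' vertex: uniform over its successors
                w[u] = sum(v[s] for s in succ[u]) / len(succ[u])
        if np.max(np.abs(w - v)) < tol:
            break
        v = w
    return v

def simple_policy_iteration(kind, succ, sink_value, n):
    # Start from an arbitrary MIN strategy.
    sigma = {u: succ[u][0] for u in range(n)
             if u not in sink_value and kind[u] == 'MIN'}
    while True:
        v = evaluate(kind, succ, sink_value, sigma, n)
        # A MIN vertex is "switchable" if some successor beats its
        # current choice; simple policy iteration switches one of them.
        switchable = [u for u in sigma
                      if min(v[s] for s in succ[u]) < v[sigma[u]] - 1e-9]
        if not switchable:
            return v
        u = switchable[0]
        sigma[u] = min(succ[u], key=lambda s: v[s])
```

In the games arising from the characterization above, vertex values coincide with bisimilarity distances, so running such an iteration to a fixed point yields the distances; handling the non-stopping case is where the paper's termination proof is needed.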