4 research outputs found

    Computing Probabilistic Bisimilarity Distances via Policy Iteration

    Get PDF
    A transformation mapping a labelled Markov chain to a simple stochastic game is presented. In the resulting simple stochastic game, each vertex corresponds to a pair of states of the labelled Markov chain. The value of a vertex of the simple stochastic game is shown to be equal to the probabilistic bisimilarity distance, a notion due to Desharnais, Gupta, Jagadeesan and Panangaden, of the corresponding pair of states of the labelled Markov chain. Bacci, Bacci, Larsen and Mardare introduced an algorithm to compute the probabilistic bisimilarity distances for a labelled Markov chain. A modification of a basic version of their algorithm for a labelled Markov chain is shown to be the policy iteration algorithm applied to the corresponding simple stochastic game. Furthermore, it is shown that this algorithm takes exponential time in the worst case

    Learning to Control in Metric Space with Optimal Regret

    Full text link
    We study online reinforcement learning for finite-horizon deterministic control systems with {\it arbitrary} state and action spaces. Suppose that the transition dynamics and reward function is unknown, but the state and action space is endowed with a metric that characterizes the proximity between different states and actions. We provide a surprisingly simple upper-confidence reinforcement learning algorithm that uses a function approximation oracle to estimate optimistic Q functions from experiences. We show that the regret of the algorithm after KK episodes is O(HL(KH)d−1d)O(HL(KH)^{\frac{d-1}{d}}) where LL is a smoothness parameter, and dd is the doubling dimension of the state-action space with respect to the given metric. We also establish a near-matching regret lower bound. The proposed method can be adapted to work for more structured transition systems, including the finite-state case and the case where value functions are linear combinations of features, where the method also achieve the optimal regret

    Distribution-based bisimulation for labelled Markov processes

    Full text link
    In this paper we propose a (sub)distribution-based bisimulation for labelled Markov processes and compare it with earlier definitions of state and event bisimulation, which both only compare states. In contrast to those state-based bisimulations, our distribution bisimulation is weaker, but corresponds more closely to linear properties. We construct a logic and a metric to describe our distribution bisimulation and discuss linearity, continuity and compositional properties.Comment: Accepted by FORMATS 201

    Computing Probabilistic Bisimilarity Distances for Probabilistic Automata

    Get PDF
    The probabilistic bisimilarity distance of Deng et al. has been proposed as a robust quantitative generalization of Segala and Lynch's probabilistic bisimilarity for probabilistic automata. In this paper, we present a characterization of the bisimilarity distance as the solution of a simple stochastic game. The characterization gives us an algorithm to compute the distances by applying Condon's simple policy iteration on these games. The correctness of Condon's approach, however, relies on the assumption that the games are stopping. Our games may be non-stopping in general, yet we are able to prove termination for this extended class of games. Already other algorithms have been proposed in the literature to compute these distances, with complexity in UP∩coUP\textbf{UP} \cap \textbf{coUP} and \textbf{PPAD}. Despite the theoretical relevance, these algorithms are inefficient in practice. To the best of our knowledge, our algorithm is the first practical solution. The characterization of the probabilistic bisimilarity distance mentioned above crucially uses a dual presentation of the Hausdorff distance due to M\'emoli. As an additional contribution, in this paper we show that M\'emoli's result can be used also to prove that the bisimilarity distance bounds the difference in the maximal (or minimal) probability of two states to satisfying arbitrary ω\omega-regular properties, expressed, eg., as LTL formulas
    corecore