15 research outputs found

    Computing Distances between Probabilistic Automata

    Full text link
    We present relaxed notions of simulation and bisimulation on Probabilistic Automata (PA), that allow some error epsilon. When epsilon is zero we retrieve the usual notions of bisimulation and simulation on PAs. We give logical characterisations of these notions by choosing suitable logics which differ from the elementary ones, L with negation and L without negation, by the modal operator. Using flow networks, we show how to compute the relations in PTIME. This allows the definition of an efficiently computable non-discounted distance between the states of a PA. A natural modification of this distance is introduced, to obtain a discounted distance, which weakens the influence of long term transitions. We compare our notions of distance to others previously defined and illustrate our approach on various examples. We also show that our distance is not expansive with respect to process algebra operators. Although L without negation is a suitable logic to characterise epsilon-(bi)simulation on deterministic PAs, it is not for general PAs; interestingly, we prove that it does characterise weaker notions, called a priori epsilon-(bi)simulation, which we prove to be NP-difficult to decide.Comment: In Proceedings QAPL 2011, arXiv:1107.074

    Solving Factored MDPs with Hybrid State and Action Variables

    Full text link
    Efficient representations and solutions for large decision problems with continuous and discrete variables are among the most important challenges faced by the designers of automated decision support systems. In this paper, we describe a novel hybrid factored Markov decision process (MDP) model that allows for a compact representation of these problems, and a new hybrid approximate linear programming (HALP) framework that permits their efficient solutions. The central idea of HALP is to approximate the optimal value function by a linear combination of basis functions and optimize its weights by linear programming. We analyze both theoretical and computational aspects of this approach, and demonstrate its scale-up potential on several hybrid optimization problems

    Metrics and continuity in reinforcement learning

    Full text link
    In most practical applications of reinforcement learning, it is untenable to maintain direct estimates for individual states; in continuous-state systems, it is impossible. Instead, researchers often leverage state similarity (whether explicitly or implicitly) to build models that can generalize well from a limited set of samples. The notion of state similarity used, and the neighbourhoods and topologies they induce, is thus of crucial importance, as it will directly affect the performance of the algorithms. Indeed, a number of recent works introduce algorithms assuming the existence of "well-behaved" neighbourhoods, but leave the full specification of such topologies for future work. In this paper we introduce a unified formalism for defining these topologies through the lens of metrics. We establish a hierarchy amongst these metrics and demonstrate their theoretical implications on the Markov Decision Process specifying the reinforcement learning problem. We complement our theoretical results with empirical evaluations showcasing the differences between the metrics considered.Comment: Accepted at AAAI 202

    Optimizing the Age-of-Information for Mobile Users in Adversarial and Stochastic Environments

    Full text link
    We study a multi-user downlink scheduling problem for optimizing the freshness of information available to users roaming across multiple cells. We consider both adversarial and stochastic settings and design scheduling policies that optimize two distinct information freshness metrics, namely the average age-of-information and the peak age-of-information. We show that a natural greedy scheduling policy is competitive with the optimal offline policy in the adversarial setting. We also derive fundamental lower bounds to the competitive ratio achievable by any online policy. In the stochastic environment, we show that a Max-Weight scheduling policy that takes into account the channel statistics achieves an approximation factor of 22 for minimizing the average age of information in two extreme mobility scenarios. We conclude the paper by establishing a large-deviation optimality result achieved by the greedy policy for minimizing the peak age of information for static users situated at a single cell.Comment: arXiv admin note: text overlap with arXiv:2001.0547

    A taxonomy for similarity metrics between Markov decision processes

    Get PDF
    Although the notion of task similarity is potentially interesting in a wide range of areas such as curriculum learning or automated planning, it has mostly been tied to transfer learning. Transfer is based on the idea of reusing the knowledge acquired in the learning of a set of source tasks to a new learning process in a target task, assuming that the target and source tasks are close enough. In recent years, transfer learning has succeeded in making reinforcement learning (RL) algorithms more efficient (e.g., by reducing the number of samples needed to achieve (near-)optimal performance). Transfer in RL is based on the core concept of similarity: whenever the tasks are similar, the transferred knowledge can be reused to solve the target task and significantly improve the learning performance. Therefore, the selection of good metrics to measure these similarities is a critical aspect when building transfer RL algorithms, especially when this knowledge is transferred from simulation to the real world. In the literature, there are many metrics to measure the similarity between MDPs, hence, many definitions of similarity or its complement distance have been considered. In this paper, we propose a categorization of these metrics and analyze the definitions of similarity proposed so far, taking into account such categorization. We also follow this taxonomy to survey the existing literature, as well as suggesting future directions for the construction of new metricsOpen Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This work has also been supported by the Madrid Government (Comunidad de Madrid-Spain) under the Multiannual Agreement with UC3M in the line of Excellence of University Professors (EPUC3M17), and in the context of the V PRICIT (Regional Programme of Research and Technological Innovation)S
    corecore