Search CORE

9,860 research outputs found

Approximate performability and dependability analysis using generalized stochastic Petri Nets

Author: Haverkort Boudewijn R.
Publication venue: North-Holland
Publication date: 01/01/1992
Field of study

Since current day fault-tolerant and distributed computer and communication systems tend to be large and complex, their corresponding performability models will suffer from the same characteristics. Therefore, calculating performability measures from these models is a difficult and time-consuming task.\ud \ud To alleviate the largeness and complexity problem to some extent we use generalized stochastic Petri nets to describe to models and to automatically generate the underlying Markov reward models. Still however, many models cannot be solved with the current numerical techniques, although they are conveniently and often compactly described.\ud \ud In this paper we discuss two heuristic state space truncation techniques that allow us to obtain very good approximations for the steady-state performability while only assessing a few percent of the states of the untruncated model. For a class of reversible models we derive explicit lower and upper bounds on the exact steady-state performability. For a much wider class of models a truncation theorem exists that allows one to obtain bounds for the error made in the truncation. We discuss this theorem in the context of approximate performability models and comment on its applicability. For all the proposed truncation techniques we present examples showing their usefulness

CiteSeerX

University of Twente Research Information

Efficient Strategy Iteration for Mean Payoff in Markov Decision Processes

Author: Křetínský Jan
Meggendorfer Tobias
Publication venue
Publication date: 07/09/2017
Field of study

Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Mean payoff (or long-run average reward) provides a mathematically elegant formalism to express performance related properties. Strategy iteration is one of the solution techniques applicable in this context. While in many other contexts it is the technique of choice due to advantages over e.g. value iteration, such as precision or possibility of domain-knowledge-aware initialization, it is rarely used for MDPs, since there it scales worse than value iteration. We provide several techniques that speed up strategy iteration by orders of magnitude for many MDPs, eliminating the performance disadvantage while preserving all its advantages

arXiv.org e-Print Archive

Lancaster E-Prints

Deep Q-Learning for Nash Equilibria: Nash-DQN

Author: Casgrain Philippe
Jaimungal Sebastian
Ning Brian
Publication venue
Publication date: 23/04/2019
Field of study

Model-free learning for multi-agent stochastic games is an active area of research. Existing reinforcement learning algorithms, however, are often restricted to zero-sum games, and are applicable only in small state-action spaces or other simplified settings. Here, we develop a new data efficient Deep-Q-learning methodology for model-free learning of Nash equilibria for general-sum stochastic games. The algorithm uses a local linear-quadratic expansion of the stochastic game, which leads to analytically solvable optimal actions. The expansion is parametrized by deep neural networks to give it sufficient flexibility to learn the environment without the need to experience all state-action pairs. We study symmetry properties of the algorithm stemming from label-invariant stochastic games and as a proof of concept, apply our algorithm to learning optimal trading strategies in competitive electronic markets.Comment: 16 pages, 4 figure

arXiv.org e-Print Archive

Differentiable Algorithm Networks for Composable Robot Learning

Author: Hsu David
Kaelbling Leslie Pack
Karkus Peter
Lee Wee Sun
Lozano-Perez Tomas
Ma Xiao
Publication venue
Publication date: 28/05/2019
Field of study

This paper introduces the Differentiable Algorithm Network (DAN), a composable architecture for robot learning systems. A DAN is composed of neural network modules, each encoding a differentiable robot algorithm and an associated model; and it is trained end-to-end from data. DAN combines the strengths of model-driven modular system design and data-driven end-to-end learning. The algorithms and models act as structural assumptions to reduce the data requirements for learning; end-to-end learning allows the modules to adapt to one another and compensate for imperfect models and algorithms, in order to achieve the best overall system performance. We illustrate the DAN methodology through a case study on a simulated robot system, which learns to navigate in complex 3-D environments with only local visual observations and an image of a partially correct 2-D floor map.Comment: RSS 2019 camera ready. Video is available at https://youtu.be/4jcYlTSJF4

arXiv.org e-Print Archive

DSpace@MIT