
    Certified Reinforcement Learning with Logic Guidance

    This paper proposes the first model-free Reinforcement Learning (RL) framework to synthesise policies for unknown continuous-state Markov Decision Processes (MDPs), such that a given linear temporal property is satisfied. We convert the given property into a Limit-Deterministic Büchi Automaton (LDBA), a finite-state machine expressing the property. Exploiting the structure of the LDBA, we shape a synchronous reward function on-the-fly, so that an RL algorithm can synthesise a policy resulting in traces that probabilistically satisfy the linear temporal property. When the state space of the MDP is finite, this probability (the certificate) is also calculated in parallel with policy learning; as such, the RL algorithm produces a policy that is certified with respect to the property. Under the assumption of a finite state space, theoretical guarantees are provided on the convergence of the RL algorithm to an optimal policy maximising the above probability. We also show that our method produces "best available" control policies when the logical property cannot be satisfied. In the general case of a continuous state space, we propose a neural network architecture for RL and empirically show that the algorithm finds satisfying policies, if such policies exist. The performance of the proposed framework is evaluated via a set of numerical examples and benchmarks, where we observe an improvement of one order of magnitude in the number of iterations required for policy synthesis, compared to existing approaches whenever available.
    Comment: This article draws from arXiv:1801.08099, arXiv:1809.0782
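    To make the reward-shaping idea concrete, here is a minimal Q-learning sketch (in Python) on the product of a toy MDP and a two-state LDBA for the property "eventually reach the goal". The corridor environment, labelling function, and reward values are illustrative assumptions, not the paper's exact construction.

        import random
        from collections import defaultdict

        # Toy LDBA for "eventually goal": state 0 = waiting, state 1 = accepting.
        def ldba_step(q, label):
            return 1 if (q == 0 and label == "goal") else q

        ACCEPTING = {1}

        def shaped_reward(q_next):
            # On-the-fly reward: emitted whenever an accepting LDBA state is visited.
            return 1.0 if q_next in ACCEPTING else 0.0

        # Toy 1-D corridor MDP: states 0..4, where the label "goal" holds at state 4.
        ACTIONS = ["left", "right"]

        def env_step(s, a):
            return min(4, s + 1) if a == "right" else max(0, s - 1)

        def label_of(s):
            return "goal" if s == 4 else ""

        Q = defaultdict(float)  # Q-table over product states (s, q)

        def episode(alpha=0.1, gamma=0.99, eps=0.1, horizon=50):
            s, q = 0, 0
            for _ in range(horizon):
                if random.random() < eps:
                    a = random.choice(ACTIONS)
                else:
                    a = max(ACTIONS, key=lambda b: Q[(s, q, b)])
                s2 = env_step(s, a)
                q2 = ldba_step(q, label_of(s2))
                r = shaped_reward(q2)
                best = max(Q[(s2, q2, b)] for b in ACTIONS)
                Q[(s, q, a)] += alpha * (r + gamma * best - Q[(s, q, a)])
                s, q = s2, q2

        for _ in range(200):
            episode()

    Learning on the product state (s, q) is what lets a plain RL update track progress towards the temporal property.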

    Stick-Breaking Policy Learning in Dec-POMDPs

    Expectation maximization (EM) has recently been shown to be an efficient algorithm for learning finite-state controllers (FSCs) in large decentralized POMDPs (Dec-POMDPs). However, current methods use fixed-size FSCs and often converge to local maxima that are far from optimal. This paper considers a variable-size FSC to represent the local policy of each agent. These variable-size FSCs are constructed using a stick-breaking prior, leading to a new framework called decentralized stick-breaking policy representation (Dec-SBPR). This approach learns the controller parameters with a variational Bayesian algorithm without having to assume that the Dec-POMDP model is available. The performance of Dec-SBPR is demonstrated on several benchmark problems, showing that the algorithm scales to large problems while outperforming other state-of-the-art methods.
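    As a rough illustration of how a stick-breaking prior induces a variable-size controller, the Python sketch below draws FSC node weights from a truncated stick-breaking process; the parameter names and the 99%-mass cutoff are illustrative assumptions, not Dec-SBPR's variational inference procedure.

        import numpy as np

        def stick_breaking_weights(alpha, max_nodes, seed=0):
            # v_k ~ Beta(1, alpha); pi_k = v_k * prod_{j<k} (1 - v_j).
            # Larger alpha spreads probability mass over more controller nodes.
            rng = np.random.default_rng(seed)
            v = rng.beta(1.0, alpha, size=max_nodes)
            remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
            return v * remaining

        weights = stick_breaking_weights(alpha=2.0, max_nodes=25)
        # Effective controller size: nodes needed to cover 99% of the prior mass,
        # so the number of FSC nodes is inferred rather than fixed in advance.
        effective_size = int(np.searchsorted(np.cumsum(weights), 0.99)) + 1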

    Real-time support for high performance aircraft operation

    The feasibility of real-time processing schemes using artificial neural networks (ANNs) is investigated. A rationale for digital neural nets is presented, and a general processor architecture for control applications is illustrated. Research results are given on ANN structures for real-time applications and on ANN algorithms for real-time control.
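    As a hedged illustration of why fixed-topology ANNs suit real-time loops, the Python sketch below evaluates a small feedforward controller in a constant number of operations per control tick; the layer sizes and the control task are assumptions, not the architecture studied here.

        import numpy as np

        class TinyController:
            # Fixed-size two-layer net: the cost of act() is independent of the
            # input values, so its worst-case execution time is easy to bound.
            def __init__(self, n_in=4, n_hidden=8, n_out=1, seed=0):
                rng = np.random.default_rng(seed)
                self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_in))
                self.b1 = np.zeros(n_hidden)
                self.W2 = rng.normal(0.0, 0.1, (n_out, n_hidden))
                self.b2 = np.zeros(n_out)

            def act(self, state):
                h = np.tanh(self.W1 @ state + self.b1)
                return self.W2 @ h + self.b2

        controller = TinyController()
        u = controller.act(np.array([0.1, -0.2, 0.0, 0.05]))  # one control tick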

    Approximate performability and dependability analysis using generalized stochastic Petri nets

    Since current-day fault-tolerant and distributed computer and communication systems tend to be large and complex, their corresponding performability models will suffer from the same characteristics. Therefore, calculating performability measures from these models is a difficult and time-consuming task.

    To alleviate the largeness and complexity problem to some extent, we use generalized stochastic Petri nets to describe the models and to automatically generate the underlying Markov reward models. Still, many models cannot be solved with current numerical techniques, even though they are conveniently and often compactly described.

    In this paper we discuss two heuristic state-space truncation techniques that allow us to obtain very good approximations of the steady-state performability while assessing only a few percent of the states of the untruncated model. For a class of reversible models we derive explicit lower and upper bounds on the exact steady-state performability. For a much wider class of models, a truncation theorem exists that allows one to bound the error made by the truncation. We discuss this theorem in the context of approximate performability models and comment on its applicability. For all the proposed truncation techniques we present examples showing their usefulness.
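    To give a numerical flavour of state-space truncation, the Python sketch below solves the steady state of a small birth-death CTMC both in full and after keeping only its first few states; the chain, its rates, and the reward vector stand in for a GSPN's underlying Markov reward model and are illustrative assumptions.

        import numpy as np

        def steady_state(Q):
            # Solve pi @ Q = 0 with sum(pi) = 1 for a CTMC generator matrix Q.
            n = Q.shape[0]
            A = np.vstack([Q.T, np.ones(n)])
            b = np.zeros(n + 1)
            b[-1] = 1.0
            pi, *_ = np.linalg.lstsq(A, b, rcond=None)
            return pi

        def birth_death_generator(n, lam=1.0, mu=2.0):
            # Births (e.g., failure arrivals) at rate lam; repairs at rate mu * i.
            Q = np.zeros((n, n))
            for i in range(n):
                if i + 1 < n:
                    Q[i, i + 1] = lam
                if i > 0:
                    Q[i, i - 1] = mu * i
                Q[i, i] = -Q[i].sum()
            return Q

        reward = lambda n: np.arange(n)  # per-state reward rate (assumed)
        exact = steady_state(birth_death_generator(50)) @ reward(50)
        approx = steady_state(birth_death_generator(8)) @ reward(8)  # truncated
        # approx stays close to exact because the discarded states carry tiny mass.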