5,548 research outputs found

    A Learning Based Approach to Control Synthesis of Markov Decision Processes for Linear Temporal Logic Specifications

    Full text link
    We propose to synthesize a control policy for a Markov decision process (MDP) such that the resulting traces of the MDP satisfy a linear temporal logic (LTL) property. We construct a product MDP that incorporates a deterministic Rabin automaton generated from the desired LTL property. The reward function of the product MDP is defined from the acceptance condition of the Rabin automaton. This construction allows us to apply techniques from learning theory to the problem of synthesis for LTL specifications even when the transition probabilities are not known a priori. We prove that our method is guaranteed to find a controller that satisfies the LTL property with probability one if such a policy exists, and we suggest empirically with a case study in traffic control that our method produces reasonable control strategies even when the LTL property cannot be satisfied with probability one

    Barrier Functions for Multiagent-POMDPs with DTL Specifications

    Get PDF
    Multi-agent partially observable Markov decision processes (MPOMDPs) provide a framework to represent heterogeneous autonomous agents subject to uncertainty and partial observation. In this paper, given a nominal policy provided by a human operator or a conventional planning method, we propose a technique based on barrier functions to design a minimally interfering safety-shield ensuring satisfaction of high-level specifications in terms of linear distribution temporal logic (LDTL). To this end, we use sufficient and necessary conditions for the invariance of a given set based on discrete-time barrier functions (DTBFs) and formulate sufficient conditions for finite time DTBF to study finite time convergence to a set. We then show that different LDTL mission/safety specifications can be cast as a set of invariance or finite time reachability problems. We demonstrate that the proposed method for safety-shield synthesis can be implemented online by a sequence of one-step greedy algorithms. We demonstrate the efficacy of the proposed method using experiments involving a team of robots

    Probably Approximately Correct MDP Learning and Control With Temporal Logic Constraints

    Full text link
    We consider synthesis of control policies that maximize the probability of satisfying given temporal logic specifications in unknown, stochastic environments. We model the interaction between the system and its environment as a Markov decision process (MDP) with initially unknown transition probabilities. The solution we develop builds on the so-called model-based probably approximately correct Markov decision process (PAC-MDP) methodology. The algorithm attains an ε\varepsilon-approximately optimal policy with probability 1−δ1-\delta using samples (i.e. observations), time and space that grow polynomially with the size of the MDP, the size of the automaton expressing the temporal logic specification, 1ε\frac{1}{\varepsilon}, 1δ\frac{1}{\delta} and a finite time horizon. In this approach, the system maintains a model of the initially unknown MDP, and constructs a product MDP based on its learned model and the specification automaton that expresses the temporal logic constraints. During execution, the policy is iteratively updated using observation of the transitions taken by the system. The iteration terminates in finitely many steps. With high probability, the resulting policy is such that, for any state, the difference between the probability of satisfying the specification under this policy and the optimal one is within a predefined bound.Comment: 9 pages, 5 figures, Accepted by 2014 Robotics: Science and Systems (RSS

    Certified Reinforcement Learning with Logic Guidance

    Full text link
    This paper proposes the first model-free Reinforcement Learning (RL) framework to synthesise policies for unknown, and continuous-state Markov Decision Processes (MDPs), such that a given linear temporal property is satisfied. We convert the given property into a Limit Deterministic Buchi Automaton (LDBA), namely a finite-state machine expressing the property. Exploiting the structure of the LDBA, we shape a synchronous reward function on-the-fly, so that an RL algorithm can synthesise a policy resulting in traces that probabilistically satisfy the linear temporal property. This probability (certificate) is also calculated in parallel with policy learning when the state space of the MDP is finite: as such, the RL algorithm produces a policy that is certified with respect to the property. Under the assumption of finite state space, theoretical guarantees are provided on the convergence of the RL algorithm to an optimal policy, maximising the above probability. We also show that our method produces ''best available'' control policies when the logical property cannot be satisfied. In the general case of a continuous state space, we propose a neural network architecture for RL and we empirically show that the algorithm finds satisfying policies, if there exist such policies. The performance of the proposed framework is evaluated via a set of numerical examples and benchmarks, where we observe an improvement of one order of magnitude in the number of iterations required for the policy synthesis, compared to existing approaches whenever available.Comment: This article draws from arXiv:1801.08099, arXiv:1809.0782

    Probabilistic Plan Synthesis for Coupled Multi-Agent Systems

    Full text link
    This paper presents a fully automated procedure for controller synthesis for multi-agent systems under the presence of uncertainties. We model the motion of each of the NN agents in the environment as a Markov Decision Process (MDP) and we assign to each agent one individual high-level formula given in Probabilistic Computational Tree Logic (PCTL). Each agent may need to collaborate with other agents in order to achieve a task. The collaboration is imposed by sharing actions between the agents. We aim to design local control policies such that each agent satisfies its individual PCTL formula. The proposed algorithm builds on clustering the agents, MDP products construction and controller policies design. We show that our approach has better computational complexity than the centralized case, which traditionally suffers from very high computational demands.Comment: IFAC WC 2017, Toulouse, Franc
    • …
    corecore