62 research outputs found

    Stochastic Shortest Path with Energy Constraints in POMDPs

    We consider partially observable Markov decision processes (POMDPs) with a set of target states and positive integer costs associated with every transition. The traditional optimization objective (stochastic shortest path) asks to minimize the expected total cost until the target set is reached. We extend this traditional framework of POMDPs to model energy consumption, which represents a hard constraint. The energy levels may increase and decrease with transitions, and the hard constraint requires that the energy level remain positive at every step until the target is reached. First, we present a novel algorithm for solving POMDPs with energy levels, building on existing POMDP solvers and using real-time dynamic programming (RTDP) as its main method. Our second contribution concerns policy representation. For larger POMDP instances, the policies computed by existing solvers are too large to be understandable. We present an automated procedure based on machine learning techniques that extracts the important decisions of the policy, allowing us to compute succinct, human-readable policies. Finally, we show experimentally that our algorithm performs well and computes succinct policies on a number of POMDP instances from the literature that were naturally enhanced with energy levels.
    Comment: Technical report accompanying a paper published in proceedings of AAMAS 201
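    To make the energy-constrained objective concrete, the sketch below runs RTDP-style trials over a state space augmented with an energy counter: each action both incurs cost and changes the energy level, and any outcome that exhausts the energy before the target is heavily penalized. This is a minimal illustration with a hypothetical toy model (belief tracking is omitted for brevity; the paper operates over beliefs), not the authors' implementation.

```python
# Minimal RTDP sketch over a state space augmented with an energy level.
# The toy model and all names are illustrative, not the paper's implementation.
import random

TARGET = 2                        # toy 3-state chain: 0 -> 1 -> 2 (target)
ACTIONS = ["move", "wait"]

def step(s, a):
    """Sample (next_state, cost, energy_delta) from the toy model."""
    if a == "move":
        s2 = min(s + 1, TARGET) if random.random() < 0.8 else s
        return s2, 1, -1          # moving costs 1 and drains one unit of energy
    return s, 1, +1               # waiting costs 1 but recharges one unit

V = {}                            # value table over (state, energy) pairs

def q(s, e, a, n=20):
    """Monte-Carlo one-step lookahead estimate of Q((s, e), a)."""
    total = 0.0
    for _ in range(n):
        s2, c, de = step(s, a)
        if e + de <= 0:           # hard constraint: energy must stay positive
            total += 1e6
        elif s2 == TARGET:
            total += c
        else:
            total += c + V.get((s2, e + de), 0.0)
    return total / n

def rtdp_trial(s, e, max_depth=50):
    """One RTDP trial: greedy action, Bellman backup, sampled successor."""
    for _ in range(max_depth):
        if s == TARGET or e <= 0:
            return
        a = min(ACTIONS, key=lambda act: q(s, e, act))
        V[(s, e)] = q(s, e, a)    # backup at the visited (state, energy) pair
        s, _, de = step(s, a)
        e += de

for _ in range(300):
    rtdp_trial(0, 3)
print(V.get((0, 3)))              # estimated expected cost from state 0 with energy 3
```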

    Task-Guided Inverse Reinforcement Learning Under Partial Information

    We study the problem of inverse reinforcement learning (IRL), where the learning agent recovers a reward function using expert demonstrations. Most existing IRL techniques make the often unrealistic assumption that the agent has access to full information about the environment. We remove this assumption by developing an algorithm for IRL in partially observable Markov decision processes (POMDPs). The algorithm addresses several limitations of existing techniques that do not take the information asymmetry between the expert and the learner into account. First, it adopts causal entropy, rather than the entropy used in most existing IRL techniques, as the measure of the likelihood of the expert demonstrations, which avoids a common source of algorithmic complexity. Second, it incorporates task specifications expressed in temporal logic into IRL. Such specifications may be interpreted as side information available to the learner a priori, in addition to the demonstrations, and may reduce the information asymmetry. Nevertheless, the resulting formulation is still nonconvex due to the intrinsic nonconvexity of the so-called forward problem, i.e., computing an optimal policy given a reward function, in POMDPs. We address this nonconvexity through sequential convex programming and introduce several extensions to solve the forward problem in a scalable manner. This scalability allows computing policies that incorporate memory, at added computational cost, and outperform memoryless policies. We demonstrate that, even with severely limited data, the algorithm learns reward functions and policies that satisfy the task and induce behavior similar to the expert's by leveraging the side information and incorporating memory into the policy.
    Comment: Initial submission to ICAPS 202
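    The idea behind sequential convex programming, in its simplest form, is to repeatedly minimize a convex model of the objective built around the current iterate. Below is a minimal, hypothetical sketch of such a loop (a linearization plus a quadratic trust-region penalty, solvable in closed form) on a toy nonconvex function; the paper's subproblems over POMDP policies are considerably richer.

```python
# Hypothetical sketch of a sequential convex programming (SCP) loop: at each
# iteration, solve a convexified model of the nonconvex objective.
import numpy as np

def scp_minimize(grad_f, x0, rho=20.0, iters=500, tol=1e-8):
    """Successively solve the convex subproblem around the iterate x_k:
        minimize  f(x_k) + grad_f(x_k)^T (x - x_k) + (rho/2) * ||x - x_k||^2,
    whose closed-form minimizer is x_k - grad_f(x_k) / rho."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        step = grad_f(x) / rho    # minimizer of the convex model around x
        if np.linalg.norm(step) < tol:
            break
        x = x - step
    return x

# Toy nonconvex objective standing in for the forward problem.
f = lambda x: x**4 - 3 * x**2 + x
grad = lambda x: 4 * x**3 - 6 * x + 1
x_star = scp_minimize(grad, 2.0)
print(x_star, f(x_star))          # converges to a local minimum near x ~ 1.13
```

    Note that the run settles in a local (not global) minimum, which is precisely the caveat that nonconvexity imposes on such schemes.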

    Formal synthesis of partially-observable cyber-physical systems

    This dissertation is motivated by the challenges arising in the synthesis of controllers for partially-observable cyber-physical systems (PO-CPSs). In the past decade, CPSs have become ubiquitous and an integral part of our daily lives. Examples of such systems range from autonomous vehicles, drones, and aircraft to robots and advanced manufacturing. In many applications, these systems are expected to perform complex logic tasks. Such tasks can usually be expressed using temporal logic formulae or as (in)finite strings over finite automata. In the past few years, abstraction-based techniques have been very promising for the formal synthesis of controllers. Since these techniques are based on the discretization of state and input sets, when dealing with large-scale systems, they unfortunately suffer severely from the curse of dimensionality (i.e., the computational complexity grows exponentially with the dimension of the state set). In order to overcome this large computational burden, a discretization-free approach based on control barrier functions has shown great potential for solving formal synthesis problems. In this thesis, we provide a systematic approach to synthesize a hybrid control policy for partially-observable (stochastic) control systems without discretizing the state sets. In many real-life applications, full-state information is not always available (due to the cost of sensing or the unavailability of the measurements). Therefore, in this thesis, we consider partially-observable (stochastic) control systems. Given proper state estimators, our goal is to utilize a notion of control barrier functions to synthesize control policies that provide (and potentially maximize) a lower bound on the probability that the trajectories of the partially-observable (stochastic) control system satisfy complex logic specifications such as safety and those that can be expressed as deterministic finite automata (DFA). Two main approaches are presented in this thesis to construct control barrier functions. The first approach requires no prior knowledge of the estimation accuracy; the second utilizes a (probability) bound on the estimation accuracy. Though the synthesis procedure is challenging in itself even for lower-dimensional systems, the task is much more computationally expensive (if not impossible) for large-scale interconnected systems. To overcome the challenges encountered with large-scale systems, we develop approaches to reduce the computational complexity. In particular, by considering a large-scale partially-observable control system as an interconnection of lower-dimensional subsystems, we compute so-called local control barrier functions for the subsystems along with the corresponding local controllers. By assuming some small-gain type conditions, we then utilize the local control barrier functions of the subsystems to compositionally construct an overall control barrier function for the interconnected system. Finally, since closed-form mathematical models of many physical systems are either unavailable or too complicated to be of any use, we also extend our work to the synthesis of safety controllers for partially-observable systems with unknown dynamics. To tackle this problem, we utilize a data-driven approach and construct control barrier functions and their corresponding controllers via sets of data collected from the output trajectories of the systems and the trajectories of the estimators.
    To demonstrate the effectiveness of the proposed results in the thesis, we consider various case studies, such as a DC motor, an adaptive cruise control (ACC) system consisting of vehicles in a platoon, and a Moore-Greitzer jet engine model.
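    To give a flavor of barrier-based control under partial observability, here is a minimal, hypothetical sketch for a scalar discrete-time system controlled through a noisy state estimate: a barrier function B with safe set B(x) <= 1 is kept from growing by filtering candidate inputs through a standard discrete-time CBF-type decrease condition. The dynamics, names, and constants are illustrative; the thesis constructs such functions formally (and compositionally) rather than assuming them.

```python
# Hypothetical sketch: a discrete-time control barrier function (CBF) filter
# for a scalar system controlled through a noisy state estimate. The safe set
# is B(x) <= 1; the filter keeps the barrier from growing past it.
import random

a, b = 1.1, 1.0                           # open-loop unstable toy dynamics
B = lambda x: x**2 / 4.0                  # barrier: safe set is |x| <= 2

def cbf_filter(x_hat, candidates, lam=0.9):
    """Pick the smallest input whose predicted successor satisfies the
    decrease condition B(x+) <= lam * B(x_hat) + (1 - lam), which keeps
    the predicted state inside the set {B <= 1}."""
    feasible = [u for u in candidates
                if B(a * x_hat + b * u) <= lam * B(x_hat) + (1 - lam)]
    return min(feasible, key=abs) if feasible else min(candidates, key=abs)

x = x_hat = 1.5
for _ in range(30):
    u = cbf_filter(x_hat, [i / 10 for i in range(-20, 21)])
    x = a * x + b * u + random.gauss(0, 0.01)   # true state, process noise
    x_hat = x + random.gauss(0, 0.05)           # imperfect state estimate
    assert B(x) <= 1.5, "left a neighborhood of the safe set"
print("stayed near the safe set; final x =", round(x, 3))
```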

    Computational Techniques for Stochastic Reachability

    As automated control systems grow in prevalence and complexity, there is an increasing demand for verification and controller synthesis methods to ensure these systems perform safely and to desired specifications. In addition, uncertain or stochastic behaviors are often exhibited (such as wind affecting the motion of an aircraft), making probabilistic verification desirable. Stochastic reachability analysis provides a formal means of generating the set of initial states that meets a given objective (such as safety or reachability) with a desired level of probability, known as the reachable (or safe) set, depending on the objective. However, the applicability of reachability analysis is limited in the scope and size of system it can address. First, generating stochastic reachable or viable sets is computationally intensive, and most existing methods rely on an optimal control formulation that requires solving a dynamic program, which scales exponentially in the dimension of the state space. Second, almost no results exist for extending stochastic reachability analysis to systems with incomplete information, in which the controller does not have access to the full state of the system. This thesis addresses both of the above limitations, and introduces novel computational methods for generating stochastic reachable sets for both perfectly and partially observable systems. We initially consider a linear system with additive Gaussian noise, and introduce two methods for computing stochastic reachable sets that do not require dynamic programming. The first method uses a particle approximation to formulate a deterministic mixed integer linear program that produces an estimate of the reachability probabilities. The second method uses a convex chance-constrained optimization problem to generate an under-approximation of the reachable set. Using these methods we are able to generate stochastic reachable sets for a four-dimensional spacecraft docking example in far less time than it would take had we used a dynamic program. We then focus on discrete time stochastic hybrid systems, which provide a flexible modeling framework for systems that exhibit mode-dependent behavior, and whose state space has both discrete and continuous components. We incorporate a stochastic observation process into the hybrid system model, and derive both theoretical and computational results for generating stochastic reachable sets subject to an observation process. The derivation of an information state allows us to recast the problem as one of perfect information, and we prove that solving a dynamic program over the information state is equivalent to solving the original problem. We then demonstrate that the dynamic program to solve the reachability problem for a partially observable stochastic hybrid system shares the same properties as for a partially observable Markov decision process (POMDP) with an additive cost function, and so we can exploit approximation strategies designed for POMDPs to solve the reachability problem. To do so, however, we first generate approximate representations of the information state and value function as either vectors or Gaussian mixtures, through a finite state approximation to the hybrid system or using a Gaussian mixture approximation to an indicator function defined over a convex region.
For a system with linear dynamics and Gaussian measurement noise, we show that it exhibits special properties that do not require an approximation of the information state, which enables much more efficient computation of the reachable set. In all cases we provide convergence results and numerical examples.
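    The particle idea behind the first method can be illustrated in a few lines: sample noise trajectories of a linear system with additive Gaussian noise under a fixed input sequence and count the fraction that stay safe and reach the target. The thesis embeds this approximation in a mixed integer linear program to optimize the inputs; the sketch below, with hypothetical dynamics and sets, only estimates the probability for fixed inputs.

```python
# Hypothetical sketch of the particle approximation: estimate a reach-avoid
# probability for a linear system with additive Gaussian noise by sampling.
# Dynamics, sets, and the fixed input sequence are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 1.0], [0.0, 1.0]])        # double integrator, dt = 1
B = np.array([[0.5], [1.0]])
x0 = np.array([16.0, 0.0])                    # start 16 units from the origin
controls = [np.array([-1.0])] * 4 + [np.array([1.0])] * 4
safe = lambda x: abs(x[1]) <= 5.0             # stay below the speed bound
target = lambda x: abs(x[0]) <= 1.0           # end up near the origin

def rollout():
    x = x0
    for u in controls:
        x = A @ x + B @ u + rng.normal(0.0, 0.1, size=2)
        if not safe(x):
            return False                      # violated the safety constraint
    return bool(target(x))                    # reached the target at the horizon

n = 10_000                                    # number of particles
p_hat = sum(rollout() for _ in range(n)) / n
print(f"estimated reach-avoid probability: {p_hat:.3f}")
```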

    Formal methods paradigms for estimation and machine learning in dynamical systems

    Formal methods are widely used in engineering to determine whether a system exhibits a certain property (verification) or to design controllers that are guaranteed to drive the system to achieve a certain property (synthesis). Most existing techniques require a large amount of accurate information about the system in order to be successful. The methods presented in this work can operate with significantly less prior information. In the domain of formal synthesis for robotics, the assumptions of perfect sensing and perfect knowledge of system dynamics are unrealistic. To address this issue, we present control algorithms that use active estimation and reinforcement learning to mitigate the effects of uncertainty. In the domain of cyber-physical system analysis, we relax the assumption that the system model is known and identify system properties automatically from execution data. First, we address the problem of planning the path of a robot under temporal logic constraints (e.g. "avoid obstacles and periodically visit a recharging station") while simultaneously minimizing the uncertainty about the state of an unknown feature of the environment (e.g. locations of fires after a natural disaster). We present synthesis algorithms and evaluate them via simulation and experiments with aerial robots. Second, we develop a new specification language for tasks that require gathering information about and interacting with a partially observable environment, e.g. "Maintain localization error below a certain level while also avoiding obstacles." Third, we consider learning temporal logic properties of a dynamical system from a finite set of system outputs. For example, given maritime surveillance data, we wish to find the specification that corresponds only to those vessels that are deemed law-abiding. Algorithms for performing off-line supervised and unsupervised learning and on-line supervised learning are presented. Finally, we consider the case in which we want to steer a system with unknown dynamics to satisfy a given temporal logic specification. We present a novel reinforcement learning paradigm to solve this problem. Our procedure gives "partial credit" for executions that almost satisfy the specification, which can lead to faster convergence rates and produce better solutions when the specification is not satisfiable.
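    The "partial credit" idea can be illustrated with the quantitative (robustness) semantics of temporal logic: rather than a 0/1 reward for satisfying a specification, a trajectory is scored by how close it came to satisfying it. The sketch below, with an illustrative formula and hypothetical names (it is one way to realize graded credit, not necessarily this work's construction), shows such a score; a trajectory that narrowly misses the goal receives a small negative value rather than outright failure, the kind of signal that can speed up learning.

```python
# Hypothetical sketch of "partial credit" via quantitative (robustness)
# semantics of a temporal logic formula over a 1-D trajectory.
import numpy as np

def robustness(traj, obstacle=(4.0, 6.0), goal=9.0, goal_tol=0.5):
    """Robustness of: always (not in obstacle) and eventually (near goal).

    Positive => satisfied with some margin; negative => violated by a margin.
    min/max realize 'always'/'eventually' in the usual quantitative way.
    """
    lo, hi = obstacle
    # distance to the obstacle interval (negative when inside it)
    avoid = np.minimum(np.abs(traj - lo), np.abs(traj - hi))
    avoid[(traj > lo) & (traj < hi)] *= -1
    always_avoid = avoid.min()
    eventually_goal = (goal_tol - np.abs(traj - goal)).max()
    return min(always_avoid, eventually_goal)

# A trajectory that skirts the obstacle and almost reaches the goal still
# earns a graded score, unlike a sparse 0/1 satisfaction signal.
almost = np.array([0.0, 1.0, 2.0, 3.0, 3.8, 7.0, 8.2])
print(robustness(almost))      # small negative value: "almost satisfied"
```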

    The Wasserstein Believer: Learning Belief Updates for Partially Observable Environments through Reliable Latent Space Models

    Partially Observable Markov Decision Processes (POMDPs) are used to model environments where the full state cannot be perceived by an agent. The agent therefore needs to reason over the history of past observations and actions. However, simply remembering the full history is generally intractable due to the exponential growth of the history space. A probability distribution over the true state, the belief, can serve as a sufficient statistic of the history, but computing it requires access to the model of the environment and is often intractable. While state-of-the-art algorithms use recurrent neural networks to compress the observation-action history in the hope of learning a sufficient statistic, they lack guarantees of success and can lead to sub-optimal policies. To overcome this, we propose the Wasserstein Belief Updater, an RL algorithm that learns a latent model of the POMDP and an approximation of the belief update. Our approach comes with theoretical guarantees on the quality of the approximation, ensuring that the beliefs it outputs allow for learning the optimal value function.
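    For reference, the object the Wasserstein Belief Updater approximates is the classical Bayes filter over POMDP states, b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s). Below is a minimal numpy sketch with a hypothetical toy model; the paper learns this map in a latent space precisely because T and O are unknown to the agent.

```python
# The exact belief update that the learned model approximates: a Bayes filter
# over POMDP states. Toy model and names are illustrative.
import numpy as np

def belief_update(b, a, o, T, O):
    """b'(s') ∝ O[a][s', o] * sum_s T[a][s, s'] * b(s)."""
    predicted = b @ T[a]                 # predict: push the belief through T
    updated = predicted * O[a][:, o]     # correct: weight by observation likelihood
    return updated / updated.sum()       # normalize (sum > 0 if o is possible)

# Toy 2-state POMDP with one action (index 0) and two observations.
T = [np.array([[0.9, 0.1],
               [0.2, 0.8]])]             # T[a][s, s']
O = [np.array([[0.7, 0.3],
               [0.1, 0.9]])]             # O[a][s', o]
b = np.array([0.5, 0.5])
b = belief_update(b, 0, 1, T, O)
print(b)                                 # posterior after acting and observing o = 1
```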