152 research outputs found

    LTLf/LDLf Non-Markovian Rewards

    Get PDF
    In Markov Decision Processes (MDPs), the reward obtained in a state is Markovian, i.e., depends on the last state and action. This dependency makes it difficult to reward more interesting long-term behaviors, such as always closing a door after it has been opened, or providing coffee only following a request. Extending MDPs to handle non-Markovian reward functions was the subject of two previous lines of work. Both use LTL variants to specify the reward function and then compile the new model back into a Markovian model. Building on recent progress in temporal logics over finite traces, we adopt LDLf for specifying non-Markovian rewards and provide an elegant automata construction for building a Markovian model, which extends that of previous work and offers strong minimality and compositionality guarantees

    Probabilistic planning with formal performance guarantees for mobile service robots

    Get PDF
    We present a framework for mobile service robot task planning and execution, based on the use of probabilistic verification techniques for the generation of optimal policies with attached formal performance guarantees. Our approach is based on a Markov decision process model of the robot in its environment, encompassing a topological map where nodes represent relevant locations in the environment, and a range of tasks that can be executed in different locations. The navigation in the topological map is modeled stochastically for a specific time of day. This is done by using spatio-temporal models that provide, for a given time of day, the probability of successfully navigating between two topological nodes, and the expected time to do so. We then present a methodology to generate cost optimal policies for tasks specified in co-safe linear temporal logic. Our key contribution is to address scenarios in which the task may not be achievable with probability one. We introduce a task progression function and present an approach to generate policies that are formally guaranteed to, in decreasing order of priority: maximize the probability of finishing the task; maximize progress towards completion, if this is not possible; and minimize the expected time or cost required. We illustrate and evaluate our approach with a scalability evaluation in a simulated scenario, and report on its implementation in a robot performing service tasks in an office environment for long periods of time

    Service composition in stochastic settings

    Get PDF
    With the growth of the Internet-of-Things and online Web services, more services with more capabilities are available to us. The ability to generate new, more useful services from existing ones has been the focus of much research for over a decade. The goal is, given a specification of the behavior of the target service, to build a controller, known as an orchestrator, that uses existing services to satisfy the requirements of the target service. The model of services and requirements used in most work is that of a finite state machine. This implies that the specification can either be satisfied or not, with no middle ground. This is a major drawback, since often an exact solution cannot be obtained. In this paper we study a simple stochastic model for service composition: we annotate the tar- get service with probabilities describing the likelihood of requesting each action in a state, and rewards for being able to execute actions. We show how to solve the resulting problem by solving a certain Markov Decision Process (MDP) derived from the service and requirement specifications. The solution to this MDP induces an orchestrator that coincides with the exact solution if a composition exists. Otherwise it provides an approximate solution that maximizes the expected sum of values of user requests that can be serviced. The model studied although simple shades light on composition in stochastic settings and indeed we discuss several possible extensions

    Designing Trustworthy Autonomous Systems

    Get PDF
    The design of autonomous systems is challenging and ensuring their trustworthiness can have different meanings, such as i) ensuring consistency and completeness of the requirements by a correct elicitation and formalization process; ii) ensuring that requirements are correctly mapped to system implementations so that any system behaviors never violate its requirements; iii) maximizing the reuse of available components and subsystems in order to cope with the design complexity; and iv) ensuring correct coordination of the system with its environment.Several techniques have been proposed over the years to cope with specific problems. However, a holistic design framework that, leveraging on existing tools and methodologies, practically helps the analysis and design of autonomous systems is still missing. This thesis explores the problem of building trustworthy autonomous systems from different angles. We have analyzed how current approaches of formal verification can provide assurances: 1) to the requirement corpora itself by formalizing requirements with assume/guarantee contracts to detect incompleteness and conflicts; 2) to the reward function used to then train the system so that the requirements do not get misinterpreted; 3) to the execution of the system by run-time monitoring and enforcing certain invariants; 4) to the coordination of the system with other external entities in a system of system scenario and 5) to system behaviors by automatically synthesize a policy which is correct

    An Integrated Control Framework for Long-Term Autonomy in Mobile Service Robots

    Get PDF

    Temporal Logic Monitoring Rewards via Transducers

    Get PDF
    In Markov Decision Processes (MDPs), rewards are assigned according to a function of the last state and action. This is often limiting, when the considered domain is not naturally Markovian, but becomes so after careful engineering of extended state space. The extended states record information from the past that is sufficient to assign rewards by looking just at the last state and action. Non-Markovian Reward Decision Processes (NRMDPs) extend MDPs by allowing for non-Markovian rewards, which depend on the history of states and actions. Non-Markovian rewards can be specified in temporal logics on finite traces such as LTLf/LDLf, with the great advantage of a higher abstraction and succinctness; they can then be automatically compiled into an MDP with an extended state space. We contribute to the techniques to handle temporal rewards and to the solutions to engineer them. We first present an approach to compiling temporal rewards which merges the formula automata into a single transducer, sometimes saving up to an exponential number of states. We then define monitoring rewards, which add a further level of abstraction to temporal rewards by adopting the four-valued conditions of runtime monitoring; we argue that our compilation technique allows for an efficient handling of monitoring rewards. Finally, we discuss application to reinforcement learning

    Formal methods paradigms for estimation and machine learning in dynamical systems

    Get PDF
    Formal methods are widely used in engineering to determine whether a system exhibits a certain property (verification) or to design controllers that are guaranteed to drive the system to achieve a certain property (synthesis). Most existing techniques require a large amount of accurate information about the system in order to be successful. The methods presented in this work can operate with significantly less prior information. In the domain of formal synthesis for robotics, the assumptions of perfect sensing and perfect knowledge of system dynamics are unrealistic. To address this issue, we present control algorithms that use active estimation and reinforcement learning to mitigate the effects of uncertainty. In the domain of cyber-physical system analysis, we relax the assumption that the system model is known and identify system properties automatically from execution data. First, we address the problem of planning the path of a robot under temporal logic constraints (e.g. "avoid obstacles and periodically visit a recharging station") while simultaneously minimizing the uncertainty about the state of an unknown feature of the environment (e.g. locations of fires after a natural disaster). We present synthesis algorithms and evaluate them via simulation and experiments with aerial robots. Second, we develop a new specification language for tasks that require gathering information about and interacting with a partially observable environment, e.g. "Maintain localization error below a certain level while also avoiding obstacles.'' Third, we consider learning temporal logic properties of a dynamical system from a finite set of system outputs. For example, given maritime surveillance data we wish to find the specification that corresponds only to those vessels that are deemed law-abiding. Algorithms for performing off-line supervised and unsupervised learning and on-line supervised learning are presented. Finally, we consider the case in which we want to steer a system with unknown dynamics to satisfy a given temporal logic specification. We present a novel reinforcement learning paradigm to solve this problem. Our procedure gives "partial credit'' for executions that almost satisfy the specification, which can lead to faster convergence rates and produce better solutions when the specification is not satisfiable

    Formal Methods for Autonomous Systems

    Full text link
    Formal methods refer to rigorous, mathematical approaches to system development and have played a key role in establishing the correctness of safety-critical systems. The main building blocks of formal methods are models and specifications, which are analogous to behaviors and requirements in system design and give us the means to verify and synthesize system behaviors with formal guarantees. This monograph provides a survey of the current state of the art on applications of formal methods in the autonomous systems domain. We consider correct-by-construction synthesis under various formulations, including closed systems, reactive, and probabilistic settings. Beyond synthesizing systems in known environments, we address the concept of uncertainty and bound the behavior of systems that employ learning using formal methods. Further, we examine the synthesis of systems with monitoring, a mitigation technique for ensuring that once a system deviates from expected behavior, it knows a way of returning to normalcy. We also show how to overcome some limitations of formal methods themselves with learning. We conclude with future directions for formal methods in reinforcement learning, uncertainty, privacy, explainability of formal methods, and regulation and certification

    Verified multi-robot planning under uncertainty

    Get PDF
    Multi-robot systems are being increasingly deployed to solve real-world problems, from warehouses to autonomous fleets for logistics, from hospitals to nuclear power plants and emergency search and rescue scenarios. These systems often need to operate in uncertain environments which can lead to robot failure, uncertain action durations or the inability to complete assigned tasks. In many scenarios, the safety or reliability of these systems is critical to their deployment. Therefore there is a need for robust multi-robot planning solutions that offer guarantees on the performance of the robot team. In this thesis we develop techniques for robust multi-robot task allocation and planning under uncertainty by building on techniques from formal verification. We present three algorithms that solve the problem of task allocation and planning for a multi-robot team operating under uncertainty. These algorithms are able to calculate the expected maximum number of tasks the multi-robot team can achieve, considering the possibility of robot failure. They are also able to reallocate tasks when robots fail. We formalise the problem of task allocation and robust planning for a multi-robot team using Linear Temporal Logic to specify the team's mission and Markov decision processes to model the robots. Our first solution method is a sampling based approach to simultaneous task allocation and planning. Our second solution method separates task allocation and planning for the same problem using auctioning for the former. Our final solution lies midway between the first two using simultaneous task allocation and planning in a sequential team model. We evaluate all solution approaches extensively using a set of tests inspired by existing benchmarks in related fields with a focus on scalability
    • …
    corecore