
    Algorithms for stochastic finite memory control of partially observable systems

    A partially observable Markov decision process (POMDP) is a mathematical framework for planning and control problems in which actions have stochastic effects and observations provide uncertain state information. It is widely used for research in decision-theoretic planning and reinforcement learning. To cope with partial observability, a policy (or plan) must use memory, and previous work has shown that a finite-state controller provides a good policy representation. This thesis considers a previously developed bounded policy iteration algorithm for POMDPs that finds policies in the form of stochastic finite-state controllers. Two improvements to this algorithm are developed. The first is a simplification of the basic linear program used to find improved controllers, which yields a considerable speed-up over the original algorithm. The second is a branch-and-bound algorithm for adding the best possible node to the controller, which provides an error bound and a test for global optimality. Experimental results show that these enhancements significantly improve the algorithm's performance.
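    The improvement step in the underlying bounded policy iteration scheme is commonly posed as a linear program over one node's action and successor distributions. Below is a minimal sketch of that basic LP (maximising a uniform improvement epsilon of a node's value vector), assuming a tabular POMDP given as NumPy arrays and using scipy.optimize.linprog; the thesis's simplified LP and its branch-and-bound node addition are not reproduced here, and all names and shapes are illustrative.

```python
# Minimal sketch of the basic bounded policy iteration (BPI) improvement step
# for one controller node.  This is the standard LP formulation, not the
# simplified LP or the branch-and-bound node addition developed in the thesis;
# all array shapes and names are illustrative assumptions.
import numpy as np
from scipy.optimize import linprog

def improve_node(V, n, R, T, O, gamma):
    """Try to improve node n of a stochastic finite-state controller.

    V : (N, S) value of each controller node in each state
    R : (S, A) immediate rewards
    T : (A, S, S) transition probabilities P(s' | s, a)
    O : (A, S, O) observation probabilities P(o | a, s')
    Returns (epsilon, c_a, c_aon): the achievable uniform improvement, the new
    action distribution, and c_aon[a, o, n'] = P(action a and successor n' | o).
    """
    N, S = V.shape
    A = R.shape[1]
    Ob = O.shape[2]

    # Variable layout: [epsilon, c_a (A entries), c_aon (A*Ob*N entries)].
    num_vars = 1 + A + A * Ob * N
    def idx_ca(a): return 1 + a
    def idx_caon(a, o, n2): return 1 + A + (a * Ob + o) * N + n2

    # Objective: maximise epsilon  <=>  minimise -epsilon.
    c = np.zeros(num_vars)
    c[0] = -1.0

    # Inequalities, one per state s:
    #   V[n, s] + eps <= sum_a c_a R(s, a)
    #                    + gamma * sum_{a,o,n'} c_aon * sum_{s'} T O V[n', s']
    A_ub = np.zeros((S, num_vars))
    b_ub = np.zeros(S)
    for s in range(S):
        A_ub[s, 0] = 1.0                       # + eps
        for a in range(A):
            A_ub[s, idx_ca(a)] = -R[s, a]      # - c_a * R(s, a)
            for o in range(Ob):
                for n2 in range(N):
                    coeff = gamma * np.sum(T[a, s, :] * O[a, :, o] * V[n2, :])
                    A_ub[s, idx_caon(a, o, n2)] = -coeff
        b_ub[s] = -V[n, s]

    # Equalities: sum_a c_a = 1 and, for each (a, o), sum_{n'} c_aon = c_a.
    A_eq = np.zeros((1 + A * Ob, num_vars))
    b_eq = np.zeros(1 + A * Ob)
    for a in range(A):
        A_eq[0, idx_ca(a)] = 1.0
    b_eq[0] = 1.0
    row = 1
    for a in range(A):
        for o in range(Ob):
            A_eq[row, idx_ca(a)] = -1.0
            for n2 in range(N):
                A_eq[row, idx_caon(a, o, n2)] = 1.0
            row += 1

    bounds = [(None, None)] + [(0.0, None)] * (num_vars - 1)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    eps = -res.fun
    c_a = res.x[1:1 + A]
    c_aon = res.x[1 + A:].reshape(A, Ob, N)
    return eps, c_a, c_aon
```

    If the returned epsilon is positive, the node's parameters can be replaced by the new distributions; when no node can be improved this way, new nodes must be added to escape the local optimum, which is where a node-addition step such as the thesis's branch-and-bound procedure comes in.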

    Safe Multi-objective Planning with a Posteriori Preferences

    Autonomous planning in safety-critical systems is a difficult task in which decisions must carefully balance optimisation of the system's performance goals while also keeping the system away from safety hazards. These objectives often conflict, presenting a challenging multi-objective planning problem in which at least one objective relates to safety risk. Recasting safety risk as an objective introduces additional requirements on planning algorithms: safety risk cannot be "averaged out", nor can it be combined with other objectives without loss of information and without losing its intended purpose as a tool in risk reduction. Thus, existing algorithms for multi-objective planning cannot be used directly, as they do not provide any facility to accurately track and update safety risk. A common workaround is to restrict available decisions to those guaranteed safe a priori, but this can be overly conservative and hamper performance significantly. In this paper, we propose a planning algorithm based on multi-objective Monte Carlo Tree Search that resolves these problems by recognising safety risk as a first-class objective. Our algorithm explicitly models the safety of the system separately from its performance, uses safety risk both to optimise and to constrain safety during planning, and uses an ALARP-based preference selection method to choose an appropriate safe plan from its output. The preference selection method chooses from the set of safe plans by weighing risk against performance. We demonstrate the behaviour of the algorithm on an example representative of safety-critical decision-making.
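    A posteriori preference selection of this kind can be illustrated with a small, self-contained sketch: given a Pareto front of candidate plans scored by (safety risk, performance), an ALARP-style rule discards intolerable plans and accepts additional risk only when the performance gained is not grossly disproportionate to it. The thresholds, the disproportion factor, and the plan representation below are illustrative assumptions, not the paper's exact formulation.

```python
# ALARP-style a posteriori preference selection over a Pareto front of plans.
# Thresholds and the disproportion factor are illustrative, not the paper's.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Plan:
    name: str
    risk: float         # expected safety risk (lower is better)
    performance: float  # expected performance reward (higher is better)

def alarp_select(front: List[Plan],
                 intolerable_risk: float = 0.10,
                 broadly_acceptable_risk: float = 0.01,
                 disproportion_factor: float = 100.0) -> Optional[Plan]:
    """Pick one plan from a Pareto front using an ALARP-inspired rule.

    1. Discard plans whose risk is in the intolerable region.
    2. If some plan is already broadly acceptable, take the best-performing one.
    3. Otherwise (tolerable region), accept extra risk only while the
       performance gained per unit of extra risk exceeds the disproportion
       factor (performance units per unit of risk).
    """
    tolerable = [p for p in front if p.risk < intolerable_risk]
    if not tolerable:
        return None  # no safe plan exists; risk mitigation is needed instead

    acceptable = [p for p in tolerable if p.risk <= broadly_acceptable_risk]
    if acceptable:
        return max(acceptable, key=lambda p: p.performance)

    # Walk the front from lowest risk upwards, trading risk for performance
    # only while the trade is not grossly disproportionate.
    ordered = sorted(tolerable, key=lambda p: p.risk)
    chosen = ordered[0]
    for candidate in ordered[1:]:
        gain = candidate.performance - chosen.performance
        extra_risk = candidate.risk - chosen.risk
        if extra_risk > 0 and gain / extra_risk >= disproportion_factor:
            chosen = candidate
    return chosen

if __name__ == "__main__":
    front = [Plan("cautious", 0.02, 2.0),
             Plan("balanced", 0.04, 6.0),
             Plan("aggressive", 0.08, 7.0)]
    print(alarp_select(front))  # selects "balanced" under the defaults above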

    Strengthening Deterministic Policies for POMDPs

    The synthesis problem for partially observable Markov decision processes (POMDPs) is to compute a policy that satisfies a given specification. Such policies have to take the full execution history of a POMDP into account, rendering the problem undecidable in general. A common approach is to use a limited amount of memory and randomize over potential choices. Yet this problem is still NP-hard and often computationally intractable in practice. A restricted problem is to use neither history nor randomization, yielding policies that are called stationary and deterministic. Previous approaches to compute such policies employ mixed-integer linear programming (MILP). We provide a novel MILP encoding that supports sophisticated specifications in the form of temporal logic constraints and is able to handle an arbitrary number of such specifications. Yet randomization and memory are often mandatory to achieve satisfactory policies. First, we extend our encoding to deliver a restricted class of randomized policies. Second, based on the results of the original MILP, we employ a preprocessing of the POMDP to encompass memory-based decisions. The advantages of our approach over state-of-the-art POMDP solvers lie (1) in the flexibility to strengthen simple deterministic policies without losing computational tractability and (2) in the ability to enforce the provable satisfaction of arbitrarily many specifications. The latter allows trade-offs between the performance and safety aspects of typical POMDP examples to be taken into account. We show the effectiveness of our method on a broad range of benchmarks.
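    The paper's MILP encoding targets temporal-logic specifications; the general flavour of encoding a stationary deterministic observation-based policy as a MILP can be sketched on a simpler discounted-reward objective, with one binary variable per observation-action pair and big-M constraints that activate the Bellman equalities of the chosen action. The sketch below uses PuLP and assumes a deterministic observation function; it is an illustration of the idea, not the encoding from the paper.

```python
# Minimal MILP sketch for a stationary deterministic observation-based POMDP
# policy under a discounted-reward objective.  This is a simplified
# illustration (not the paper's temporal-logic encoding); the model arrays,
# deterministic observation function, and big-M constant are assumptions.
import numpy as np
import pulp

def deterministic_policy_milp(T, R, obs_of, b0, gamma=0.95):
    """T[a, s, s'] transitions, R[s, a] rewards, obs_of[s] the observation
    emitted in state s, b0 the initial belief.
    Returns (value, policy) with policy[z] = action chosen for observation z."""
    A, S, _ = T.shape
    Z = int(max(obs_of)) + 1
    M = (np.abs(R).max() / (1.0 - gamma)) * 2.0  # big-M bound on |V|

    prob = pulp.LpProblem("memoryless_pomdp_policy", pulp.LpMaximize)
    V = [pulp.LpVariable(f"V_{s}", lowBound=-M, upBound=M) for s in range(S)]
    sigma = [[pulp.LpVariable(f"sig_{z}_{a}", cat=pulp.LpBinary)
              for a in range(A)] for z in range(Z)]

    # Exactly one action per observation.
    for z in range(Z):
        prob += pulp.lpSum(sigma[z]) == 1

    # Big-M Bellman equalities: if action a is chosen for obs(s), then
    # V[s] = R[s, a] + gamma * sum_s' T[a, s, s'] * V[s'].
    for s in range(S):
        z = obs_of[s]
        for a in range(A):
            backup = float(R[s, a]) + gamma * pulp.lpSum(
                float(T[a, s, sp]) * V[sp] for sp in range(S))
            slack = 2 * M * (1 - sigma[z][a])
            prob += V[s] <= backup + slack
            prob += V[s] >= backup - slack

    # Maximise the value of the initial belief.
    prob += pulp.lpSum(float(b0[s]) * V[s] for s in range(S))
    prob.solve(pulp.PULP_CBC_CMD(msg=False))

    policy = [next(a for a in range(A) if sigma[z][a].value() > 0.5)
              for z in range(Z)]
    return pulp.value(prob.objective), policy
```

    Because the Bellman operator for a fixed policy is a contraction, the two big-M inequalities pin each V[s] to the unique fixed point of the selected actions, so the solver effectively searches over deterministic memoryless policies. Handling temporal-logic constraints, restricted randomization, and memory, as in the paper, requires a more elaborate encoding.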