Algorithms for stochastic finite memory control of partially observable systems
A partially observable Markov decision process (POMDP) is a mathematical framework for planning and control problems in which actions have stochastic effects and observations provide uncertain state information. It is widely used in research on decision-theoretic planning and reinforcement learning. To cope with partial observability, a policy (or plan) must use memory, and previous work has shown that a finite-state controller provides a good policy representation. This thesis considers a previously developed bounded policy iteration algorithm for POMDPs that finds policies in the form of stochastic finite-state controllers. Two improvements to this algorithm are developed. The first is a simplification of the basic linear program used to find improved controllers, which yields a considerable speed-up over the original algorithm. The second is a branch-and-bound algorithm for adding the best possible node to the controller, which provides an error bound and a test for global optimality. Experimental results show that these enhancements significantly improve the algorithm's performance.
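For context, the node-improvement step in bounded policy iteration can be posed as a linear program: find a convex combination of actions and successor nodes whose backed-up value dominates the node's current value in every state. The sketch below is a hedged illustration of that standard LP, not the simplified version developed in the thesis; the model interface (S, A, Z, P, O, R, V, gamma) and the use of the PuLP modeling library are assumptions made for illustration.

```python
# A hedged sketch of the standard node-improvement linear program from bounded
# policy iteration, not the simplified LP developed in the thesis. The model
# interface (S, A, Z, P, O, R, V, gamma) is an assumption for illustration.
import pulp

def improve_node(q, S, A, Z, P, O, R, V, gamma):
    """Search for a stochastic replacement of controller node q.

    P[s][a][s2] : transition probability     O[a][s2][z] : observation probability
    R[s][a]     : reward                     V[n][s]     : current value of node n
    Returns (epsilon, action distribution, successor-node distribution values).
    """
    lp = pulp.LpProblem("bpi_node_improvement", pulp.LpMaximize)
    eps = pulp.LpVariable("eps")                      # improvement over V[q]
    ca = {a: pulp.LpVariable(f"c_{a}", lowBound=0) for a in A}
    cazq = {(a, z, n): pulp.LpVariable(f"c_{a}_{z}_{n}", lowBound=0)
            for a in A for z in Z for n in V}

    lp += 1 * eps                                     # maximize the improvement
    lp += pulp.lpSum(ca[a] for a in A) == 1           # action probabilities sum to 1
    for a in A:
        for z in Z:                                   # successor distribution per (a, z)
            lp += pulp.lpSum(cazq[a, z, n] for n in V) == ca[a]
    for s in S:                                       # new value dominates old value
        lp += V[q][s] + eps <= pulp.lpSum(
            ca[a] * R[s][a]
            + gamma * pulp.lpSum(P[s][a][s2] * O[a][s2][z] * cazq[a, z, n] * V[n][s2]
                                 for s2 in S for z in Z for n in V)
            for a in A)

    lp.solve(pulp.PULP_CBC_CMD(msg=False))
    return (pulp.value(eps),
            {a: pulp.value(ca[a]) for a in A},
            {k: pulp.value(v) for k, v in cazq.items()})
```

If the returned epsilon is positive, replacing node q's action and successor distributions with the LP solution improves the controller's value in every state.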
Safe Multi-objective Planning with a Posteriori Preferences
Autonomous planning in safety-critical systems is a difficult task in which decisions must carefully balance optimisation of the system's performance goals against keeping the system away from safety hazards. These goals often conflict, and hence present a challenging multi-objective planning problem in which at least one of the objectives relates to safety risk. Recasting safety risk as an objective introduces additional requirements on planning algorithms: safety risk cannot be "averaged out", nor can it be combined with other objectives without losing information and undermining its intended purpose as a tool for risk reduction. Thus, existing algorithms for multi-objective planning cannot be used directly, as they do not provide any facility to accurately track and update safety risk. A common workaround is to restrict available decisions to those guaranteed safe a priori, but this can be overly conservative and hamper performance significantly. In this paper, we propose a planning algorithm based on multi-objective Monte Carlo tree search that resolves these problems by recognising safety risk as a first-class objective. Our algorithm explicitly models the safety of the system separately from its performance, uses safety risk both as an optimisation objective and as a constraint in the planning process, and uses an ALARP-based preference selection method to choose an appropriate safe plan from its output. The preference selection method chooses among the resulting safe plans by weighing risk against performance. We demonstrate the behaviour of the algorithm using an example representative of safety-critical decision-making.
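As a rough illustration of the ALARP idea behind a posteriori preference selection, the sketch below picks a plan from a set of candidates summarised by (risk, performance) pairs. The thresholds, the disproportion factor, and the plan representation are illustrative assumptions, not values or logic taken from the paper, and they presume risk and performance are expressed in commensurate units so that "gross disproportion" is meaningful.

```python
# An illustrative ALARP-style a posteriori selection over candidate plans, each
# summarised by a (risk, performance) pair. Thresholds and the disproportion
# factor are assumptions for illustration only.
def select_plan(plans, intolerable_risk=0.1, broadly_acceptable_risk=0.01,
                disproportion_factor=3.0):
    """Return a plan whose risk is tolerable and as low as reasonably practicable."""
    # 1. Reject plans whose residual risk falls in the intolerable region.
    candidates = [p for p in plans if p[0] < intolerable_risk]
    if not candidates:
        return None                                    # no acceptable plan exists
    # 2. Start from the best-performing tolerable plan ...
    chosen = max(candidates, key=lambda p: p[1])
    # 3. ... and keep trading performance for lower risk while the sacrifice is
    #    not grossly disproportionate to the risk reduction, stopping once the
    #    risk is broadly acceptable (the ALARP test).
    for cand in sorted(candidates, key=lambda p: p[0], reverse=True):
        if chosen[0] <= broadly_acceptable_risk:
            break
        if cand[0] >= chosen[0]:
            continue
        risk_reduction = chosen[0] - cand[0]
        performance_loss = chosen[1] - cand[1]
        if performance_loss <= disproportion_factor * risk_reduction:
            chosen = cand
    return chosen
```

In practice the selection would operate over the set of safe plans returned by the multi-objective search, as described above.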
Strengthening Deterministic Policies for POMDPs
The synthesis problem for partially observable Markov decision processes (POMDPs) is to compute a policy that satisfies a given specification. Such policies have to take the full execution history of a POMDP into account, rendering the problem undecidable in general. A common approach is to use a limited amount of memory and randomize over potential choices. Yet, this problem is still NP-hard and often computationally intractable in practice. A restricted problem is to use neither history nor randomization, yielding policies that are called stationary and deterministic. Previous approaches to compute such policies employ mixed-integer linear programming (MILP). We provide a novel MILP encoding that supports sophisticated specifications in the form of temporal logic constraints. It is able to handle an arbitrary number of such specifications. Yet, randomization and memory are often mandatory to achieve satisfactory policies. First, we extend our encoding to deliver a restricted class of randomized policies. Second, based on the results of the original MILP, we employ a preprocessing of the POMDP to encompass memory-based decisions. The advantages of our approach over state-of-the-art POMDP solvers lie (1) in the flexibility to strengthen simple deterministic policies without losing computational tractability and (2) in the ability to enforce the provable satisfaction of arbitrarily many specifications. The latter point allows taking trade-offs between performance and safety aspects of typical POMDP examples into account. We show the effectiveness of our method on a broad range of benchmarks.
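As a rough illustration of the underlying idea (not the paper's encoding, which additionally handles temporal logic specifications), the sketch below builds a small big-M MILP that selects a deterministic, stationary (observation-based) policy for a toy discounted POMDP and maximizes expected value under an initial belief. The toy model, the PuLP modeling library, and the big-M constant are assumptions made for illustration.

```python
# A minimal sketch (assumptions: toy model, PuLP, big-M bound) of a MILP that
# selects a deterministic stationary (observation-based) policy for a discounted
# POMDP by maximizing expected value under the initial belief b0.
import pulp

states = [0, 1]
actions = ["a0", "a1"]
observations = [0]                        # both states emit the same observation
obs = {0: 0, 1: 0}                        # deterministic observation function
P = {(0, "a0"): {0: 1.0}, (0, "a1"): {1: 1.0},
     (1, "a0"): {1: 1.0}, (1, "a1"): {0: 1.0}}   # P[(s, a)] = {s2: prob}
R = {(0, "a0"): 0.0, (0, "a1"): 1.0, (1, "a0"): 2.0, (1, "a1"): 0.0}
gamma, b0 = 0.9, {0: 0.5, 1: 0.5}
vmax = max(abs(r) for r in R.values()) / (1 - gamma)    # bound on |V(s)|
big_m = (1 + gamma) * vmax + max(abs(r) for r in R.values())

m = pulp.LpProblem("deterministic_pomdp_policy", pulp.LpMaximize)
x = {(z, a): pulp.LpVariable(f"x_{z}_{a}", cat="Binary")
     for z in observations for a in actions}            # x[z, a] = 1: take a on z
V = {s: pulp.LpVariable(f"V_{s}", lowBound=-vmax, upBound=vmax) for s in states}

m += pulp.lpSum(b0[s] * V[s] for s in states)           # expected initial value
for z in observations:                                  # one action per observation
    m += pulp.lpSum(x[z, a] for a in actions) == 1
for s in states:                                        # Bellman bound, active only
    for a in actions:                                   # for the selected action
        m += V[s] <= R[s, a] \
                   + gamma * pulp.lpSum(p * V[t] for t, p in P[s, a].items()) \
                   + big_m * (1 - x[obs[s], a])

m.solve(pulp.PULP_CBC_CMD(msg=False))
policy = {z: a for (z, a) in x if pulp.value(x[z, a]) > 0.5}
print(policy, {s: pulp.value(V[s]) for s in states})
```

Because any value vector satisfying the active Bellman constraints is bounded above by the selected policy's true value, maximizing the belief-weighted sum makes the bound tight at the weighted states, so the objective equals the expected value of the chosen deterministic policy.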
Abstractions in Reasoning for Long-Term Autonomy
The path to building adaptive, robust, intelligent agents has led researchers to develop a suite of powerful models and algorithms for agents with a single objective. However, in recent years, attempts to use this monolithic approach to solve an ever-expanding set of complex real-world problems, which increasingly include long-term autonomous deployments, have illuminated challenges in its ability to scale. Consequently, a fragmented collection of hierarchical and multi-objective models was developed. This fragmentation extends to the algorithms as well, each of which approximates an optimal solution in a different manner for the sake of scalability. These models and algorithms represent an attempt to solve pieces of an overarching problem: how can an agent explicitly model and integrate the necessary aspects of reasoning required to achieve long-term autonomy?
This thesis presents a general hierarchical and multi-objective model called a policy network that unifies prior fragmented solutions into a single graphical decision-making structure. Policy networks are broadly useful for solving numerous real-world problems. This thesis focuses on autonomous vehicle (AV) problems: (1) route planning with multiple objectives; (2) semi-autonomy with proactive transfer of control; and (3) intersection decision-making for reasoning online about any number of other vehicles and pedestrians. Formal models are presented for each of these distinct problems. Solutions are evaluated using real-world map data in simulation and demonstrated on a fully operational AV prototype driving on real public roads. Policy networks serve as a shared underlying framework for all three, enabling their seamless integration as parts of an overall solution for rich, real-world, scalable decision-making in agents with long-term autonomy.
Reliable Decision-Making with Imprecise Models
The rapid growth in the deployment of autonomous systems across various sectors has generated considerable interest in how these systems can operate reliably in large, stochastic, and unstructured environments. Despite recent advances in artificial intelligence and machine learning, it is challenging to assure that autonomous systems will operate reliably in the open world. One cause of unreliable behavior is the imprecision of the model used for decision-making. Due to the practical challenges in data collection and precise model specification, autonomous systems often operate based on models that do not represent all the details of the environment. Even if the system has access to a comprehensive decision-making model that accounts for all the details in the environment and all possible scenarios the agent may encounter, it may be intractable to solve this complex model optimally. Consequently, this complex, high-fidelity model may be simplified to accelerate planning, introducing imprecision. Reasoning with such imprecise models affects the reliability of autonomous systems. A system's actions may sometimes produce unexpected, undesirable consequences, which are often identified only after deployment. How can we design autonomous systems that can operate reliably in the presence of uncertainty and model imprecision?
This dissertation presents solutions to address three classes of model imprecision in a Markov decision process, along with an analysis of the conditions under which bounded performance can be guaranteed. First, an adaptive outcome selection approach is introduced to devise risk-aware reduced models of the environment that efficiently balance the trade-off between model simplicity and fidelity, so as to accelerate planning in resource-constrained settings. Second, a framework that extends the stochastic shortest path formulation to problems with imperfect information about the goal state during planning is introduced, along with two solution approaches for this problem. Finally, two complementary approaches are presented to minimize the negative side effects of agent actions. The techniques presented in this dissertation enable an autonomous system to detect and mitigate undesirable behavior without redesigning the model entirely.
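To illustrate the general notion of a reduced model (though not the dissertation's adaptive outcome selection method), the sketch below keeps only the k most likely outcomes of each action and renormalizes; for k = 1 this amounts to most-likely-outcome determinization.

```python
# A minimal sketch of a reduced model in the spirit of (but not identical to)
# adaptive outcome selection: keep only the k most likely outcomes of each
# action and renormalize. For k = 1 this is most-likely-outcome determinization.
def reduce_model(P, k=1):
    """P[s][a] is a dict {s2: probability}; returns a reduced copy of P."""
    reduced = {}
    for s, action_outcomes in P.items():
        reduced[s] = {}
        for a, outcomes in action_outcomes.items():
            top = sorted(outcomes.items(), key=lambda kv: kv[1], reverse=True)[:k]
            total = sum(prob for _, prob in top)
            reduced[s][a] = {s2: prob / total for s2, prob in top}
    return reduced
```

Planning in such a reduced model is cheaper because each action has at most k outcomes, at the price of the imprecision discussed above; the dissertation's approach additionally chooses which outcomes to keep in a risk-aware, adaptive way.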
Metareasoning for Planning and Execution in Autonomous Systems
Metareasoning is the process by which an autonomous system optimizes (specifically, monitors and controls) its own planning and execution processes in order to operate more effectively in its environment. As autonomous systems rapidly grow in sophistication and autonomy, the need for metareasoning has become critical for efficient and reliable operation in noisy, stochastic, unstructured domains over long periods of time, owing to the uncertainty about the limitations of their reasoning capabilities and the range of circumstances they may encounter. However, despite considerable progress in metareasoning as a whole over the last thirty years, work on metareasoning for planning relies on several assumptions that diminish its accuracy and practical utility in autonomous systems that operate in the real world, while work on metareasoning for execution has received little attention. This dissertation therefore proposes more effective metareasoning for planning while expanding the scope of metareasoning to execution, improving the efficiency of planning and the reliability of execution in autonomous systems.
In particular, we offer a two-pronged framework that introduces metareasoning for efficient planning and reliable execution in autonomous systems. We begin by proposing two forms of metareasoning for efficient planning: (1) a method that determines when to interrupt an anytime algorithm and act on the current solution by using online performance prediction and (2) a method that tunes the hyperparameters of the anytime algorithm at runtime by using deep reinforcement learning. We then propose two forms of metareasoning for reliable execution: (3) a method that recovers from exceptions that can be encountered during operation by using belief space planning and (4) a method that maintains and restores safety during operation by using probabilistic planning.
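As a hedged sketch of the first idea, meta-level monitoring of an anytime planner can be framed as a value-of-computation test: keep planning only while the predicted gain in solution quality outweighs the cost of further deliberation. The planner interface, the quality predictor, and the time cost below are illustrative assumptions, not the dissertation's performance-prediction method.

```python
# A hedged sketch of meta-level monitoring for an anytime planner, framed as a
# value-of-computation test. The planner interface, quality predictor, and time
# cost are illustrative assumptions, not the dissertation's method.
def monitor_anytime(planner, predict_quality, time_cost, step=0.1, horizon=0.5):
    """Run the planner until further deliberation no longer pays off.

    planner.step(dt)           -> run the anytime algorithm for dt more seconds
    planner.quality()          -> current solution quality in [0, 1]
    planner.current_solution() -> best solution found so far
    predict_quality(q, t, dt)  -> predicted quality after dt more seconds
    time_cost                  -> utility lost per second of deliberation
    """
    elapsed = 0.0
    while True:
        quality = planner.quality()
        expected_gain = predict_quality(quality, elapsed, horizon) - quality
        if expected_gain <= time_cost * horizon:   # computation no longer worth it
            return planner.current_solution()
        planner.step(step)
        elapsed += step
```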