
    Safe POMDP Online Planning via Shielding

    Partially observable Markov decision processes (POMDPs) have been widely used in many robotic applications for sequential decision-making under uncertainty. POMDP online planning algorithms such as Partially Observable Monte-Carlo Planning (POMCP) can solve very large POMDPs with the goal of maximizing the expected return. However, the resulting policies cannot provide safety guarantees, which are imperative for real-world safety-critical tasks (e.g., autonomous driving). In this work, we consider safety requirements represented as almost-sure reach-avoid specifications (i.e., the probability of reaching a set of goal states is one and the probability of reaching a set of unsafe states is zero). We compute shields that restrict unsafe actions that would violate the almost-sure reach-avoid specifications. We then integrate these shields into the POMCP algorithm for safe POMDP online planning. We propose four distinct shielding methods, differing in how the shields are computed and integrated, including factored variants designed to improve scalability. Experimental results on a set of benchmark domains demonstrate that the proposed shielding methods successfully guarantee safety (unlike the baseline POMCP without shielding) on large POMDPs, with negligible impact on the runtime for online planning.
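    At a high level, integrating a shield into POMCP means restricting the actions the tree search may consider at each node to those the shield certifies as safe. A minimal sketch of this idea, assuming a hypothetical `node` interface (`actions`, `belief`, per-action `visits` and `value`) and a `shield` predicate that are not taken from the paper:

    ```python
    import math

    def shielded_ucb_action(node, shield, c=1.0):
        """Select a UCB1 action among only the shield-permitted actions.

        `shield(belief, action)` is a hypothetical predicate that returns
        True when the action cannot violate the almost-sure reach-avoid
        specification from any state in the belief support.
        """
        safe_actions = [a for a in node.actions if shield(node.belief, a)]
        if not safe_actions:
            raise RuntimeError("shield blocked every action; belief is unsafe")
        total = sum(node.visits[a] for a in safe_actions)
        def ucb(a):
            n = node.visits[a]
            if n == 0:
                return float("inf")  # visit untried safe actions first
            return node.value[a] + c * math.sqrt(math.log(total) / n)
        return max(safe_actions, key=ucb)
    ```

    Because unsafe actions are filtered before UCB1 scoring rather than penalized afterward, the search never spends simulations on branches the specification forbids, which is consistent with the reported negligible runtime overhead.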

    Multi-Agent Chance-Constrained Stochastic Shortest Path with Application to Risk-Aware Intelligent Intersection

    In transportation networks, where traffic lights have traditionally been used for vehicle coordination, intersections act as natural bottlenecks. A formidable challenge for existing automated intersections lies in detecting and reasoning about uncertainty from the operating environment and human-driven vehicles. In this paper, we propose a risk-aware intelligent intersection system for autonomous vehicles (AVs) as well as human-driven vehicles (HVs). We cast the problem as a novel class of Multi-agent Chance-Constrained Stochastic Shortest Path (MCC-SSP) problems and devise an exact Integer Linear Programming (ILP) formulation that is scalable in the number of agents' interaction points (e.g., potential collision points at the intersection). In particular, when the number of agents within an interaction point is small, which is often the case in intersections, the ILP has a polynomial number of variables and constraints. To further improve the running-time performance, we show that the collision risk computation can be performed offline. Additionally, a trajectory optimization workflow is provided to generate risk-aware trajectories for any given intersection. The proposed framework is implemented in the CARLA simulator and evaluated under a fully autonomous intersection with AVs only, as well as in a hybrid setup with a signalized intersection for HVs and an intelligent scheme for AVs. As verified via simulations, the featured approach improves the intersection's efficiency by up to 200% while also conforming to the specified tunable risk threshold.
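    The chance-constraint check at the core of such a formulation can be illustrated with a small sketch. Assuming the per-interaction-point collision probabilities are precomputed offline and independent across points (an illustrative simplification, not the paper's exact ILP), a candidate joint plan is admissible only if its overall collision probability stays within the tunable risk threshold:

    ```python
    def violates_chance_constraint(point_risks, delta):
        """Check a joint plan against a chance constraint.

        `point_risks` holds the collision probabilities (assumed
        independent and precomputed offline) at each interaction point
        the plan traverses; the plan is admissible only if the overall
        collision probability does not exceed the threshold `delta`.
        """
        p_safe = 1.0
        for p in point_risks:
            p_safe *= (1.0 - p)
        return (1.0 - p_safe) > delta
    ```

    In the full MCC-SSP formulation this constraint is encoded linearly over policy variables inside the ILP; the sketch only conveys how per-point risks aggregate against the threshold.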

    Increasing the Value of Information During Planning in Uncertain Environments

    Prior studies have demonstrated that for many real-world problems, POMDPs can be solved through online algorithms both quickly and with near optimality [10, 8, 6]. However, on an important set of problems where there is a large time delay between when the agent can gather information and when it needs to use that information, these solutions fail to adequately consider the value of information. As a result, information-gathering actions, even when they are critical in the optimal policy, will be ignored by existing solutions, leading to sub-optimal decisions by the agent. In this research, we develop a novel solution that rectifies this problem by introducing a new algorithm that improves upon state-of-the-art online planning by better reflecting the value of actions that gather information. We do this by adding entropy to the UCB1 heuristic in the POMCP algorithm. We test this solution on the hallway problem. Results indicate that our new algorithm performs significantly better than POMCP.
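    One way to realize an entropy bonus in UCB1 is to reward actions whose simulated successor belief is less uncertain than the current one. The weight `w` and the exact form of the bonus below are illustrative assumptions, since the abstract does not specify the weighting:

    ```python
    import math

    def belief_entropy(belief):
        """Shannon entropy of a discrete belief (probabilities over states)."""
        return -sum(p * math.log(p) for p in belief if p > 0.0)

    def entropy_ucb1(q, n_action, n_parent, belief, post_belief, c=1.0, w=0.5):
        """UCB1 action score augmented with an information-gain bonus.

        The bonus rewards actions whose simulated successor belief
        `post_belief` has lower entropy than the current `belief`, so
        information-gathering actions score higher even when their
        immediate return estimate `q` is low.
        """
        if n_action == 0:
            return float("inf")  # untried actions keep UCB1's priority
        exploration = c * math.sqrt(math.log(n_parent) / n_action)
        info_gain = belief_entropy(belief) - belief_entropy(post_belief)
        return q + exploration + w * info_gain
    ```

    With a uniform current belief, an action that concentrates the belief on a few states receives a strictly higher score than one that leaves it uniform, which is the behavior the delayed-information problems above call for.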

    Balancing exploration and exploitation: task-targeted exploration for scientific decision-making

    Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the Massachusetts Institute of Technology and the Woods Hole Oceanographic Institution, September 2022. How do we collect observational data that reveal fundamental properties of scientific phenomena? This is a key challenge in modern scientific discovery. Scientific phenomena are complex: they have high-dimensional and continuous state, exhibit chaotic dynamics, and generate noisy sensor observations. Additionally, scientific experimentation often requires significant time, money, and human effort. In the face of these challenges, we propose to leverage autonomous decision-making to augment and accelerate human scientific discovery. Autonomous decision-making in scientific domains faces an important and classical challenge: balancing exploration and exploitation when making decisions under uncertainty. This thesis argues that efficient decision-making in real-world scientific domains requires task-targeted exploration: exploration strategies that are tuned to a specific task. By quantifying the change in task performance due to exploratory actions, we enable decision-makers that can contend with highly uncertain real-world environments, performing exploration parsimoniously to improve task performance. The thesis presents three novel paradigms for task-targeted exploration that are motivated by and applied to real-world scientific problems. We first consider exploration in partially observable Markov decision processes (POMDPs) and present two novel planners that leverage task-driven information measures to balance exploration and exploitation. These planners drive robots in simulation and oceanographic field trials to robustly identify plume sources and track targets with stochastic dynamics. We next consider the exploration-exploitation trade-off in online learning paradigms, a robust alternative to POMDPs when the environment is adversarial or difficult to model. We present novel online learning algorithms that balance exploitative and exploratory plays optimally under real-world constraints, including delayed feedback, partial predictability, and short regret horizons. We use these algorithms to perform model selection for subseasonal temperature and precipitation forecasting, achieving state-of-the-art forecasting accuracy. The human scientific endeavor is poised to benefit from our emerging capacity to integrate observational data into the process of model development and validation. Realizing the full potential of these data requires autonomous decision-makers that can contend with the inherent uncertainty of real-world scientific domains. This thesis highlights the critical role that task-targeted exploration plays in efficient scientific decision-making and proposes three novel methods to achieve task-targeted exploration in real-world oceanographic and climate science applications. This material is based upon work supported by the NSF Graduate Research Fellowship Program and a Microsoft Research PhD Fellowship, as well as the Department of Energy / National Nuclear Security Administration under Award Number DE-NA0003921, the Office of Naval Research under Award Number N00014-17-1-2072, and DARPA under Award Number HR001120C0033.