Global Motion Planning under Uncertain Motion, Sensing, and Environment Map
Motion planning that accounts for uncertainty in motion, sensing, and the environment map is critical for autonomous robots to operate reliably in our living spaces. Partially Observable Markov Decision Processes (POMDPs) provide a principled and general framework for planning under uncertainty. Although recent developments in point-based POMDP solvers have drastically increased the speed of POMDP planning, even the best POMDP planner today fails to generate reasonable motion strategies when the environment map is not known exactly. This paper presents Guided Cluster Sampling (GCS), a new point-based POMDP planner for motion planning under uncertain motion, sensing, and environment map, when the robot has active sensing capability. It exploits our observation that in this problem, the belief space B can be partitioned into a collection of much smaller subspaces, and an optimal policy can often be generated by sufficiently sampling a small subset of the collection. GCS samples B using two-stage cluster sampling: a subspace is sampled from the collection, and then a belief is sampled from that subspace. It uses information from the set of sampled subspaces and sampled beliefs to guide subsequent sampling. Preliminary results suggest that GCS generates, within reasonable time, reasonable policies for motion planning problems with uncertain motion, sensing, and environment map that are unsolvable by the best point-based POMDP planner today. Furthermore, GCS handles POMDPs with continuous state, action, and observation spaces. We show that for a class of POMDPs that often occur in robot motion planning, GCS converges to the optimal policy given enough time. To the best of our knowledge, this is the first convergence result for point-based POMDPs with a continuous action space.
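The two-stage cluster sampling described above can be sketched as follows. This is a hypothetical illustration only: the subspace weights and the uniform within-subspace sampler are assumptions, not the paper's actual guidance heuristics.

```python
import random

def two_stage_sample(subspaces, weights, rng=random):
    """Stage 1: pick a subspace; Stage 2: pick a belief inside it."""
    # Stage 1: sample a subspace in proportion to a guidance weight
    # (e.g., derived from previously sampled subspaces and beliefs).
    idx = rng.choices(range(len(subspaces)), weights=weights, k=1)[0]
    # Stage 2: sample a belief from the chosen subspace
    # (uniformly here, purely for illustration).
    belief = rng.choice(subspaces[idx])
    return idx, belief

# Toy usage: three subspaces of labelled beliefs, sampling biased
# toward the first subspace by the (assumed) guidance weights.
subspaces = [["b0", "b1"], ["b2"], ["b3", "b4"]]
idx, belief = two_stage_sample(subspaces, weights=[0.6, 0.3, 0.1])
```

In the actual planner, the guidance weights would be updated after each sample so that subsequent sampling concentrates on the subspaces most relevant to the optimal policy.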
Learning Augmented, Multi-Robot Long-Horizon Navigation in Partially Mapped Environments
We present a novel approach for efficient and reliable goal-directed
long-horizon navigation for a multi-robot team in a structured, unknown
environment by predicting statistics of unknown space. Building on recent work
in learning-augmented model based planning under uncertainty, we introduce a
high-level state and action abstraction that lets us approximate the
challenging Dec-POMDP into a tractable stochastic MDP. Our Multi-Robot Learning
over Subgoals Planner (MR-LSP) guides agents towards coordinated exploration of
regions more likely to reach the unseen goal. We demonstrate improvement in
cost against other multi-robot strategies; in simulated office-like
environments, we show that our approach saves 13.29% (2 robot) and 4.6% (3
robot) average cost versus standard non-learned optimistic planning and a
learning-informed baseline.
Comment: 7 pages, 7 figures, ICRA202
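The subgoal abstraction underlying this family of planners can be illustrated with the expected-cost recursion for trying subgoals (frontiers into unseen space) in sequence. The field names and the exact cost decomposition below are assumptions for a single-robot sketch, not the paper's multi-robot formulation.

```python
from dataclasses import dataclass
from itertools import permutations

@dataclass
class Subgoal:
    travel: float     # cost to reach this subgoal's frontier
    p_success: float  # learned probability the goal lies beyond it
    r_success: float  # expected cost to the goal if it does
    r_failure: float  # expected cost wasted exploring if it does not

def expected_cost(order):
    """Expected cost of trying subgoals in the given order."""
    if not order:
        return float("inf")  # no frontiers left to try
    s, rest = order[0], order[1:]
    fallback = expected_cost(rest) if rest else 0.0
    # Travel to s; succeed with p_success, otherwise pay the
    # exploration penalty and fall back to the remaining subgoals.
    return s.travel + s.p_success * s.r_success + (
        1 - s.p_success) * (s.r_failure + fallback)

# Toy usage: the high-probability subgoal is worth trying first.
subgoals = [Subgoal(2.0, 0.9, 5.0, 3.0), Subgoal(1.0, 0.2, 4.0, 6.0)]
best = min(permutations(subgoals), key=expected_cost)
```

Planning over this small set of subgoal-level actions, rather than over raw robot motions, is what makes the otherwise intractable problem of reasoning about unseen space approximable as a stochastic MDP.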
Data-Efficient Policy Selection for Navigation in Partial Maps via Subgoal-Based Abstraction
We present a novel approach for fast and reliable policy selection for
navigation in partial maps. Leveraging the recent learning-augmented
model-based Learning over Subgoals Planning (LSP) abstraction to plan, our
robot reuses data collected during navigation to evaluate how well other
alternative policies could have performed via a procedure we call offline
alt-policy replay. Costs from offline alt-policy replay constrain policy
selection among the LSP-based policies during deployment, allowing for
improvements in convergence speed, cumulative regret and average navigation
cost. With only limited prior knowledge about the nature of unseen
environments, we achieve at least 67% and as much as 96% improvements on
cumulative regret over the baseline bandit approach in our experiments in
simulated maze and office-like environments.
Comment: 8 pages, 5 figure
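The core idea of constraining policy selection with replayed costs can be sketched as a bandit whose every arm is updated after each deployment, because offline replay scores the policies that were not run. The running-mean update rule here is an assumption for illustration, not the paper's exact method.

```python
def select_policy(means):
    """Greedy selection over estimated mean costs (lower is better)."""
    return min(means, key=means.get)

def update_with_replay(means, counts, replayed_costs):
    """Fold replayed per-policy cost estimates into running means."""
    for policy, cost in replayed_costs.items():
        counts[policy] += 1
        means[policy] += (cost - means[policy]) / counts[policy]

# Toy usage with two hypothetical LSP-based policies.
means = {"lsp_a": 0.0, "lsp_b": 0.0}
counts = {"lsp_a": 0, "lsp_b": 0}
# One deployment: the robot navigates once, then offline alt-policy
# replay estimates what BOTH policies would have cost on that data.
update_with_replay(means, counts, {"lsp_a": 120.0, "lsp_b": 95.0})
```

Because a single navigation trial informs every arm rather than only the one that was executed, the selector needs far fewer deployments to converge, which is the source of the cumulative-regret improvements reported above.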
Barrier Functions for Multiagent-POMDPs with DTL Specifications
Multi-agent partially observable Markov decision processes (MPOMDPs) provide a framework for representing heterogeneous autonomous agents subject to uncertainty and partial observation. In this paper, given a nominal policy provided by a human operator or a conventional planning method, we propose a technique based on barrier functions to design a minimally interfering safety shield that ensures satisfaction of high-level specifications in terms of linear distribution temporal logic (LDTL). To this end, we use sufficient and necessary conditions for the invariance of a given set based on discrete-time barrier functions (DTBFs), and we formulate sufficient conditions for finite-time DTBFs to study finite-time convergence to a set. We then show that different LDTL mission/safety specifications can be cast as a set of invariance or finite-time reachability problems. We demonstrate that the proposed safety-shield synthesis method can be implemented online by a sequence of one-step greedy algorithms. Finally, we demonstrate the efficacy of the proposed method in experiments involving a team of robots.
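For reference, a commonly used discrete-time barrier function condition takes the following form; the notation below is a standard simplified (linear class-K) variant, and the paper's belief-space conditions may differ in detail.

```latex
% Safe set C = \{x : h(x) \ge 0\} under dynamics x_{k+1} = f(x_k, u_k).
% h is a discrete-time barrier function if there exists \gamma \in (0, 1] with
h(x_{k+1}) - h(x_k) \ge -\gamma\, h(x_k) \quad \forall x_k \in C,
% equivalently h(x_{k+1}) \ge (1 - \gamma)\, h(x_k) \ge 0,
% so any trajectory starting in C remains in C (invariance).
```

A one-step greedy safety shield then, at each step, keeps the nominal action if it satisfies this inequality and otherwise substitutes a minimally different action that does.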