Combining Subgoal Graphs with Reinforcement Learning to Build a Rational Pathfinder
In this paper, we present a hierarchical path planning framework called SG-RL
(subgoal graphs-reinforcement learning), to plan rational paths for agents
maneuvering in continuous and uncertain environments. By "rational", we mean
that (1) path planning is efficient enough to eliminate first-move lags, and
(2) the resulting paths are collision-free, smooth, and satisfy the agents'
kinematic constraints. SG-RL works in a
two-level manner. At the first level, SG-RL uses a geometric path-planning
method, i.e., Simple Subgoal Graphs (SSG), to efficiently find optimal abstract
paths, also called subgoal sequences. At the second level, SG-RL uses an RL
method, i.e., Least-Squares Policy Iteration (LSPI), to learn near-optimal
motion-planning policies which can generate kinematically feasible and
collision-free trajectories between adjacent subgoals. The first advantage of
the proposed method is that SSG mitigates the sparse-reward and local-minima
problems that RL agents face; thus, LSPI can be used to generate paths in
complex environments. The second advantage is that, when the environment
changes slightly (e.g., unexpected obstacles appear), SG-RL does not need to
reconstruct subgoal graphs and replan subgoal sequences using SSG, since LSPI
can deal with uncertainties by exploiting its generalization ability to handle
changes in environments. Simulation experiments in representative scenarios
demonstrate that, compared with existing methods, SG-RL can work well on
large-scale maps with relatively low action-switching frequencies and shorter
path lengths, and SG-RL can deal with small changes in environments. We further
demonstrate that the design of reward functions and the types of training
environments are important factors for learning feasible policies. Comment: 20 page
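The two-level structure described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: Dijkstra over a small hand-built graph stands in for SSG, and a straight-line stepping rule stands in for the learned LSPI policy. The graph, coordinates, step size, and all function names are hypothetical.

```python
import heapq
import math

# Hypothetical subgoal graph (illustrative coordinates and adjacency).
COORDS = {"S": (0.0, 0.0), "A": (2.0, 0.0), "B": (2.0, 3.0), "G": (5.0, 3.0)}
EDGES = {"S": ["A"], "A": ["S", "B"], "B": ["A", "G"], "G": ["B"]}


def dist(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def plan_subgoals(start, goal):
    """Level 1 stand-in: Dijkstra over the subgoal graph, returning a subgoal sequence."""
    frontier = [(0.0, start, [start])]
    best = {start: 0.0}
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if cost > best.get(node, math.inf):
            continue  # stale queue entry
        for nxt in EDGES[node]:
            new_cost = cost + dist(COORDS[node], COORDS[nxt])
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, nxt, path + [nxt]))
    return None


def local_policy(pos, target, step=0.5):
    """Level 2 stand-in: move straight toward the target by at most `step`.

    A learned policy would replace this rule and also handle obstacles
    and kinematic constraints, which this sketch ignores.
    """
    d = dist(pos, target)
    if d <= step:
        return target
    return (pos[0] + step * (target[0] - pos[0]) / d,
            pos[1] + step * (target[1] - pos[1]) / d)


def follow(start, goal):
    """Plan a subgoal sequence, then run the local policy between adjacent subgoals."""
    subgoals = plan_subgoals(start, goal)
    if subgoals is None:
        raise ValueError("goal is not reachable in the subgoal graph")
    pos = COORDS[start]
    trajectory = [pos]
    for name in subgoals[1:]:
        while pos != COORDS[name]:
            pos = local_policy(pos, COORDS[name])
            trajectory.append(pos)
    return subgoals, trajectory


if __name__ == "__main__":
    subgoals, trajectory = follow("S", "G")
    print(subgoals)
    print(len(trajectory), "trajectory points")
```

The split keeps the graph search cheap and global while the local rule only ever has to reach the next subgoal, which is the property the abstract relies on when it says small environment changes need not trigger replanning at the first level.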
Large Language Models as General Pattern Machines
We observe that pre-trained large language models (LLMs) are capable of
autoregressively completing complex token sequences -- from arbitrary ones
procedurally generated by probabilistic context-free grammars (PCFG), to
richer spatial patterns found in the Abstraction and Reasoning Corpus (ARC), a
general AI benchmark, prompted in the style of ASCII art. Surprisingly, pattern
completion proficiency can be partially retained even when the sequences are
expressed using tokens randomly sampled from the vocabulary. These results
suggest that without any additional training, LLMs can serve as general
sequence modelers, driven by in-context learning. In this work, we investigate
how these zero-shot capabilities may be applied to problems in robotics -- from
extrapolating sequences of numbers that represent states over time to complete
simple motions, to least-to-most prompting of reward-conditioned trajectories
that can discover and represent closed-loop policies (e.g., a stabilizing
controller for CartPole). While difficult to deploy today for real systems due
to latency, context size limitations, and compute costs, the approach of using
LLMs to drive low-level control may provide an exciting glimpse into how the
patterns among words could be transferred to actions. Comment: 21 pages, 25 figures. To appear at Conference on Robot Learning
(CoRL) 202
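To make "token sequences procedurally generated by a PCFG" concrete, the following is a minimal sketch of sampling such sequences and joining them into a plain-text prompt. The toy grammar, its probabilities, and the function names are hypothetical and are not taken from the paper; no language model is called.

```python
import random

# Hypothetical toy grammar: nonterminal -> list of (probability, right-hand side).
GRAMMAR = {
    "S": [(0.5, ["A", "B"]), (0.5, ["B", "A"])],
    "A": [(0.6, ["a"]), (0.4, ["a", "A"])],
    "B": [(0.7, ["b"]), (0.3, ["b", "B"])],
}


def sample(symbol, rng):
    """Expand `symbol` recursively, choosing each rule according to its probability."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal token
    rules = GRAMMAR[symbol]
    weights = [p for p, _ in rules]
    _, rhs = rng.choices(rules, weights=weights, k=1)[0]
    tokens = []
    for s in rhs:
        tokens.extend(sample(s, rng))
    return tokens


def make_prompt(n_examples, seed=0):
    """Build a prompt with one sampled sequence per line."""
    rng = random.Random(seed)
    return "\n".join(" ".join(sample("S", rng)) for _ in range(n_examples))


if __name__ == "__main__":
    print(make_prompt(5))
```

A prompt built this way could be given to a model with the last line truncated, so that completing it tests in-context pattern completion; replacing "a" and "b" with arbitrary vocabulary tokens would correspond to the random-token variant the abstract mentions.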
Optimal Planning Modulo Theories
Planning for real-world applications requires algorithms and tools with the ability to handle the complexity such scenarios entail. However, meeting the needs of such applications poses substantial challenges, both representational and algorithmic. On the one hand, expressive languages are needed to build faithful models. On the other hand, efficient solving techniques that can support these languages need to be devised. A response to this challenge is underway, and the past few years have witnessed a community effort towards more expressive languages, including decidable fragments of first-order theories. In this work we focus on planning with arithmetic theories and propose Optimal Planning Modulo Theories, a framework that attempts to provide efficient means of dealing with such problems. Leveraging generic Optimization Modulo Theories (OMT) solvers, we first present domain-specific encodings for optimal planning in complex logistic domains. We then present a more general, domain-independent formulation that allows OMT planning to be extended to a broader class of well-studied numeric problems in planning. To the best of our knowledge, this is the first time OMT procedures are employed in domain-independent planning.
Access Control Synthesis for Physical Spaces
Access-control requirements for physical spaces, like office buildings and
airports, are best formulated from a global viewpoint in terms of system-wide
requirements. For example, "there is an authorized path to exit the building
from every room." In contrast, individual access-control components, such as
doors and turnstiles, can only enforce local policies, specifying when the
component may open. In practice, the gap between the system-wide, global
requirements and the many local policies is bridged manually, which is tedious,
error-prone, and scales poorly.
We propose a framework to automatically synthesize local access control
policies from a set of global requirements for physical spaces. Our framework
consists of an expressive language to specify both global requirements and
physical spaces, and an algorithm for synthesizing local, attribute-based
policies from the global specification. We empirically demonstrate the
framework's effectiveness on three substantial case studies. The studies
demonstrate that access control synthesis is practical even for complex
physical spaces, such as airports, with many interrelated security
requirements.
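The gap between a global requirement and local door policies can be illustrated with a small check. This sketch only verifies the example requirement quoted in the abstract ("there is an authorized path to exit the building from every room") against given local policies; it does not synthesize policies, and it is not the paper's specification language or algorithm. The floor plan, door identifiers, roles, and function names are hypothetical.

```python
from collections import deque

# Hypothetical floor plan: each door connects one room to another (one direction).
DOORS = [
    ("office", "corridor", "d1"),
    ("lab", "corridor", "d4"),
    ("corridor", "lobby", "d2"),
    ("lobby", "exit", "d3"),
]

# Hypothetical local policies: door id -> roles the door opens for.
POLICY = {
    "d1": {"staff", "visitor"},
    "d2": {"staff", "visitor"},
    "d3": {"staff", "visitor"},
    "d4": {"staff"},
}


def can_exit(room, role, doors=DOORS, policy=POLICY, exit_room="exit"):
    """Breadth-first search using only the doors that open for `role`."""
    seen = {room}
    queue = deque([room])
    while queue:
        current = queue.popleft()
        if current == exit_room:
            return True
        for src, dst, door in doors:
            if src == current and dst not in seen and role in policy[door]:
                seen.add(dst)
                queue.append(dst)
    return False


def violating_rooms(role, doors=DOORS, policy=POLICY, exit_room="exit"):
    """Rooms from which `role` has no authorized path to the exit."""
    rooms = {src for src, _, _ in doors} | {dst for _, dst, _ in doors}
    return sorted(r for r in rooms
                  if not can_exit(r, role, doors, policy, exit_room))


if __name__ == "__main__":
    for role in ("staff", "visitor"):
        print(role, violating_rooms(role))
```

In this toy plan the staff role satisfies the requirement everywhere, while a visitor in the lab cannot leave because door d4 opens only for staff. Finding such violations by hand across many doors is the manual step the abstract describes as tedious and error-prone.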
Learning Symbolic Operators for Task and Motion Planning
Robotic planning problems in hybrid state and action spaces can be solved by
integrated task and motion planners (TAMP) that handle the complex interaction
between motion-level decisions and task-level plan feasibility. TAMP approaches
rely on domain-specific symbolic operators to guide the task-level search,
making planning efficient. In this work, we formalize and study the problem of
operator learning for TAMP. Central to this study is the view that operators
define a lossy abstraction of the transition model of a domain. We then propose
a bottom-up relational learning method for operator learning and show how the
learned operators can be used for planning in a TAMP system. Experimentally, we
provide results in three domains, including long-horizon robotic planning
tasks. We find that our approach substantially outperforms several baselines,
including three graph neural network-based model-free approaches from the
recent literature. Video: https://youtu.be/iVfpX9BpBRo Code:
https://git.io/JCT0g Comment: IROS 202
Generalized Planning as Heuristic Search: A new planning search-space that leverages pointers over objects
Planning as heuristic search is one of the most successful approaches to
classical planning but unfortunately, it does not extend trivially to
Generalized Planning (GP). GP aims to compute algorithmic solutions that are
valid for a set of classical planning instances from a given domain, even if
these instances differ in the number of objects, the number of state variables,
their domain size, or their initial and goal configuration. The generalization
requirements of GP make it impractical to perform the state-space search that
is usually implemented by heuristic planners. This paper adapts the planning as
heuristic search paradigm to the generalization requirements of GP, and
presents the first native heuristic search approach to GP. First, the paper
introduces a new pointer-based solution space for GP that is independent of the
number of classical planning instances in a GP problem and the size of those
instances (i.e. the number of objects, state variables and their domain sizes).
Second, the paper defines a set of evaluation and heuristic functions for
guiding a combinatorial search in our new GP solution space. The computation of
these evaluation and heuristic functions does not require grounding states or
actions in advance. Therefore our GP as heuristic search approach can handle
large sets of state variables with large numerical domains, e.g., integers.
Lastly, the paper defines an upgraded version of our novel algorithm for GP
called Best-First Generalized Planning (BFGP), that implements a best-first
search in our pointer-based solution space, and that is guided by our
evaluation/heuristic functions for GP. Comment: Under review in the Artificial Intelligence Journal (AIJ)