Embodied Active Learning of Relational State Abstractions for Bilevel Planning
State abstraction is an effective technique for planning in robotics
environments with continuous states and actions, long task horizons, and sparse
feedback. In object-oriented environments, predicates are a particularly useful
form of state abstraction because of their compatibility with symbolic planners
and their capacity for relational generalization. However, to plan with
predicates, the agent must be able to interpret them in continuous environment
states (i.e., ground the symbols). Manually programming predicate
interpretations can be difficult, so we would instead like to learn them from
data. We propose an embodied active learning paradigm where the agent learns
predicate interpretations through online interaction with an expert. For
example, after taking actions in a block stacking environment, the agent may
ask the expert: "Is On(block1, block2) true?" From this experience, the agent
learns to plan: it learns neural predicate interpretations, symbolic planning
operators, and neural samplers that can be used for bilevel planning. During
exploration, the agent plans to learn: it uses its current models to select
actions towards generating informative expert queries. We learn predicate
interpretations as ensembles of neural networks and use their entropy to
measure the informativeness of potential queries. We evaluate this approach in
three robotic environments and find that it consistently outperforms six
baselines while exhibiting sample efficiency in two key metrics: number of
environment interactions, and number of queries to the expert. Code:
https://tinyurl.com/active-predicatesComment: Conference on Lifelong Learning Agents (CoLLAs) 202
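To make the query-selection idea concrete, below is a minimal sketch (an illustration, not the authors' code) of scoring a candidate expert query with ensemble entropy: a small ensemble of neural predicate interpreters predicts whether a ground atom such as On(block1, block2) holds in the current state, and the binary entropy of the ensemble's mean prediction measures how informative asking the expert about that atom would be. All class names, feature dimensions, and hyperparameters here are assumptions.

# Minimal sketch of entropy-based query scoring; names and shapes are illustrative.
import numpy as np
import torch
import torch.nn as nn

class PredicateInterpreter(nn.Module):
    """One ensemble member: maps concatenated object features to P(atom is true)."""
    def __init__(self, in_dim: int, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

def query_informativeness(ensemble, features: torch.Tensor) -> float:
    """Binary entropy of the ensemble's mean prediction for one ground atom."""
    with torch.no_grad():
        p = torch.stack([m(features) for m in ensemble]).mean().item()
    eps = 1e-9
    return float(-(p * np.log(p + eps) + (1 - p) * np.log(1 - p + eps)))

# Example: score the candidate query "Is On(block1, block2) true?" in some state.
ensemble = [PredicateInterpreter(in_dim=8) for _ in range(5)]
atom_features = torch.randn(8)  # features of (block1, block2) in the current state
print(query_informativeness(ensemble, atom_features))  # mean near 0.5 => high score

Queries whose score is near the maximum (ensemble members disagree) are the ones worth steering the robot toward during exploration; confidently classified atoms score near zero and can be skipped.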
Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis
Building general-purpose robots that can operate seamlessly, in any
environment, with any object, and utilizing various skills to complete diverse
tasks has been a long-standing goal in Artificial Intelligence. Unfortunately,
most existing robotic systems remain constrained: designed for specific tasks,
trained on specific datasets, and deployed within specific environments. These
systems usually require extensively labeled data, rely on task-specific models,
generalize poorly when deployed in real-world scenarios, and struggle to remain
robust to distribution shifts.
Motivated by the impressive open-set performance and content generation
capabilities of web-scale, large-capacity pre-trained models (i.e., foundation
models) in research fields such as Natural Language Processing (NLP) and
Computer Vision (CV), we devote this survey to exploring (i) how existing
foundation models from NLP and CV can be applied to the field of robotics and
(ii) what a robotics-specific foundation model would look like.
We begin by providing an overview of what constitutes a conventional robotic
system and the fundamental barriers to making it universally applicable. Next,
we establish a taxonomy to discuss current work exploring ways to leverage
existing foundation models for robotics and develop ones catered to robotics.
Finally, we discuss key challenges and promising future directions in using
foundation models for enabling general-purpose robotic systems. We encourage
readers to view our living GitHub repository of resources, including papers
reviewed in this survey as well as related projects and repositories for
developing foundation models for robotics.
Classical Planning in Deep Latent Space
Current domain-independent, classical planners require symbolic models of the
problem domain and instance as input, resulting in a knowledge acquisition
bottleneck. Meanwhile, although deep learning has achieved significant success
in many fields, its knowledge is encoded in subsymbolic representations that
are incompatible with symbolic systems such as planners. We propose Latplan, an
unsupervised architecture combining deep learning and classical planning. Given
only an unlabeled set of image pairs showing a subset of transitions allowed in
the environment (training inputs), Latplan learns a complete propositional PDDL
action model of the environment. Later, when a pair of images representing the
initial and the goal states (planning inputs) is given, Latplan finds a plan to
the goal state in a symbolic latent space and returns a visualized plan
execution. We evaluate Latplan using image-based versions of six planning
domains: 8-Puzzle, 15-Puzzle, Blocksworld, Sokoban, and two variations of
LightsOut.
Comment: Under review at Journal of Artificial Intelligence Research (JAIR)
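As an illustration of the Latplan idea, the following is a simplified sketch under stated assumptions, not the Latplan implementation: an autoencoder with a binarized latent turns each image into a propositional state, so an observed image pair (before, after) becomes a symbolic transition whose add and delete effects are just bit flips. Latplan itself uses Gumbel-Softmax discretization; the straight-through estimator below is a stand-in, and all dimensions are illustrative.

# Simplified sketch of a binary state autoencoder; all sizes are assumptions.
import torch
import torch.nn as nn

class BinaryStateAutoencoder(nn.Module):
    def __init__(self, img_dim: int = 784, n_props: int = 36):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(img_dim, 256), nn.ReLU(),
                                 nn.Linear(256, n_props))
        self.dec = nn.Sequential(nn.Linear(n_props, 256), nn.ReLU(),
                                 nn.Linear(256, img_dim), nn.Sigmoid())

    def encode(self, img: torch.Tensor, hard: bool = True) -> torch.Tensor:
        logits = self.enc(img)
        if hard:  # test time: a deterministic propositional state (0/1 vector)
            return (logits > 0).float()
        # training time: straight-through binarization keeps gradients flowing
        # (a stand-in for Latplan's Gumbel-Softmax annealing)
        soft = torch.sigmoid(logits)
        return soft + ((logits > 0).float() - soft).detach()

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        return self.dec(self.encode(img, hard=False))

# From a training image pair, read off a propositional transition: bits that
# flip 0->1 are add effects, 1->0 are delete effects of some grounded action.
sae = BinaryStateAutoencoder()
before, after = torch.rand(784), torch.rand(784)
s0, s1 = sae.encode(before), sae.encode(after)
adds = [i for i in range(len(s0)) if s0[i] == 0 and s1[i] == 1]
dels = [i for i in range(len(s0)) if s0[i] == 1 and s1[i] == 0]
print(f"add effects: {adds}\ndelete effects: {dels}")

Collecting these bit-level effects (and the states in which they occur) over many image pairs is what lets the full system compile a propositional PDDL action model that an off-the-shelf classical planner can search.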