Learning to Scaffold the Development of Robotic Manipulation Skills
Learning contact-rich robotic manipulation skills is challenging due to the
high dimensionality of the state and action spaces as well as uncertainty from
noisy sensors and inaccurate motor control. To combat these factors, humans
actively exploit contact constraints in the environment, and robots can
achieve more robust manipulation by adopting a similar strategy. In this
paper, we enable a robot to
autonomously modify its environment and thereby discover how to ease
manipulation skill learning. Specifically, we provide the robot with fixtures
that it can freely place within the environment. These fixtures provide hard
constraints that limit the outcomes of robot actions, funneling uncertainty
from perception and motor control and thereby scaffolding manipulation skill
learning. We propose a learning system that consists of two learning loops. In
the outer loop, the robot positions the fixture in the workspace. In the inner
loop, the robot learns a manipulation skill and, after a fixed number of
episodes, returns the achieved reward to the outer loop. The robot is thus
incentivised to place the fixture such that the inner loop quickly achieves a
high reward. We demonstrate our framework both in simulation and in the real
world on three tasks: peg insertion, wrench manipulation and shallow-depth
insertion. We show that manipulation skill learning is dramatically sped up
through this form of scaffolding.
Comment: Accepted to IEEE International Conference on Robotics and Automation
(ICRA) 202
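The two-loop structure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the 1-D fixture position, the stub `train_skill` reward model, and the random-search outer loop are all hypothetical stand-ins.

```python
import random

def train_skill(fixture_pose, episodes=50):
    # Stub for the inner loop: trains a manipulation skill for a fixed
    # number of episodes and returns the reward it achieved. Here the
    # reward simply peaks near a hypothetical optimal fixture position.
    target = 0.7
    return max(0.0, 1.0 - abs(fixture_pose - target)) + random.uniform(-0.05, 0.05)

def outer_loop(iterations=20):
    # Outer loop: searches over fixture placements, scoring each
    # candidate by the reward the inner loop reports back.
    best_pose, best_reward = None, float("-inf")
    for _ in range(iterations):
        pose = random.uniform(0.0, 1.0)  # 1-D fixture position for simplicity
        reward = train_skill(pose)
        if reward > best_reward:
            best_pose, best_reward = pose, reward
    return best_pose, best_reward

pose, reward = outer_loop()
print(f"best fixture pose {pose:.2f}, inner-loop reward {reward:.2f}")
```

Because the outer loop only sees the inner loop's reward, any black-box optimizer could replace the random search here.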
STAP: Sequencing Task-Agnostic Policies
Advances in robotic skill acquisition have made it possible to build
general-purpose libraries of learned skills for downstream manipulation tasks.
However, naively executing these skills one after the other is unlikely to
succeed without accounting for dependencies between actions prevalent in
long-horizon plans. We present Sequencing Task-Agnostic Policies (STAP), a
scalable framework for training manipulation skills and coordinating their
geometric dependencies at planning time to solve long-horizon tasks never seen
by any skill during training. Given that Q-functions encode a measure of skill
feasibility, we formulate an optimization problem to maximize the joint success
of all skills sequenced in a plan, which we estimate by the product of their
Q-values. Our experiments indicate that this objective function approximates
ground truth plan feasibility and, when used as a planning objective, reduces
myopic behavior and thereby promotes long-horizon task success. We further
demonstrate how STAP can be used for task and motion planning by estimating the
geometric feasibility of skill sequences provided by a task planner. We
evaluate our approach in simulation and on a real robot. Qualitative results
and code are made available at https://sites.google.com/stanford.edu/stap/home
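The STAP objective, maximizing the product of per-skill Q-values along a sequence, can be sketched as follows. The toy Q-functions and the exhaustive grid search are illustrative assumptions standing in for the paper's learned critics and sampling-based optimizer.

```python
import itertools

# Hypothetical per-skill Q-functions mapping a scalar action parameter
# to an estimated success probability (stand-ins for learned critics).
def q_pick(theta):
    return max(0.0, 1.0 - abs(theta - 0.2))

def q_place(theta):
    return max(0.0, 1.0 - abs(theta - 0.8))

def plan_feasibility(q_funcs, params):
    # Product of Q-values along the skill sequence, approximating the
    # joint success probability of the whole plan.
    score = 1.0
    for q, theta in zip(q_funcs, params):
        score *= q(theta)
    return score

def optimize_sequence(q_funcs, grid=None):
    # Exhaustive grid search over joint action parameters; a stand-in
    # for sampling-based optimization in higher dimensions.
    grid = grid if grid is not None else [i / 10 for i in range(11)]
    best = max(itertools.product(grid, repeat=len(q_funcs)),
               key=lambda p: plan_feasibility(q_funcs, p))
    return best, plan_feasibility(q_funcs, best)

params, score = optimize_sequence([q_pick, q_place])
print(params, score)
```

Optimizing the product jointly, rather than each skill's Q-value in isolation, is what lets the planner trade off an early action against the feasibility of later ones.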
Active Task Randomization: Learning Robust Skills via Unsupervised Generation of Diverse and Feasible Tasks
Solving real-world manipulation tasks requires robots to have a repertoire of
skills applicable to a wide range of circumstances. When using learning-based
methods to acquire such skills, the key challenge is to obtain training data
that covers diverse and feasible variations of the task, which often requires
non-trivial manual labor and domain knowledge. In this work, we introduce
Active Task Randomization (ATR), an approach that learns robust skills through
the unsupervised generation of training tasks. ATR selects suitable tasks,
which consist of an initial environment state and manipulation goal, for
learning robust skills by balancing the diversity and feasibility of the tasks.
We propose to predict task diversity and feasibility by jointly learning a
compact task representation. The selected tasks are then procedurally generated
in simulation using a graph-based parameterization. The active selection of these
training tasks enables skill policies trained with our framework to robustly
handle a diverse range of objects and arrangements at test time. We demonstrate
that the learned skills can be composed by a task planner to solve unseen
sequential manipulation problems based on visual inputs. Compared to baseline
methods, ATR can achieve superior success rates in single-step and sequential
manipulation tasks.
Comment: 9 pages, 5 figures
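The balancing of diversity and feasibility described above can be sketched as a greedy selection loop. The 1-D task embedding, the hand-written feasibility function, and the weighted product are illustrative assumptions; ATR itself learns these predictors from data.

```python
def feasibility(task):
    # Stand-in for a learned feasibility predictor: tasks near 0.5 in
    # a 1-D embedding space are assumed most solvable.
    return max(0.0, 1.0 - 2.0 * abs(task - 0.5))

def diversity(task, selected):
    # Diversity as the minimum embedding distance to tasks chosen so far.
    if not selected:
        return 1.0
    return min(abs(task - s) for s in selected)

def select_tasks(candidates, k=3, alpha=0.5):
    # Greedily pick tasks maximizing a weighted product of feasibility
    # and diversity, mimicking ATR's balancing of the two criteria.
    selected = []
    for _ in range(k):
        best = max(candidates,
                   key=lambda t: feasibility(t) ** alpha
                                 * diversity(t, selected) ** (1 - alpha))
        selected.append(best)
        candidates = [c for c in candidates if c != best]
    return selected

tasks = select_tasks([i / 10 for i in range(11)])
print(tasks)
```

The first pick is the most feasible task; later picks are pushed away from it by the diversity term, so training data spreads over solvable but varied tasks.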
An Empirical Evaluation of Deep Learning on Highway Driving
Numerous groups have applied a variety of deep learning techniques to
computer vision problems in highway perception scenarios. In this paper, we
present a number of empirical evaluations of recent deep learning advances.
Computer vision, combined with deep learning, has the potential to bring about
a relatively inexpensive, robust solution to autonomous driving. To prepare
deep learning for industry uptake and practical applications, neural networks
will require large data sets that represent all possible driving environments
and scenarios. We collect a large data set of highway data and apply deep
learning and computer vision algorithms to problems such as car and lane
detection. We show how existing convolutional neural networks (CNNs) can be
used to perform lane and vehicle detection while running at frame rates
required for a real-time system. Our results lend credence to the hypothesis
that deep learning holds promise for autonomous driving.
Comment: Added a video for lane detection
Text2Motion: From Natural Language Instructions to Feasible Plans
We propose Text2Motion, a language-based planning framework enabling robots
to solve sequential manipulation tasks that require long-horizon reasoning.
Given a natural language instruction, our framework constructs both a task- and
motion-level plan that is verified to reach inferred symbolic goals.
Text2Motion uses feasibility heuristics encoded in Q-functions of a library of
skills to guide task planning with Large Language Models. Whereas previous
language-based planners only consider the feasibility of individual skills,
Text2Motion actively resolves geometric dependencies spanning skill sequences
by performing geometric feasibility planning during its search. We evaluate our
method on a suite of problems that require long-horizon reasoning,
interpretation of abstract goals, and handling of partial affordance
perception. Our experiments show that Text2Motion can solve these challenging
problems with a success rate of 82%, while prior state-of-the-art
language-based planning methods only achieve 13%. Text2Motion thus provides
promising generalization characteristics to semantically diverse sequential
manipulation tasks with geometric dependencies between skills.
Comment: https://sites.google.com/stanford.edu/text2motio
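The interplay between language-based task planning and Q-function feasibility checks can be sketched as below. The hard-coded plan proposals, skill names, and per-skill scores are all hypothetical placeholders: in the actual system the proposals come from an LLM and the scores from learned Q-functions after geometric feasibility planning.

```python
def llm_propose(instruction):
    # Stand-in for an LLM task planner: maps an instruction to
    # candidate skill sequences (hard-coded for illustration).
    return [["pick(cup)", "place(cup, shelf)"],
            ["push(cup)", "place(cup, shelf)"]]

SKILL_Q = {
    # Assumed per-skill feasibility scores in the current scene.
    "pick(cup)": 0.9,
    "place(cup, shelf)": 0.8,
    "push(cup)": 0.2,
}

def geometric_feasibility(plan):
    # Joint feasibility of a skill sequence as the product of
    # per-skill scores, resolving dependencies across the sequence.
    score = 1.0
    for skill in plan:
        score *= SKILL_Q.get(skill, 0.0)
    return score

def plan_from_language(instruction, threshold=0.5):
    # Keep the LLM proposal with the highest joint feasibility,
    # rejecting plans that fail the verification threshold.
    plans = llm_propose(instruction)
    best = max(plans, key=geometric_feasibility)
    return best if geometric_feasibility(best) >= threshold else None

plan = plan_from_language("put the cup on the shelf")
print(plan)
```

Scoring whole sequences rather than individual skills is what lets the planner reject proposals whose early steps make later steps geometrically infeasible.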