Learning a Unified Control Policy for Safe Falling
Being able to fall safely is a necessary motor skill for humanoids performing
highly dynamic tasks, such as running and jumping. We propose a new method to
learn a policy that minimizes the maximal impulse during the fall. The
optimization solves for both a discrete contact planning problem and a
continuous optimal control problem. Once trained, the policy can compute the
optimal next contacting body part (e.g. left foot, right foot, or hands),
contact location and timing, and the required joint actuation. We represent the
policy as a mixture of actor-critic neural networks, which consists of n control
policies and the corresponding value functions. Each actor-critic pair is
associated with one of the n possible contacting body parts. During execution,
the policy corresponding to the highest value function is executed, and the
associated body part makes the next contact with the ground. With this
mixture-of-actor-critic architecture, the discrete contact sequence planning is
solved by selecting the best critic, while the continuous control problem is
solved by optimizing the actors. We show that our policy can achieve
comparable, sometimes even higher, rewards than a recursive search of the
action space using dynamic programming, while running 50 to 400 times faster
during online execution.
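A minimal sketch, in PyTorch, of how this best-critic selection could be structured; the layer sizes, the `n_parts` default, and the `act` interface are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class MixtureActorCritic(nn.Module):
    """One actor-critic pair per candidate contacting body part (sketch)."""

    def __init__(self, obs_dim, act_dim, n_parts=3, hidden=128):
        super().__init__()
        self.actors = nn.ModuleList(
            nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                          nn.Linear(hidden, act_dim))
            for _ in range(n_parts))
        self.critics = nn.ModuleList(
            nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                          nn.Linear(hidden, 1))
            for _ in range(n_parts))

    def act(self, obs):
        # Discrete contact planning: choose the body part whose critic
        # predicts the highest value for the current state.
        values = torch.stack([c(obs) for c in self.critics]).squeeze(-1)
        best = int(values.argmax())
        # Continuous control: the matching actor outputs joint actuation.
        return best, self.actors[best](obs)
```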
Developing agile motor skills on virtual and real humanoids
Demonstrating strength and agility on virtual and real humanoids has been an important goal in computer graphics and robotics. However, developing physics-based controllers for various agile motor skills requires a tremendous amount of prior knowledge and manual labor due to the complex mechanisms of the motor skills. The focus of the dissertation is to develop a set of computational tools to expedite the design process of physics-based controllers that can execute a variety of agile motor skills on virtual and real humanoids. Instead of designing controllers directly on real humanoids, this dissertation takes an approach that develops appropriate theories and models in virtual simulation and systematically transfers the solutions to hardware systems.
The algorithms and frameworks in this dissertation span various topics from specific physics-based controllers to general learning frameworks. We first present an online algorithm for controlling falling and landing motions of virtual characters. The proposed algorithm is effective and efficient enough to generate falling motions for a wide range of arbitrary initial conditions in real time. Next, we present a robust falling strategy for real humanoids that can manage a wide range of perturbations by planning the optimal contact sequences. We then introduce an iterative learning framework to easily design various agile motions, which is inspired by human learning techniques. The proposed framework is followed by novel algorithms to efficiently optimize control parameters for the target tasks, especially when they have many constraints or parameterized goals. Finally, we introduce an iterative approach for exporting simulation-optimized control policies to robot hardware that reduces the number of hardware experiments, which incur high costs and labor.
Success Weighted by Completion Time: A Dynamics-Aware Evaluation Criteria for Embodied Navigation
We present Success weighted by Completion Time (SCT), a new metric for
evaluating navigation performance for mobile robots. Several related works on
navigation have used Success weighted by Path Length (SPL) as the primary
method of evaluating the path an agent makes to a goal location, but SPL is
limited in its ability to properly evaluate agents with complex dynamics. In
contrast, SCT explicitly takes the agent's dynamics model into consideration,
and aims to accurately capture how well the agent has approximated the fastest
navigation behavior afforded by its dynamics. While several embodied navigation
works use point-turn dynamics, we focus on unicycle-cart dynamics for our
agent, which better exemplifies the dynamics model of popular mobile robotics
platforms (e.g., LoCoBot, TurtleBot, Fetch, etc.). We also present
RRT*-Unicycle, an algorithm for unicycle dynamics that estimates the fastest
collision-free path and completion time from a starting pose to a goal location
in an environment containing obstacles. We experiment with deep reinforcement
learning and reward shaping to train and compare the navigation performance of
agents with different dynamics models. In evaluating these agents, we show that
in contrast to SPL, SCT is able to capture the advantages in navigation speed a
unicycle model has over a simpler point-turn model of dynamics. Lastly, we show
that we can successfully deploy our trained models and algorithms outside of
simulation in the real world. We embody our agents in a real robot to navigate
an apartment, and show that they can generalize in a zero-shot manner.
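Assuming SCT mirrors SPL's familiar form with completion time in place of path length, as the abstract describes, a small sketch of the metric; the function signature is illustrative:

```python
def sct(successes, completion_times, fastest_times):
    """Success weighted by Completion Time, averaged over N episodes.

    successes:        binary success indicators S_i
    completion_times: the agent's completion time c_i per episode
    fastest_times:    fastest feasible time per episode, e.g. estimated
                      by RRT*-Unicycle (an assumption of this sketch)
    """
    total = 0.0
    for s, c, c_hat in zip(successes, completion_times, fastest_times):
        total += s * (c_hat / max(c_hat, c))
    return total / len(successes)
```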
ARMP: Autoregressive Motion Planning for Quadruped Locomotion and Navigation in Complex Indoor Environments
Generating natural and physically feasible motions for legged robots has been
a challenging problem due to their complex dynamics. In this work, we introduce a
novel learning-based framework of autoregressive motion planner (ARMP) for
quadruped locomotion and navigation. Our method can generate motion plans of
arbitrary length in an autoregressive fashion, unlike most offline trajectory
optimization algorithms, which plan over a fixed trajectory length. To this end,
we first construct the motion library by solving a dense set of trajectory
optimization problems for diverse scenarios and parameter settings. Then we
learn the motion manifold from the dataset in a supervised learning fashion. We
show that the proposed ARMP can generate physically plausible motions for
various tasks and situations. We also showcase that our method can be
successfully integrated with the recent robot navigation frameworks as a
low-level controller and unleash the full capability of legged robots for
complex indoor navigation.
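A hypothetical sketch of the autoregressive rollout the abstract describes: a learned model predicts the next motion frame from a window of recent frames, so plans of arbitrary length come from repeated one-step prediction. The `model.predict` interface is an assumption:

```python
import numpy as np

def autoregressive_plan(model, seed_frames, task_params, n_steps):
    """Roll out an arbitrary-length motion plan one frame at a time.

    model:       learned mapping (recent frames, task params) -> next
                 frame, trained on trajectory-optimization solutions;
                 its predict() interface is a hypothetical stand-in.
    seed_frames: list of initial motion frames (numpy arrays).
    """
    window = len(seed_frames)
    frames = list(seed_frames)
    for _ in range(n_steps):
        # Condition only on a fixed-size window of recent frames.
        context = np.concatenate(frames[-window:])
        frames.append(model.predict(context, task_params))
    return frames
```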
PM-FSM: Policies Modulating Finite State Machine for Robust Quadrupedal Locomotion
Deep reinforcement learning (deep RL) has emerged as an effective tool for
developing controllers for legged robots. However, vanilla deep RL often
requires a tremendous amount of training samples and is not feasible for
achieving robust behaviors. Instead, to take advantage of human experts'
knowledge while eliminating time-consuming interactive teaching, researchers
have investigated policy architectures that incorporate such knowledge, notably
Policies Modulating Trajectory Generators (PMTG), which builds a recurrent
control loop combining a parametric trajectory generator (TG) with a feedback
policy network to achieve more robust behaviors from intuitive prior knowledge.
In this work,
we propose Policies Modulating Finite State Machine (PM-FSM) by replacing TGs
with contact-aware finite state machines (FSM), which offer more flexible
control of each leg. Compared with the TGs, FSMs offer high-level management on
each leg motion generator and enable a flexible state arrangement, which makes
the learned behavior less vulnerable to unseen perturbations or challenging
terrains. This design gives the policy an explicit notion of contact events
with which to negotiate unexpected perturbations. We demonstrate that the proposed
architecture could achieve more robust behaviors in various scenarios, such as
challenging terrains or external perturbations, on both simulated and real
robots. The supplemental video can be found at: https://youtu.be/78cboMqTkJQ
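A minimal per-leg sketch of the contact-aware FSM idea; the two states and the transition rule below are simplified assumptions rather than the paper's design:

```python
class LegFSM:
    """Contact-aware state machine for one leg, in the spirit of PM-FSM.

    The learned policy modulates the phase rate; measured foot contact
    gates the swing-to-stance transition.
    """

    def __init__(self):
        self.state, self.phase = "stance", 0.0

    def step(self, dt, phase_rate, contact):
        self.phase += phase_rate * dt  # policy-modulated progression
        if self.state == "stance" and self.phase >= 1.0:
            self.state, self.phase = "swing", 0.0
        elif self.state == "swing" and (contact or self.phase >= 1.0):
            # Early touchdown ends swing immediately: the explicit
            # contact event lets the policy react to rough terrain.
            self.state, self.phase = "stance", 0.0
        return self.state, self.phase
```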
Words into Action: Learning Diverse Humanoid Robot Behaviors using Language Guided Iterative Motion Refinement
Humanoid robots are well suited for human habitats due to their morphological
similarity, but developing controllers for them is a challenging task that
involves multiple sub-problems, such as control, planning and perception. In
this paper, we introduce a method to simplify controller design by enabling
users to train and fine-tune robot control policies using natural language
commands. We first learn a neural network policy that generates behaviors given
a natural language command, such as "walk forward", by combining Large Language
Models (LLMs), motion retargeting, and motion imitation. Based on the
synthesized motion, we iteratively fine-tune by updating the text prompt and
querying LLMs to find the best checkpoint associated with the closest motion in
history. We validate our approach using a simulated Digit humanoid robot and
demonstrate learning of diverse motions, such as walking, hopping, and kicking,
without the burden of complex reward engineering. In addition, we show that our
iterative refinement enables us to learn 3x faster than a naive formulation
that learns from scratch.
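A hypothetical outline of the refinement loop; the callables stand in for the components the abstract names (LLM motion synthesis, retargeting plus imitation, prompt updating) and are not the authors' API:

```python
def refine_policy(command, synthesize_motion, train_imitation, score,
                  propose_prompt, n_rounds=5):
    """Language-guided iterative motion refinement (sketch).

    synthesize_motion: text prompt -> retargeted reference motion
    train_imitation:   reference motion -> policy checkpoint
    score:             how well a checkpoint matches the command
    propose_prompt:    asks an LLM for a revised prompt given history
    All four callables are hypothetical stand-ins.
    """
    prompt, history, best = command, [], None
    for _ in range(n_rounds):
        policy = train_imitation(synthesize_motion(prompt))
        s = score(policy, command)
        history.append((prompt, s))
        if best is None or s > best[1]:
            best = (policy, s)  # keep the best checkpoint so far
        prompt = propose_prompt(command, history)  # query the LLM
    return best[0]
```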
Learning a Single Policy for Diverse Behaviors on a Quadrupedal Robot using Scalable Motion Imitation
Learning various motor skills for quadrupedal robots is a challenging problem
that requires careful design of task-specific mathematical models or reward
descriptions. In this work, we propose to learn a single capable policy using
deep reinforcement learning by imitating a large number of reference motions,
including walking, turning, pacing, jumping, sitting, and lying. On top of the
existing motion imitation framework, we first carefully design the observation
space, the action space, and the reward function to improve the scalability of
the learning as well as the robustness of the final policy. In addition, we
adopt a novel adaptive motion sampling (AMS) method, which maintains a balance
between successful and unsuccessful behaviors. This technique allows the
learning algorithm to focus on challenging motor skills and avoid catastrophic
forgetting. We demonstrate that the learned policy can exhibit diverse
behaviors in simulation by successfully tracking both the training dataset and
out-of-distribution trajectories. We also validate the importance of the
proposed learning formulation and the adaptive motion sampling scheme by
conducting experiments.
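A sketch of what adaptive motion sampling could look like given the stated goal of balancing successful and unsuccessful behaviors; the weighting rule below is an assumption, as the abstract does not give a formula:

```python
import numpy as np

def adaptive_motion_sampling(success_rates, floor=0.05):
    """Sampling distribution over reference motion clips (sketch).

    Clips the policy tracks poorly get sampled more often, while the
    floor keeps mastered clips in rotation to avoid catastrophic
    forgetting.
    """
    rates = np.asarray(success_rates, dtype=float)
    weights = np.maximum(1.0 - rates, floor)  # focus on failures
    return weights / weights.sum()

# Example: clip 0 is nearly mastered, clip 1 is still failing.
probs = adaptive_motion_sampling([0.9, 0.2, 0.5])
```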