Data-Driven Model Predictive Control for Food-Cutting
Modelling of contact-rich tasks is challenging and cannot be entirely solved
using classical control approaches, due to the difficulty of constructing an
analytic description of the contact dynamics. Additionally, in a manipulation
task like food-cutting, purely learning-based methods such as Reinforcement
Learning require either a vast amount of data that is expensive to collect on
a real robot, or a highly realistic simulation environment, which is currently
not available. This paper presents a data-driven control approach that employs
a recurrent neural network to model the dynamics for a Model Predictive
Controller. We build upon earlier work limited to torque-controlled robots and
redefine it for velocity-controlled ones. We incorporate force/torque sensor
measurements, and reformulate and further extend the control problem formulation.
We evaluate the performance on objects used for training, as well as on unknown
objects, by means of the cutting rates achieved, and demonstrate that the method
can efficiently handle different cases with only one dynamic model. Finally, we
investigate the behavior of the system during force-critical instances of
cutting and illustrate its adaptive behavior in difficult cases.
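The core loop of a model-predictive controller driven by a learned dynamics model can be sketched as follows. This is a minimal illustration, not the paper's method: a hand-written linear model stands in for the learned recurrent network, the cost terms (cutting progress, force limit) and all constants are our own hypothetical choices, and random shooting stands in for whatever optimizer the authors use.

```python
import numpy as np

# Minimal sketch of sampling-based MPC over a learned dynamics model.
# A toy linear model plays the role of the paper's recurrent network;
# states, costs, and constants are illustrative assumptions.
rng = np.random.default_rng(0)

def learned_dynamics(state, action):
    """Placeholder one-step model: knife depth and cutting force under a velocity command."""
    depth, force = state
    depth_next = depth + 0.1 * action             # knife advances with commanded velocity
    force_next = 0.9 * force + 0.5 * abs(action)  # cutting force builds with speed
    return np.array([depth_next, force_next])

def mpc_action(state, horizon=5, n_samples=64, force_limit=1.0):
    """Return the first action of the sampled sequence with the best predicted cost."""
    best_cost, best_a0 = np.inf, 0.0
    for _ in range(n_samples):
        actions = rng.uniform(0.0, 1.0, size=horizon)  # candidate velocity commands
        s, cost = np.array(state, dtype=float), 0.0
        for a in actions:
            s = learned_dynamics(s, a)
            cost -= s[0]                                  # reward cutting progress (depth)
            cost += 10.0 * max(0.0, s[1] - force_limit)   # penalize excess contact force
        if cost < best_cost:
            best_cost, best_a0 = cost, actions[0]
    return best_a0

a0 = mpc_action([0.0, 0.0])  # first velocity command to execute, then re-plan
```

Only the first action is executed before re-planning from the new measured state, which is what lets the controller adapt during force-critical phases of the cut.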
Model Based Reinforcement Learning with Final Time Horizon Optimization
We present one of the first algorithms for model-based reinforcement learning
and trajectory optimization with a free final time horizon. Grounded in
optimal control theory and Dynamic Programming, we derive a set of backward
differential equations that propagate the value function and provide the
optimal control policy and the optimal time horizon. The resulting policy
generalizes previous results in model-based trajectory optimization. Our
analysis shows that the proposed algorithm recovers the theoretical optimal
solution on a linear, low-dimensional problem. Finally, we provide application
results on nonlinear systems.
Comment: 9 pages, 5 figures, NIPS201
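For reference, the standard backward propagation of the value function and the extra condition that a free final time introduces can be written as follows. This is the textbook form in generic notation, not the paper's exact derivation; $\ell$, $f$, and $\phi$ denote a running cost, dynamics, and terminal cost assumed here for illustration.

```latex
% HJB equation: backward propagation of the value function V(x,t)
-\frac{\partial V}{\partial t}
  = \min_{u}\Big[\,\ell(x,u) + \nabla_x V(x,t)^{\top} f(x,u)\Big],
\qquad V(x,t_f) = \phi\big(x,t_f\big).

% Free final time: stationarity of the total cost in t_f adds the
% transversality condition at the optimal horizon,
H\big(x^{*}(t_f),\,u^{*}(t_f),\,t_f\big)
  + \frac{\partial \phi}{\partial t_f}\big(x^{*}(t_f),t_f\big) = 0,
```

where $H = \ell + \nabla_x V^{\top} f$ is the Hamiltonian; this extra scalar equation is what determines the optimal horizon alongside the optimal policy.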
Cascade Attribute Learning Network
We propose the cascade attribute learning network (CALNet), which can learn
attributes in a control task separately and assemble them together. Our
contribution is twofold: first, we propose attribute learning in reinforcement
learning (RL). Attributes have traditionally been modeled as constraint functions
or terms in the objective function, making them hard to transfer. Attribute
learning, on the other hand, models these task properties as modules in the
policy network. Second, we propose using novel cascading compensative networks in
the CALNet to learn and assemble attributes. Using the CALNet, one can zero-shot
an unseen task by separately learning all its attributes and assembling
the attribute modules. We have validated the capacity of our model on a wide
variety of control problems with attributes in time, position, velocity, and
acceleration phases.
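The cascading idea — a base policy whose action is successively corrected by one module per attribute — can be sketched in a few lines. This is our own toy construction, not the CALNet architecture: random linear maps stand in for trained networks, and the dimensions are arbitrary.

```python
import numpy as np

# Sketch of a cascade of compensative modules: a base policy produces an
# action, and each attribute module sees the state plus the upstream action
# and adds a compensation. Random linear maps stand in for trained networks.
rng = np.random.default_rng(1)

def linear_module(in_dim, out_dim):
    W = rng.normal(scale=0.1, size=(out_dim, in_dim))
    return lambda x: W @ x

state_dim, action_dim = 4, 2
base_policy = linear_module(state_dim, action_dim)
attribute_modules = [linear_module(state_dim + action_dim, action_dim)
                     for _ in range(3)]  # one module per attribute

def cascaded_action(state):
    action = base_policy(state)
    for module in attribute_modules:  # cascade: compensate one attribute at a time
        action = action + module(np.concatenate([state, action]))
    return action

a = cascaded_action(np.ones(state_dim))
```

Because each module only compensates the upstream action, modules learned on separate attribute tasks can, in principle, be re-assembled in a new cascade without retraining the base policy.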
Nonsmooth optimal value and policy functions in mechanical systems subject to unilateral constraints
State-of-the-art approaches to optimal control use smooth approximations of
value and policy functions and gradient-based algorithms for improving
approximator parameters. Unfortunately, we show that value and policy functions
that arise in optimal control of mechanical systems subject to unilateral
constraints -- i.e. the contact-rich dynamics of robot locomotion and
manipulation -- are generally nonsmooth due to the underlying dynamics
exhibiting discontinuous or piecewise-differentiable trajectory outcomes.
Simple mechanical systems are used to illustrate this result and the
implications for optimal control of contact-rich robot dynamics.
Comment: Submitted to IEEE CS
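A tiny numeric example makes the nonsmoothness concrete. This construction is ours, not one of the paper's examples: a point mass launched at a step of height h, where the trajectory outcome — and hence any cost built on it — jumps discontinuously as the launch speed crosses the value needed to clear the step.

```python
import numpy as np

# Toy unilateral-constraint example (our own, not from the paper): the
# outcome of a ballistic launch toward a step is discontinuous in the
# initial speed, so the value function over initial conditions is nonsmooth.
g, h = 9.81, 1.0  # gravity (m/s^2), step height (m)

def landing_distance(v):
    """Horizontal distance travelled for a 45-degree launch at speed v."""
    vz = vx = v / np.sqrt(2.0)
    if vz**2 / (2.0 * g) < h:   # apex below the step: impact stops the mass
        return 0.0
    return 2.0 * vx * vz / g    # clears the step: full ballistic range

v_clear = 2.0 * np.sqrt(g * h)  # minimum speed whose vertical component clears h
below = landing_distance(0.99 * v_clear)   # 0.0: stopped by the step
above = landing_distance(1.01 * v_clear)   # ~4 m: cleared it
```

Any smooth approximator of `landing_distance` (or of a value function built from it) must smear out this jump, which is exactly the failure mode the abstract warns about for gradient-based policy optimization.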
Sample Efficient Path Integral Control under Uncertainty
We present a data-driven optimal control framework that can be viewed as a
generalization of the path integral (PI) control approach. We find iterative
feedback control laws without parameterization, based on a probabilistic
representation of the learned dynamics model. The proposed algorithm operates in a
forward-backward manner, which differentiates it from other PI-related methods
that perform forward sampling to find optimal controls. Our method uses
significantly fewer samples to find optimal controls than other
approaches within the PI control family, which rely on extensive sampling from
given dynamics models or on trials on physical systems in a model-free fashion. In
addition, the learned controllers can be generalized to new tasks without
re-sampling, based on the compositionality theory for the linearly-solvable
optimal control framework. We provide experimental results on three different
systems, and comparisons with state-of-the-art model-based methods, to
demonstrate the efficiency and generalizability of the proposed framework.
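The compositionality property the abstract leans on can be shown in a few lines. Under the linearly-solvable framework, with desirability defined as z(x) = exp(-V(x)), a new task whose terminal desirability is a weighted sum of previously solved ones inherits the composed solution without re-solving; the task values below are toy stand-ins, not the paper's tasks.

```python
import numpy as np

# Compositionality sketch for linearly-solvable optimal control:
# desirability z(x) = exp(-V(x)) composes linearly across tasks,
# so a mixed task's value function comes for free.
xs = np.linspace(-1.0, 1.0, 5)
z1 = np.exp(-(xs - 0.5) ** 2)   # desirability of task 1 (toy goal at +0.5)
z2 = np.exp(-(xs + 0.5) ** 2)   # desirability of task 2 (toy goal at -0.5)

w1, w2 = 0.3, 0.7               # mixture weights of the new composite task
z_new = w1 * z1 + w2 * z2       # composite desirability: a weighted sum
V_new = -np.log(z_new)          # composite value function, no re-solving needed
```

This linearity in z is what lets the learned controllers transfer to new tasks "without re-sampling", as the abstract claims.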
A Fog Robotic System for Dynamic Visual Servoing
Cloud Robotics is a paradigm where distributed robots are connected to cloud
services via networks to access unlimited computation power, at the cost of
network communication. However, due to limitations such as network latency and
variability, it is difficult to control dynamic, human-compliant service robots
directly from the cloud. In this work, by leveraging an asynchronous protocol with
a heartbeat signal, we combine cloud robotics with a smart edge device to build
a Fog Robotic system. We use the system to enable robust teleoperation of a
dynamic self-balancing robot from the cloud. We first use the system to pick up
boxes from static locations, a task commonly performed in warehouse logistics.
To make cloud teleoperation more efficient, we deploy image-based visual
servoing (IBVS) to perform box pickups automatically. Visual feedback,
including AprilTag recognition and tracking, is computed in the cloud to
emulate a Fog Robotic object recognition system for IBVS. We demonstrate the
feasibility of a real-time dynamic automation system using this cloud-edge
hybrid, which opens up possibilities of deploying dynamic robotic control with
deep-learning recognition systems in Fog Robotics. Finally, we show that Fog
Robotics enables the self-balancing service robot to pick up a box
automatically from a person in unstructured environments.
Comment: 7 pages, 5 figures, ICRA 2019 (submitted, under review)
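The heartbeat mechanism for tolerating network latency and variability can be sketched as follows. This is our own hypothetical illustration of the general pattern, not the authors' protocol: the edge device obeys cloud commands only while heartbeats arrive on time, and falls back to a safe local behavior (self-balancing) when the network goes stale; the timeout value is an assumption.

```python
import time

# Heartbeat-guarded edge controller (illustrative sketch): cloud commands are
# accepted only while heartbeats are fresh; otherwise the edge runs a safe
# local fallback. The 0.2 s timeout is an assumed value.
HEARTBEAT_TIMEOUT = 0.2  # seconds

class EdgeController:
    def __init__(self):
        self.last_beat = time.monotonic()
        self.cloud_command = "hold"

    def on_heartbeat(self, command):
        """Called asynchronously whenever a cloud packet arrives."""
        self.last_beat = time.monotonic()
        self.cloud_command = command

    def control_step(self):
        """Runs at the edge at a fixed rate, independent of the network."""
        if time.monotonic() - self.last_beat > HEARTBEAT_TIMEOUT:
            return "self_balance"      # network stale: safe local fallback
        return self.cloud_command      # network fresh: obey the cloud

edge = EdgeController()
edge.on_heartbeat("pick_box")
```

Keeping the fallback decision on the edge, rather than in the cloud, is what makes the scheme robust: the robot never depends on the network to decide that the network has failed.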
Learning to Optimize
Algorithm design is a laborious process and often requires many iterations of
ideation and validation. In this paper, we explore automating algorithm design
and present a method to learn an optimization algorithm, which we believe to be
the first method that can automatically discover a better algorithm. We
approach this problem from a reinforcement learning perspective and represent
any particular optimization algorithm as a policy. We learn an optimization
algorithm using guided policy search and demonstrate that the resulting
algorithm outperforms existing hand-engineered algorithms in terms of
convergence speed and/or the final objective value.
Comment: 9 pages, 3 figures
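The "optimization algorithm as a policy" framing can be made concrete with a toy rollout. This sketch is ours, not the paper's learned optimizer: a hand-coded policy (plain gradient descent) stands in for the network the authors train with guided policy search, and the quadratic objective is an arbitrary example.

```python
import numpy as np

# Optimizer-as-policy sketch: the "state" is the current iterate's local
# information (here just the gradient) and the "action" is the update step.
# A hand-coded gradient-descent rule stands in for a learned policy.
def objective(x):
    return float(np.sum((x - 3.0) ** 2))

def gradient(x):
    return 2.0 * (x - 3.0)

def policy(grad):
    """A learned policy would map features of the optimization history to a
    step; this stand-in just returns -0.1 * grad (gradient descent)."""
    return -0.1 * grad

x = np.zeros(2)
for _ in range(50):              # roll out the policy: each step is one action
    x = x + policy(gradient(x))
```

Training then amounts to searching over `policy` so that rollouts like this one minimize objectives quickly — i.e., treating convergence speed and final objective value as the reward.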
Adaptive Tensegrity Locomotion on Rough Terrain via Reinforcement Learning
The dynamical properties of tensegrity robots give them appealing ruggedness
and adaptability, but present major challenges with respect to locomotion
control. Due to high dimensionality and complex contact responses, data-driven
approaches are well suited to producing viable feedback policies. Guided Policy Search
(GPS), a sample-efficient and model-free hybrid framework for optimization and
reinforcement learning, has recently been used to produce periodic locomotion
for a spherical 6-bar tensegrity robot on flat or slightly varied surfaces.
This work provides an extension to non-periodic locomotion and achieves rough
terrain traversal, which requires more broadly varied, adaptive, and
non-periodic rover behavior. The contribution alters the control optimization
step of GPS, which locally fits and exploits surrogate models of the dynamics,
and employs the existing supervised learning step. The proposed solution
incorporates new processes to ensure effective local modeling despite the
disorganized nature of sample data in rough terrain locomotion. Demonstrations
in simulation reveal that the resulting controller sustains the highly adaptive
behavior necessary to reliably traverse rough terrain.
Comment: submitted to ICRA 201
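The local-modeling step that the contribution alters can be sketched as a least-squares fit. This is a generic illustration of GPS-style local dynamics fitting, not the authors' modified procedure: synthetic transitions stand in for rover samples, and the model is the usual time-varying linear form x' ≈ A[x; u] + b.

```python
import numpy as np

# Sketch of local surrogate-model fitting in GPS-style control optimization:
# fit a linear dynamics model x' ≈ A @ [x; u] + b to sampled transitions by
# least squares. The transition data below are synthetic stand-ins.
rng = np.random.default_rng(2)
n_samples, x_dim, u_dim = 200, 3, 2

X = rng.normal(size=(n_samples, x_dim))                 # sampled states
U = rng.normal(size=(n_samples, u_dim))                 # sampled controls
A_true = rng.normal(size=(x_dim, x_dim + u_dim))        # ground-truth local dynamics
X_next = np.hstack([X, U]) @ A_true.T + 0.01 * rng.normal(size=(n_samples, x_dim))

# Least-squares fit with an intercept term.
Phi = np.hstack([X, U, np.ones((n_samples, 1))])
W, *_ = np.linalg.lstsq(Phi, X_next, rcond=None)
A_fit, b_fit = W[:-1].T, W[-1]
```

On rough terrain the sampled transitions are far less organized than this synthetic batch, which is why the paper adds processes to keep local fits like this one well-conditioned.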
Sample-Efficient Policy Learning based on Completely Behavior Cloning
Direct policy search is one of the most important algorithms in reinforcement
learning. However, learning from scratch requires a large amount of experience
data and is easily prone to poor local optima. In addition, a
partially trained policy tends to perform actions that are dangerous to the agent
and the environment. To overcome these challenges, this paper proposes a
policy initialization algorithm called Policy Learning based on Completely
Behavior Cloning (PLCBC). PLCBC first transforms the Model Predictive Control
(MPC) controller into a piecewise affine (PWA) function using multi-parametric
programming, and uses a neural network to express this function. In this way,
PLCBC can completely clone the MPC controller without any performance loss, and
is totally training-free. The experiments show that this initialization
strategy helps the agent learn in high-reward state regions, and converge
faster and better.
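What "piecewise affine controller" means in practice can be sketched directly. The two-region law below is our own toy stand-in for one produced by multi-parametric programming: each polyhedral region H x <= h carries its own affine law u = K x + k.

```python
import numpy as np

# Toy explicit-MPC law as a piecewise affine (PWA) function over polyhedral
# regions: u = K_i x + k_i whenever H_i x <= h_i. Two hand-made regions split
# at x[0] = 0 stand in for a real multi-parametric programming solution.
regions = [
    {"H": np.array([[1.0, 0.0]]), "h": np.array([0.0]),    # region 1: x[0] <= 0
     "K": np.array([[-1.0, 0.0]]), "k": np.array([0.5])},
    {"H": np.array([[-1.0, 0.0]]), "h": np.array([0.0]),   # region 2: x[0] >= 0
     "K": np.array([[-2.0, 0.0]]), "k": np.array([0.5])},
]

def pwa_control(x):
    for r in regions:
        if np.all(r["H"] @ x <= r["h"]):
            return r["K"] @ x + r["k"]
    raise ValueError("state outside all regions")

u = pwa_control(np.array([-1.0, 0.0]))  # region 1 law: u = -x[0] + 0.5 = 1.5
```

Cloning then amounts to fitting a network on (x, pwa_control(x)) pairs; since a ReLU network is itself a PWA function, it can in principle represent such a law exactly, which is the basis of the "without any performance loss" claim.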
Synthesizing Neural Network Controllers with Probabilistic Model based Reinforcement Learning
We present an algorithm for rapidly learning controllers for robotic
systems. The algorithm follows the model-based reinforcement learning paradigm
and improves upon existing algorithms, namely Probabilistic learning in Control
(PILCO) and a sample-based version of PILCO with neural network dynamics
(Deep-PILCO). We propose training a neural network dynamics model using
variational dropout with truncated Log-Normal noise. This allows us to obtain a
dynamics model with calibrated uncertainty, which can be used to simulate
controller executions via rollouts. We also describe a set of techniques,
inspired by viewing PILCO as a recurrent neural network model, that are crucial
to improving the convergence of the method. We test our method on a variety of
benchmark tasks, demonstrating data-efficiency competitive with PILCO
while being able to optimize complex neural network controllers. Finally, we
assess the performance of the algorithm in learning motor controllers for a
six-legged autonomous underwater vehicle. This demonstrates the potential of
the algorithm to scale up in dimensionality and dataset size for more
complex control tasks.
Comment: 8 pages, 7 figures
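The uncertainty-aware rollout idea can be sketched with particles and dropout. This is a simplified stand-in, not the paper's model: a tiny one-layer network with Bernoulli dropout masks resampled per particle plays the role of the variational-dropout dynamics model with truncated Log-Normal noise.

```python
import numpy as np

# Particle rollout through a stochastic dynamics model: per-particle dropout
# masks (a crude stand-in for truncated Log-Normal variational dropout)
# inject model uncertainty, so the particle spread reflects how unsure the
# model is about the trajectory. Weights are random toy values.
rng = np.random.default_rng(3)
state_dim, hidden = 2, 16
W1 = rng.normal(scale=0.3, size=(hidden, state_dim))
W2 = rng.normal(scale=0.3, size=(state_dim, hidden))

def step(x, mask):
    """One-step dynamics prediction with a dropout mask on the hidden layer."""
    return W2 @ (mask * np.maximum(0.0, W1 @ x))

n_particles, horizon, keep = 30, 10, 0.9
particles = np.tile(np.array([1.0, 0.0]), (n_particles, 1))
for t in range(horizon):
    masks = rng.binomial(1, keep, size=(n_particles, hidden)) / keep
    particles = np.array([step(p, m) for p, m in zip(particles, masks)])

mean, std = particles.mean(axis=0), particles.std(axis=0)
```

A policy evaluated under such rollouts is penalized when the particle cloud spreads, which is what pushes the learned controller toward regions where the dynamics model is confident.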