309 research outputs found
Learning Image-Conditioned Dynamics Models for Control of Under-actuated Legged Millirobots
Millirobots are a promising robotic platform for many applications due to
their small size and low manufacturing costs. Legged millirobots, in
particular, can provide increased mobility in complex environments and improved
scaling of obstacles. However, controlling these small, highly dynamic, and
underactuated legged systems is difficult. Hand-engineered controllers can
sometimes control these legged millirobots, but they have difficulties with
dynamic maneuvers and complex terrains. We present an approach for controlling
a real-world legged millirobot that is based on learned neural network models.
Using less than 17 minutes of data, our method can learn a predictive model of
the robot's dynamics that can enable effective gaits to be synthesized on the
fly for following user-specified waypoints on a given terrain. Furthermore, by
leveraging expressive, high-capacity neural network models, our approach allows
for these predictions to be directly conditioned on camera images, endowing the
robot with the ability to predict how different terrains might affect its
dynamics. This enables sample-efficient and effective learning for locomotion
of a dynamic legged millirobot on various terrains, including gravel, turf,
carpet, and styrofoam. Experiment videos can be found at
https://sites.google.com/view/imageconddy
Learning directed locomotion in modular robots with evolvable morphologies
The vision behind this paper looks ahead to evolutionary robot systems where morphologies and controllers are evolved together and ‘newborn’ robots undergo a learning process to optimize their inherited brain for the inherited body. The specific problem we address is learning controllers for the task of directed locomotion in evolvable modular robots. To this end, we present a test suite of robots with different shapes and sizes and compare two learning algorithms, Bayesian optimization and HyperNEAT. The experiments in simulation show that both methods obtain good controllers, but Bayesian optimization is more effective and sample efficient. We validate the best learned controllers by constructing three robots from the test suite in the real world and observe their fitness and actual trajectories. The obtained results indicate a reality gap, but overall the trajectories are adequate and follow the target directions successfully
Sample Efficient Optimization for Learning Controllers for Bipedal Locomotion
Learning policies for bipedal locomotion can be difficult, as experiments are
expensive and simulation does not usually transfer well to hardware. To counter
this, we need al- gorithms that are sample efficient and inherently safe.
Bayesian Optimization is a powerful sample-efficient tool for optimizing
non-convex black-box functions. However, its performance can degrade in higher
dimensions. We develop a distance metric for bipedal locomotion that enhances
the sample-efficiency of Bayesian Optimization and use it to train a 16
dimensional neuromuscular model for planar walking. This distance metric
reflects some basic gait features of healthy walking and helps us quickly
eliminate a majority of unstable controllers. With our approach we can learn
policies for walking in less than 100 trials for a range of challenging
settings. In simulation, we show results on two different costs and on various
terrains including rough ground and ramps, sloping upwards and downwards. We
also perturb our models with unknown inertial disturbances analogous with
differences between simulation and hardware. These results are promising, as
they indicate that this method can potentially be used to learn control
policies on hardware.Comment: To appear in International Conference on Humanoid Robots (Humanoids
'2016), IEEE-RAS. (Rika Antonova and Akshara Rai contributed equally
Robot Learning with Crash Constraints
In the past decade, numerous machine learning algorithms have been shown to
successfully learn optimal policies to control real robotic systems. However,
it is common to encounter failing behaviors as the learning loop progresses.
Specifically, in robot applications where failing is undesired but not
catastrophic, many algorithms struggle with leveraging data obtained from
failures. This is usually caused by (i) the failed experiment ending
prematurely, or (ii) the acquired data being scarce or corrupted. Both
complicate the design of proper reward functions to penalize failures. In this
paper, we propose a framework that addresses those issues. We consider failing
behaviors as those that violate a constraint and address the problem of
learning with crash constraints, where no data is obtained upon constraint
violation. The no-data case is addressed by a novel GP model (GPCR) for the
constraint that combines discrete events (failure/success) with continuous
observations (only obtained upon success). We demonstrate the effectiveness of
our framework on simulated benchmarks and on a real jumping quadruped, where
the constraint threshold is unknown a priori. Experimental data is collected,
by means of constrained Bayesian optimization, directly on the real robot. Our
results outperform manual tuning and GPCR proves useful on estimating the
constraint threshold.Comment: 8 pages, 4 figures, 1 table, 1 algorithm. Accepted for publication in
IEEE Robotics and Automation Letters (RA-L). Video demonstration of the
experiments available at https://youtu.be/RAiIo0l6_rE . Algorithm
implementation available at
https://github.com/alonrot/classified_regression.gi
High-Dimensional Controller Tuning through Latent Representations
In this paper, we propose a method to automatically and efficiently tune
high-dimensional vectors of controller parameters. The proposed method first
learns a mapping from the high-dimensional controller parameter space to a
lower dimensional space using a machine learning-based algorithm. This mapping
is then utilized in an actor-critic framework using Bayesian optimization (BO).
The proposed approach is applicable to complex systems (such as quadruped
robots). In addition, the proposed approach also enables efficient
generalization to different control tasks while also reducing the number of
evaluations required while tuning the controller parameters. We evaluate our
method on a legged locomotion application. We show the efficacy of the
algorithm in tuning the high-dimensional controller parameters and also
reducing the number of evaluations required for the tuning. Moreover, it is
shown that the method is successful in generalizing to new tasks and is also
transferable to other robot dynamics
- …