Autonomous learning of object manipulation skills can enable robots to
acquire rich behavioral repertoires that scale to the variety of objects found
in the real world. However, current motion skill learning methods typically
restrict the behavior to a compact, low-dimensional representation, limiting
its expressiveness and generality. In this paper, we extend a recently
developed policy search method \cite{la-lnnpg-14} and use it to learn a range
of dynamic manipulation behaviors with highly general policy representations,
without using known models or example demonstrations. Our approach learns a set
of trajectories for the desired motion skill by using iteratively refitted
time-varying linear models, and then unifies these trajectories into a single
control policy that can generalize to new situations. To enable this method to
run on a real robot, we introduce several improvements that reduce the sample
count and automate parameter selection. We show that our method can acquire
fast, fluent behaviors after only minutes of interaction time, and can learn
robust controllers for complex tasks, including putting together a toy
airplane, stacking tight-fitting LEGO blocks, placing wooden rings onto
tight-fitting pegs, inserting a shoe tree into a shoe, and screwing bottle caps
onto bottles.