Hi-Val: Iterative Learning of Hierarchical Value Functions for Policy Generation
Task decomposition is effective in many applications where the global complexity of a problem makes planning and decision-making too demanding. This is true, for example, in high-dimensional robotics domains, where (1) unpredictability and modeling limitations typically prevent the manual specification of robust behaviors, and (2) learning an action policy is challenging due to the curse of dimensionality. In this work, we borrow the concept of Hierarchical Task Networks (HTNs) to decompose the learning procedure, and we exploit Upper Confidence Tree (UCT) search to introduce HOP, a novel iterative algorithm for hierarchical optimistic planning with learned value functions. To obtain better generalization and generate policies, HOP simultaneously learns and uses action values. These are used to formalize constraints within the search space and to reduce the dimensionality of the problem. We evaluate our algorithm on both a fetching task using a simulated 7-DOF KUKA lightweight arm and a pick-and-delivery task with a Pioneer robot.
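The abstract does not spell out how UCT search chooses which action to explore. As a rough illustration only, the sketch below shows the standard UCB1 selection rule that UCT-style searches commonly apply at each tree node. It is not the HOP algorithm itself: the function name `ucb1_select`, the dictionary layout of `children`, and the default exploration constant are all assumptions made for this example.

```python
import math


def ucb1_select(children, exploration=math.sqrt(2)):
    """Return the index of the child to explore next under UCB1.

    `children` is a hypothetical list of dicts, each holding the number of
    `visits` and the accumulated `total_value` of one candidate action.
    """
    total_visits = sum(child["visits"] for child in children)
    best_index, best_score = None, -math.inf
    for index, child in enumerate(children):
        if child["visits"] == 0:
            # Unvisited actions are tried first so every mean is defined.
            return index
        mean_value = child["total_value"] / child["visits"]
        # The bonus shrinks as an action is visited more often.
        bonus = exploration * math.sqrt(math.log(total_visits) / child["visits"])
        score = mean_value + bonus
        if score > best_score:
            best_index, best_score = index, score
    return best_index
```

The optimism comes from the bonus term: a rarely visited action can be selected over one with a higher average value, which is what lets the search keep testing alternatives while its value estimates are still uncertain.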
End-to-end Driving via Conditional Imitation Learning
Deep networks trained on demonstrations of human driving have learned to
follow roads and avoid obstacles. However, driving policies trained via
imitation learning cannot be controlled at test time. A vehicle trained
end-to-end to imitate an expert cannot be guided to take a specific turn at an
upcoming intersection. This limits the utility of such systems. We propose to
condition imitation learning on high-level command input. At test time, the
learned driving policy functions as a chauffeur that handles sensorimotor
coordination but continues to respond to navigational commands. We evaluate
different architectures for conditional imitation learning in vision-based
driving. We conduct experiments in realistic three-dimensional simulations of
urban driving and on a 1/5 scale robotic truck that is trained to drive in a
residential area. Both systems drive based on visual input yet remain
responsive to high-level navigational commands. The supplementary video can be
viewed at https://youtu.be/cFtnflNe5fM
Comment: Published at the International Conference on Robotics and Automation
(ICRA), 201
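
To make the idea of conditioning an imitation policy on a high-level command concrete, here is a minimal sketch under stated assumptions. It uses one linear output head per command and a mean-squared imitation loss against the expert action. The command names, the class `BranchedPolicy`, the linear heads, and the use of precomputed feature vectors in place of images are all hypothetical simplifications, and this is not claimed to match any architecture evaluated in the paper.

```python
import numpy as np

# Hypothetical command vocabulary, chosen only for this example.
COMMANDS = ("follow", "left", "right", "straight")


class BranchedPolicy:
    """Toy command-conditioned policy with one linear head per command."""

    def __init__(self, feature_dim, action_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Each command owns its own weight matrix (its "branch").
        self.heads = {
            command: rng.normal(scale=0.1, size=(action_dim, feature_dim))
            for command in COMMANDS
        }

    def act(self, features, command):
        # The command selects which head maps the features to an action.
        return self.heads[command] @ features

    def imitation_loss(self, features, command, expert_action):
        # Mean squared error between predicted and demonstrated action.
        error = self.act(features, command) - expert_action
        return float(np.mean(error ** 2))


policy = BranchedPolicy(feature_dim=4, action_dim=2)
features = np.ones(4)
left_action = policy.act(features, "left")
right_action = policy.act(features, "right")
```

With the same observation features, the two commands yield different actions, which is the property the abstract describes: the policy handles low-level control while still responding to the navigational command.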