Constraint-aware learning of policies by demonstration
Many practical tasks in robotic systems, such as cleaning windows, writing, or grasping, are inherently constrained. Learning policies subject to constraints is a challenging problem. In this paper, we propose a method of constraint-aware learning that solves the policy learning problem using redundant robots that execute a policy acting in the null space of a constraint. In particular, we are interested in generalizing learned null-space policies across constraints that were not known during training. We split the combined problem of learning constraints and policies into two: first estimating the constraint, and then estimating a null-space policy using the remaining degrees of freedom. For a linear parametrization, we provide a closed-form solution to the problem. We also define a metric for comparing the similarity of estimated constraints, which is useful for pre-processing the trajectories recorded in the demonstrations. We have validated our method by learning a wiping task from human demonstration on flat surfaces and reproducing it on an unknown curved surface using a force- or torque-based controller to achieve tool alignment. We show that, despite the differences between the training and validation scenarios, we learn a policy that still provides the desired wiping motion.

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Spanish Ministry of Economy and the European Union (grant number DPI2016-81002-R (AEI/FEDER, UE)), the European Union Horizon 2020 programme, as part of the project Memory of Motion - MEMMO (project ID 780684), and the Engineering and Physical Sciences Research Council, UK, as part of the Robotics and AI hub in Future AI and Robotics for Space - FAIR-SPACE (grant number EP/R026092/1), and as part of the Centre for Doctoral Training in Robotics and Autonomous Systems at Heriot-Watt University and the University of Edinburgh (grant numbers EP/L016834/1 and EP/J015040/1).

Armesto, L.; Moura, J.; Ivan, V.; Erden, M. S.; Sala, A.; Vijayakumar, S. (2018). Constraint-aware learning of policies by demonstration. The International Journal of Robotics Research, 37(13-14), 1673-1689. https://doi.org/10.1177/0278364918784354
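The two-stage split described above (estimate the constraint, then fit a null-space policy with the remaining degrees of freedom) can be sketched on synthetic data. Everything below is hypothetical: the constraint matrix `A` is taken as known, although the paper estimates it from demonstrations, and the policy is linear in the state, `pi(q) = W @ q`, so the fit reduces to an ordinary least-squares solve.

```python
import numpy as np

# Sketch (synthetic data): fit a linear null-space policy from
# demonstrated actions u = N(q) @ W @ q, where N = I - pinv(A) @ A
# projects onto the null space of the constraint Jacobian A.

rng = np.random.default_rng(0)
n = 3                                    # degrees of freedom
A = np.array([[1.0, 0.0, 1.0]])          # toy 1-D constraint Jacobian
N = np.eye(n) - np.linalg.pinv(A) @ A    # null-space projector

W_true = rng.normal(size=(n, n))         # ground-truth linear policy
Q = rng.normal(size=(50, n))             # demonstrated states
U = np.array([N @ W_true @ q for q in Q])  # observed null-space actions

# Closed-form least-squares fit of W from u = N W q, using the identity
# vec(N W q) = (q^T kron N) vec(W) with column-major vec.
X = np.stack([np.kron(q, N) for q in Q]).reshape(50 * n, n * n)
w_hat, *_ = np.linalg.lstsq(X, U.reshape(-1), rcond=None)
W_hat = w_hat.reshape(n, n, order="F")
```

Note that only the null-space component of the policy is identifiable from such data, so `W_hat` agrees with `W_true` only after projection by `N`.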
Learning Singularity Avoidance
With the increase in complexity of robotic systems and the rise in non-expert
users, it can be assumed that task constraints are not explicitly known. In
tasks where avoiding singularity is critical to success, this paper provides
an approach, aimed especially at non-expert users, for the system to learn the
constraints contained in a set of demonstrations, such that they can be used
to optimise an autonomous controller to avoid singularity without explicit
knowledge of the task constraints. The proposed approach avoids singularity,
and thereby unpredictable behaviour when carrying out a task, by maximising
the learnt manipulability throughout the motion of the constrained system,
and is not limited to kinematic systems. Its benefits are demonstrated
through comparisons with other control policies, which show that the
constrained manipulability of a system learnt through demonstration can be
used to avoid singularities in cases where these other policies would fail.
In the absence of the system's manipulability subject to a task's constraints,
the proposed approach can be used instead to infer these, with results showing
errors of less than 10^-5 in 3-DOF simulated systems and less than 10^-2 on a
7-DOF real-world robotic system.
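The paper maximises a learnt, constraint-aware manipulability; the classic unconstrained quantity it builds on is Yoshikawa's measure w(q) = sqrt(det(J Jᵀ)), which drops to zero at a singularity. A minimal sketch for a planar two-link arm (link lengths hypothetical):

```python
import numpy as np

def jacobian_2link(q, l1=1.0, l2=1.0):
    # Position Jacobian of a planar 2-link arm at joint angles q = [q1, q2].
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([
        [-l1 * s1 - l2 * s12, -l2 * s12],
        [ l1 * c1 + l2 * c12,  l2 * c12],
    ])

def manipulability(J):
    # Yoshikawa's measure: w = sqrt(det(J @ J.T)); w -> 0 at singularities.
    return np.sqrt(max(np.linalg.det(J @ J.T), 0.0))

w_bent = manipulability(jacobian_2link([0.3, 1.2]))      # well-conditioned pose
w_straight = manipulability(jacobian_2link([0.3, 0.0]))  # fully stretched: singular
```

A controller that follows the gradient of this measure (here, of its learnt constrained counterpart) steers the arm away from stretched, singular configurations.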
Safety-Aware Apprenticeship Learning
Apprenticeship learning (AL) is a class of Learning from Demonstration
techniques in which the reward function of a Markov Decision Process (MDP) is
unknown to the learning agent, and the agent has to derive a good policy by
observing an expert's demonstrations. In this paper, we study the problem of
how to make AL algorithms inherently safe while still meeting their learning
objective. We consider a setting where the unknown reward function is assumed
to be a linear combination of a set of state features, and the safety property
is specified in Probabilistic Computation Tree Logic (PCTL). By embedding
probabilistic model checking inside AL, we propose a novel
counterexample-guided approach that can ensure safety while retaining
performance of the learnt policy. We demonstrate the effectiveness of our
approach on several challenging AL scenarios where safety is essential.

Comment: Accepted by International Conference on Computer Aided Verification
(CAV) 201
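The linear-reward assumption makes discounted feature expectations the central statistic of AL: if a policy's feature expectations match the expert's, its value matches the expert's for any reward w · φ(s) with bounded w. A sketch of estimating them from demonstrations (states, features, and trajectories below are all hypothetical):

```python
import numpy as np

def feature_expectations(trajectories, phi, gamma=0.9):
    # Empirical discounted feature expectations mu = E[sum_t gamma^t phi(s_t)],
    # averaged over expert demonstrations. AL then searches for a policy
    # whose own feature expectations are close to these.
    mu = np.zeros(len(phi(trajectories[0][0])))
    for traj in trajectories:
        for t, s in enumerate(traj):
            mu += (gamma ** t) * phi(s)
    return mu / len(trajectories)

# Toy chain MDP: states are integers, two indicator features (hypothetical).
phi = lambda s: np.array([float(s == 0), float(s == 3)])
demos = [[0, 1, 2, 3], [0, 0, 1, 2]]
mu_E = feature_expectations(demos, phi)
```

The safety-aware variant additionally model-checks the candidate policy against the PCTL property and uses counterexamples to constrain the next candidate.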
Learning Task Constraints from Demonstration for Hybrid Force/Position Control
We present a novel method for learning hybrid force/position control from
demonstration. We learn a dynamic constraint frame aligned to the direction of
desired force using Cartesian Dynamic Movement Primitives. In contrast to
approaches that utilize a fixed constraint frame, our approach easily
accommodates tasks with rapidly changing task constraints over time. We
activate only one degree of freedom for force control at any given time,
ensuring motion is always possible orthogonal to the direction of desired
force. Since we utilize demonstrated forces to learn the constraint frame, we
are able to compensate for forces not detected by methods that learn only from
the demonstrated kinematic motion, such as frictional forces between the
end-effector and the contact surface. We additionally propose novel extensions
to the Dynamic Movement Primitive (DMP) framework that encourage robust
transition from free-space motion to in-contact motion in spite of environment
uncertainty. We incorporate force feedback and a dynamically shifting goal to
reduce forces applied to the environment and retain stable contact while
enabling force control. Our methods exhibit low impact forces on contact and
low steady-state tracking error.

Comment: Under review
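The single-DOF force-control idea above can be sketched as a selection-matrix hybrid law: exactly one axis (the constraint-frame force direction) is force-controlled, and the orthogonal complement is position-controlled, so motion orthogonal to the desired force is always possible. Gains, frame, and signals here are hypothetical; the paper additionally learns the constraint frame itself with Cartesian DMPs.

```python
import numpy as np

def hybrid_command(x, x_des, f, f_des, axis, kp=1.0, kf=0.1):
    # One DOF (along `axis`, the constraint-frame force direction) is
    # force-controlled; the remaining two are position-controlled.
    axis = axis / np.linalg.norm(axis)
    S = np.outer(axis, axis)                        # force-controlled subspace
    v_pos = (np.eye(3) - S) @ (kp * (x_des - x))    # position error, projected
    v_force = S @ (kf * (f_des - f))                # force error along axis
    return v_pos + v_force

v = hybrid_command(np.zeros(3), np.array([1.0, 0.0, 5.0]),
                   np.array([0.0, 0.0, 2.0]), np.array([0.0, 0.0, 5.0]),
                   axis=np.array([0.0, 0.0, 1.0]))
```

Note how the position error along the force axis (5.0 here) is discarded: along that direction only the force error drives the command.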
Driving with Style: Inverse Reinforcement Learning in General-Purpose Planning for Automated Driving
Behavior and motion planning play an important role in automated driving.
Traditionally, behavior planners instruct local motion planners with predefined
behaviors. Due to the high scene complexity in urban environments,
unpredictable situations may occur in which behavior planners fail to match
predefined behavior templates. Recently, general-purpose planners have been
introduced, combining behavior and local motion planning. These general-purpose
planners allow behavior-aware motion planning given a single reward function.
However, two challenges arise: First, this function has to map a complex
feature space into rewards. Second, the reward function has to be manually
tuned by an expert, which is a tedious task. In this paper, we propose an
approach that relies on human driving
demonstrations to automatically tune reward functions. This study offers
important insights into the driving style optimization of general-purpose
planners with maximum entropy inverse reinforcement learning. We evaluate our
approach based on the expected value difference between learned and
demonstrated policies. Furthermore, we compare the similarity of human driven
trajectories with optimal policies of our planner under learned and
expert-tuned reward functions. Our experiments show that we are able to learn
reward functions exceeding the level of manual expert tuning without prior
domain knowledge.

Comment: Appeared at IROS 2019. Accepted version. Added/updated footnote,
minor correction in preliminaries.
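In maximum entropy inverse reinforcement learning, the gradient of the demonstration log-likelihood with respect to the reward weights is simply the difference between expert and learner expected feature counts; at the optimum the two match. A one-step sketch (feature counts here are made-up numbers; in practice the learner's counts are recomputed each step, e.g. by soft value iteration under the current reward):

```python
import numpy as np

def maxent_irl_step(w, mu_expert, mu_learner, lr=0.1):
    # MaxEnt IRL gradient ascent on the demonstration log-likelihood:
    # grad = mu_expert - mu_learner, so weights grow on features the
    # expert visits more than the current learner policy does.
    return w + lr * (mu_expert - mu_learner)

w = np.zeros(2)
w = maxent_irl_step(w, mu_expert=np.array([1.0, 0.2]),
                    mu_learner=np.array([0.4, 0.6]))
```

This is the mechanism by which human driving demonstrations tune the planner's reward function without manual expert tuning.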
Socially Aware Motion Planning with Deep Reinforcement Learning
For robotic vehicles to navigate safely and efficiently in pedestrian-rich
environments, it is important to model subtle human behaviors and navigation
rules (e.g., passing on the right). However, while instinctive to humans,
socially compliant navigation is still difficult to quantify due to the
stochasticity in people's behaviors. Existing works are mostly focused on using
feature-matching techniques to describe and imitate human paths, but often do
not generalize well since the feature values can vary from person to person,
and even run to run. This work notes that while it is challenging to directly
specify the details of what to do (precise mechanisms of human navigation), it
is straightforward to specify what not to do (violations of social norms).
Specifically, using deep reinforcement learning, this work develops a
time-efficient navigation policy that respects common social norms. The
proposed method is shown to enable fully autonomous navigation of a robotic
vehicle moving at human walking speed in an environment with many pedestrians.

Comment: 8 pages
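The "specify what not to do" idea amounts to a reward that penalises norm violations instead of prescribing exact human-like paths. A sketch with entirely hypothetical terms and weights:

```python
def social_reward(reached_goal, collided, passed_on_left, min_dist):
    # Reward sketch penalising violations of social norms ("what not to do")
    # rather than imitating exact human trajectories. All weights are
    # illustrative, not the paper's values.
    r = 0.0
    if reached_goal:
        r += 1.0                       # sparse task reward
    if collided:
        r -= 0.25                      # safety violation
    if passed_on_left:
        r -= 0.1                       # norm: pass on the right
    if min_dist < 0.2:
        r -= 0.1 * (0.2 - min_dist)    # pedestrian discomfort zone
    return r

r_good = social_reward(True, False, False, 1.0)
r_rude = social_reward(False, False, True, 0.1)
```

A deep RL agent trained against such a reward learns time-efficient motion that respects the norms without hand-specifying how humans navigate.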
A General Large Neighborhood Search Framework for Solving Integer Programs
This paper studies how to design abstractions of large-scale combinatorial optimization problems that can leverage existing state-of-the-art solvers in general purpose ways, and that are amenable to data-driven design. The goal is to arrive at new approaches that can reliably outperform existing solvers in wall-clock time. We focus on solving integer programs, and ground our approach in the large neighborhood search (LNS) paradigm, which iteratively chooses a subset of variables to optimize while leaving the remainder fixed. The appeal of LNS is that it can easily use any existing solver as a subroutine, and thus can inherit the benefits of carefully engineered heuristic approaches and their software implementations. We also show that one can learn a good neighborhood selector from training data. Through an extensive empirical validation, we demonstrate that our LNS framework can significantly outperform state-of-the-art commercial solvers such as Gurobi in wall-clock time.
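The LNS loop itself is simple: keep an incumbent assignment, repeatedly free a small subset of variables, and solve that sub-problem while the rest stay fixed. In the sketch below, everything is a toy stand-in: brute-force enumeration plays the role of the MIP solver subroutine, the subset selector is uniform random rather than learned, and the objective is a made-up separable instance.

```python
import itertools
import random

def lns_minimize(cost, n_vars, subset_size=3, iters=200, seed=0):
    # Large neighborhood search over binary variables: free `subset_size`
    # variables per iteration and solve that sub-problem exactly,
    # accepting the candidate if it improves the incumbent.
    rng = random.Random(seed)
    x = [0] * n_vars                    # incumbent assignment
    best = cost(x)
    for _ in range(iters):
        free = rng.sample(range(n_vars), subset_size)
        for bits in itertools.product([0, 1], repeat=subset_size):
            cand = list(x)
            for i, b in zip(free, bits):
                cand[i] = b
            c = cost(cand)
            if c < best:
                best, x = c, cand
    return x, best

# Toy separable objective with a known optimum (hypothetical instance).
target = [1, 0, 1, 1, 0, 1, 0, 1]
cost = lambda x: sum(abs(a - b) for a, b in zip(x, target))
x_best, c_best = lns_minimize(cost, len(target))
```

Replacing the random `free` selection with a learned neighborhood selector, and the brute-force inner solve with a commercial solver, recovers the shape of the framework the paper studies.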