Constraint-aware learning of policies by demonstration
Many practical tasks in robotic systems, such as cleaning windows, writing, or grasping, are inherently constrained. Learning policies subject to constraints is a challenging problem. In this paper, we propose a method of constraint-aware learning that solves the policy learning problem using redundant robots that execute a policy acting in the null space of a constraint. In particular, we are interested in generalizing learned null-space policies across constraints that were not known during training. We split the combined problem of learning constraints and policies into two: first estimating the constraint, and then estimating a null-space policy using the remaining degrees of freedom. For a linear parametrization, we provide a closed-form solution to the problem. We also define a metric for comparing the similarity of estimated constraints, which is useful for pre-processing the trajectories recorded in the demonstrations. We have validated our method by learning a wiping task from human demonstration on flat surfaces and reproducing it on an unknown curved surface using a force- or torque-based controller to achieve tool alignment. We show that, despite the differences between the training and validation scenarios, we learn a policy that still provides the desired wiping motion.

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Spanish Ministry of Economy and the European Union (grant number DPI2016-81002-R (AEI/FEDER, UE)), the European Union Horizon 2020 programme, as part of the project Memory of Motion - MEMMO (project ID 780684), and the Engineering and Physical Sciences Research Council, UK, as part of the Robotics and AI hub in Future AI and Robotics for Space - FAIR-SPACE (grant number EP/R026092/1), and as part of the Centre for Doctoral Training in Robotics and Autonomous Systems at Heriot-Watt University and the University of Edinburgh (grant numbers EP/L016834/1 and EP/J015040/1).

Armesto, L.; Moura, J.; Ivan, V.; Erden, M. S.; Sala, A.; Vijayakumar, S. (2018). Constraint-aware learning of policies by demonstration. The International Journal of Robotics Research, 37(13-14), 1673-1689. https://doi.org/10.1177/0278364918784354
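The two-stage split described above (estimate the constraint, then fit a null-space policy with the remaining degrees of freedom) can be sketched on synthetic data. Everything below is hypothetical: the constraint matrix `A` is taken as known, although the paper estimates it from demonstrations, and the policy is linear in the state, `pi(q) = W @ q`, so the fit reduces to an ordinary least-squares solve.

```python
import numpy as np

# Sketch (synthetic data): fit a linear null-space policy from
# demonstrated actions u = N(q) @ W @ q, where N = I - pinv(A) @ A
# projects onto the null space of the constraint Jacobian A.

rng = np.random.default_rng(0)
n = 3                                    # degrees of freedom
A = np.array([[1.0, 0.0, 1.0]])          # toy 1-D constraint Jacobian
N = np.eye(n) - np.linalg.pinv(A) @ A    # null-space projector

W_true = rng.normal(size=(n, n))         # ground-truth linear policy
Q = rng.normal(size=(50, n))             # demonstrated states
U = np.array([N @ W_true @ q for q in Q])  # observed null-space actions

# Closed-form least-squares fit of W from u = N W q, using the identity
# vec(N W q) = (q^T kron N) vec(W) with column-major vec.
X = np.stack([np.kron(q, N) for q in Q]).reshape(50 * n, n * n)
w_hat, *_ = np.linalg.lstsq(X, U.reshape(-1), rcond=None)
W_hat = w_hat.reshape(n, n, order="F")
```

Note that only the null-space component of the policy is identifiable from such data, so `W_hat` agrees with `W_true` only after projection by `N`.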
Learning Singularity Avoidance
With the increase in complexity of robotic systems and the rise in non-expert
users, it can be assumed that task constraints are not explicitly known. In
tasks where avoiding singularity is critical to success, this paper provides
an approach, aimed especially at non-expert users, for the system to learn the
constraints contained in a set of demonstrations, such that they can be used
to optimise an autonomous controller to avoid singularity without explicit
knowledge of the task constraints. The proposed approach avoids singularity,
and thereby unpredictable behaviour when carrying out a task, by maximising
the learnt manipulability throughout the motion of the constrained system,
and is not limited to kinematic systems. Its benefits are demonstrated
through comparisons with other control policies, which show that the
constrained manipulability of a system learnt through demonstration can be
used to avoid singularities in cases where these other policies would fail.
In the absence of the system's manipulability subject to a task's constraints,
the proposed approach can be used instead to infer these, with results showing
errors of less than 10^-5 in 3-DOF simulated systems and less than 10^-2 on a
7-DOF real-world robotic system.
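The paper maximises a learnt, constraint-aware manipulability; the classic unconstrained quantity it builds on is Yoshikawa's measure w(q) = sqrt(det(J Jᵀ)), which drops to zero at a singularity. A minimal sketch for a planar two-link arm (link lengths hypothetical):

```python
import numpy as np

def jacobian_2link(q, l1=1.0, l2=1.0):
    # Position Jacobian of a planar 2-link arm at joint angles q = [q1, q2].
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([
        [-l1 * s1 - l2 * s12, -l2 * s12],
        [ l1 * c1 + l2 * c12,  l2 * c12],
    ])

def manipulability(J):
    # Yoshikawa's measure: w = sqrt(det(J @ J.T)); w -> 0 at singularities.
    return np.sqrt(max(np.linalg.det(J @ J.T), 0.0))

w_bent = manipulability(jacobian_2link([0.3, 1.2]))      # well-conditioned pose
w_straight = manipulability(jacobian_2link([0.3, 0.0]))  # fully stretched: singular
```

A controller that follows the gradient of this measure (here, of its learnt constrained counterpart) steers the arm away from stretched, singular configurations.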
Safety-Aware Apprenticeship Learning
Apprenticeship learning (AL) is a class of Learning from Demonstration
techniques in which the reward function of a Markov Decision Process (MDP) is
unknown to the learning agent, and the agent has to derive a good policy by
observing an expert's demonstrations. In this paper, we study the problem of
how to make AL algorithms inherently safe while still meeting their learning
objective. We consider a setting where the unknown reward function is assumed
to be a linear combination of a set of state features, and the safety property
is specified in Probabilistic Computation Tree Logic (PCTL). By embedding
probabilistic model checking inside AL, we propose a novel
counterexample-guided approach that can ensure safety while retaining
performance of the learnt policy. We demonstrate the effectiveness of our
approach on several challenging AL scenarios where safety is essential.

Comment: Accepted by International Conference on Computer Aided Verification
(CAV) 201
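The linear-reward assumption makes discounted feature expectations the central statistic of AL: if a policy's feature expectations match the expert's, its value matches the expert's for any reward w · φ(s) with bounded w. A sketch of estimating them from demonstrations (states, features, and trajectories below are all hypothetical):

```python
import numpy as np

def feature_expectations(trajectories, phi, gamma=0.9):
    # Empirical discounted feature expectations mu = E[sum_t gamma^t phi(s_t)],
    # averaged over expert demonstrations. AL then searches for a policy
    # whose own feature expectations are close to these.
    mu = np.zeros(len(phi(trajectories[0][0])))
    for traj in trajectories:
        for t, s in enumerate(traj):
            mu += (gamma ** t) * phi(s)
    return mu / len(trajectories)

# Toy chain MDP: states are integers, two indicator features (hypothetical).
phi = lambda s: np.array([float(s == 0), float(s == 3)])
demos = [[0, 1, 2, 3], [0, 0, 1, 2]]
mu_E = feature_expectations(demos, phi)
```

The safety-aware variant additionally model-checks the candidate policy against the PCTL property and uses counterexamples to constrain the next candidate.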
Learning Task Constraints from Demonstration for Hybrid Force/Position Control
We present a novel method for learning hybrid force/position control from
demonstration. We learn a dynamic constraint frame aligned to the direction of
desired force using Cartesian Dynamic Movement Primitives. In contrast to
approaches that utilize a fixed constraint frame, our approach easily
accommodates tasks with rapidly changing task constraints over time. We
activate only one degree of freedom for force control at any given time,
ensuring motion is always possible orthogonal to the direction of desired
force. Since we utilize demonstrated forces to learn the constraint frame, we
are able to compensate for forces not detected by methods that learn only from
the demonstrated kinematic motion, such as frictional forces between the
end-effector and the contact surface. We additionally propose novel extensions
to the Dynamic Movement Primitive (DMP) framework that encourage robust
transition from free-space motion to in-contact motion in spite of environment
uncertainty. We incorporate force feedback and a dynamically shifting goal to
reduce forces applied to the environment and retain stable contact while
enabling force control. Our methods exhibit low impact forces on contact and
low steady-state tracking error.

Comment: Under review
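The single-DOF force-control idea above can be sketched as a selection-matrix hybrid law: exactly one axis (the constraint-frame force direction) is force-controlled, and the orthogonal complement is position-controlled, so motion orthogonal to the desired force is always possible. Gains, frame, and signals here are hypothetical; the paper additionally learns the constraint frame itself with Cartesian DMPs.

```python
import numpy as np

def hybrid_command(x, x_des, f, f_des, axis, kp=1.0, kf=0.1):
    # One DOF (along `axis`, the constraint-frame force direction) is
    # force-controlled; the remaining two are position-controlled.
    axis = axis / np.linalg.norm(axis)
    S = np.outer(axis, axis)                        # force-controlled subspace
    v_pos = (np.eye(3) - S) @ (kp * (x_des - x))    # position error, projected
    v_force = S @ (kf * (f_des - f))                # force error along axis
    return v_pos + v_force

v = hybrid_command(np.zeros(3), np.array([1.0, 0.0, 5.0]),
                   np.array([0.0, 0.0, 2.0]), np.array([0.0, 0.0, 5.0]),
                   axis=np.array([0.0, 0.0, 1.0]))
```

Note how the position error along the force axis (5.0 here) is discarded: along that direction only the force error drives the command.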
Driving with Style: Inverse Reinforcement Learning in General-Purpose Planning for Automated Driving
Behavior and motion planning play an important role in automated driving.
Traditionally, behavior planners instruct local motion planners with predefined
behaviors. Due to the high scene complexity in urban environments,
unpredictable situations may occur in which behavior planners fail to match
predefined behavior templates. Recently, general-purpose planners have been
introduced, combining behavior and local motion planning. These general-purpose
planners allow behavior-aware motion planning given a single reward function.
However, two challenges arise: First, this function has to map a complex
feature space into rewards. Second, the reward function has to be manually
tuned by an expert, which is a tedious task. In this paper, we propose an
approach that relies on human driving
demonstrations to automatically tune reward functions. This study offers
important insights into the driving style optimization of general-purpose
planners with maximum entropy inverse reinforcement learning. We evaluate our
approach based on the expected value difference between learned and
demonstrated policies. Furthermore, we compare the similarity of human driven
trajectories with optimal policies of our planner under learned and
expert-tuned reward functions. Our experiments show that we are able to learn
reward functions exceeding the level of manual expert tuning without prior
domain knowledge.

Comment: Appeared at IROS 2019. Accepted version. Added/updated footnote,
minor correction in preliminaries.
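In maximum entropy inverse reinforcement learning, the gradient of the demonstration log-likelihood with respect to the reward weights is simply the difference between expert and learner expected feature counts; at the optimum the two match. A one-step sketch (feature counts here are made-up numbers; in practice the learner's counts are recomputed each step, e.g. by soft value iteration under the current reward):

```python
import numpy as np

def maxent_irl_step(w, mu_expert, mu_learner, lr=0.1):
    # MaxEnt IRL gradient ascent on the demonstration log-likelihood:
    # grad = mu_expert - mu_learner, so weights grow on features the
    # expert visits more than the current learner policy does.
    return w + lr * (mu_expert - mu_learner)

w = np.zeros(2)
w = maxent_irl_step(w, mu_expert=np.array([1.0, 0.2]),
                    mu_learner=np.array([0.4, 0.6]))
```

This is the mechanism by which human driving demonstrations tune the planner's reward function without manual expert tuning.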
Socially Aware Motion Planning with Deep Reinforcement Learning
For robotic vehicles to navigate safely and efficiently in pedestrian-rich
environments, it is important to model subtle human behaviors and navigation
rules (e.g., passing on the right). However, while instinctive to humans,
socially compliant navigation is still difficult to quantify due to the
stochasticity in people's behaviors. Existing works are mostly focused on using
feature-matching techniques to describe and imitate human paths, but often do
not generalize well since the feature values can vary from person to person,
and even run to run. This work notes that while it is challenging to directly
specify the details of what to do (precise mechanisms of human navigation), it
is straightforward to specify what not to do (violations of social norms).
Specifically, using deep reinforcement learning, this work develops a
time-efficient navigation policy that respects common social norms. The
proposed method is shown to enable fully autonomous navigation of a robotic
vehicle moving at human walking speed in an environment with many pedestrians.

Comment: 8 pages
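The "specify what not to do" idea amounts to a reward that penalises norm violations instead of prescribing exact human-like paths. A sketch with entirely hypothetical terms and weights:

```python
def social_reward(reached_goal, collided, passed_on_left, min_dist):
    # Reward sketch penalising violations of social norms ("what not to do")
    # rather than imitating exact human trajectories. All weights are
    # illustrative, not the paper's values.
    r = 0.0
    if reached_goal:
        r += 1.0                       # sparse task reward
    if collided:
        r -= 0.25                      # safety violation
    if passed_on_left:
        r -= 0.1                       # norm: pass on the right
    if min_dist < 0.2:
        r -= 0.1 * (0.2 - min_dist)    # pedestrian discomfort zone
    return r

r_good = social_reward(True, False, False, 1.0)
r_rude = social_reward(False, False, True, 0.1)
```

A deep RL agent trained against such a reward learns time-efficient motion that respects the norms without hand-specifying how humans navigate.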
A General Large Neighborhood Search Framework for Solving Integer Programs
This paper studies how to design abstractions of large-scale combinatorial optimization problems that can leverage existing state-of-the-art solvers in general purpose ways, and that are amenable to data-driven design. The goal is to arrive at new approaches that can reliably outperform existing solvers in wall-clock time. We focus on solving integer programs, and ground our approach in the large neighborhood search (LNS) paradigm, which iteratively chooses a subset of variables to optimize while leaving the remainder fixed. The appeal of LNS is that it can easily use any existing solver as a subroutine, and thus can inherit the benefits of carefully engineered heuristic approaches and their software implementations. We also show that one can learn a good neighborhood selector from training data. Through an extensive empirical validation, we demonstrate that our LNS framework can significantly outperform state-of-the-art commercial solvers such as Gurobi in wall-clock time.
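The LNS loop itself is simple: keep an incumbent assignment, repeatedly free a small subset of variables, and solve that sub-problem while the rest stay fixed. In the sketch below, everything is a toy stand-in: brute-force enumeration plays the role of the MIP solver subroutine, the subset selector is uniform random rather than learned, and the objective is a made-up separable instance.

```python
import itertools
import random

def lns_minimize(cost, n_vars, subset_size=3, iters=200, seed=0):
    # Large neighborhood search over binary variables: free `subset_size`
    # variables per iteration and solve that sub-problem exactly,
    # accepting the candidate if it improves the incumbent.
    rng = random.Random(seed)
    x = [0] * n_vars                    # incumbent assignment
    best = cost(x)
    for _ in range(iters):
        free = rng.sample(range(n_vars), subset_size)
        for bits in itertools.product([0, 1], repeat=subset_size):
            cand = list(x)
            for i, b in zip(free, bits):
                cand[i] = b
            c = cost(cand)
            if c < best:
                best, x = c, cand
    return x, best

# Toy separable objective with a known optimum (hypothetical instance).
target = [1, 0, 1, 1, 0, 1, 0, 1]
cost = lambda x: sum(abs(a - b) for a, b in zip(x, target))
x_best, c_best = lns_minimize(cost, len(target))
```

Replacing the random `free` selection with a learned neighborhood selector, and the brute-force inner solve with a commercial solver, recovers the shape of the framework the paper studies.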