27,217 research outputs found

    Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning

    Full text link
    Intrinsically motivated spontaneous exploration is a key enabler of autonomous lifelong learning in human children. It enables the discovery and acquisition of large repertoires of skills through self-generation, self-selection, self-ordering and self-experimentation of learning goals. We present an algorithmic approach called Intrinsically Motivated Goal Exploration Processes (IMGEP) to enable similar properties of autonomous or self-supervised learning in machines. The IMGEP algorithmic architecture relies on several principles: 1) self-generation of goals, generalized as fitness functions; 2) selection of goals based on intrinsic rewards; 3) exploration with incremental goal-parameterized policy search and exploitation of the gathered data with a batch learning algorithm; 4) systematic reuse of information acquired when targeting a goal for improving towards other goals. We present a particularly efficient form of IMGEP, called Modular Population-Based IMGEP, that uses a population-based policy and an object-centered modularity in goals and mutations. We provide several implementations of this architecture and demonstrate their ability to automatically generate a learning curriculum within several experimental setups including a real humanoid robot that can explore multiple spaces of goals with several hundred continuous dimensions. While no particular target goal is provided to the system, this curriculum allows the discovery of skills that act as stepping stone for learning more complex skills, e.g. nested tool use. We show that learning diverse spaces of goals with intrinsic motivations is more efficient for learning complex skills than only trying to directly learn these complex skills

    Information driven self-organization of complex robotic behaviors

    Get PDF
    Information theory is a powerful tool to express principles to drive autonomous systems because it is domain invariant and allows for an intuitive interpretation. This paper studies the use of the predictive information (PI), also called excess entropy or effective measure complexity, of the sensorimotor process as a driving force to generate behavior. We study nonlinear and nonstationary systems and introduce the time-local predicting information (TiPI) which allows us to derive exact results together with explicit update rules for the parameters of the controller in the dynamical systems framework. In this way the information principle, formulated at the level of behavior, is translated to the dynamics of the synapses. We underpin our results with a number of case studies with high-dimensional robotic systems. We show the spontaneous cooperativity in a complex physical system with decentralized control. Moreover, a jointly controlled humanoid robot develops a high behavioral variety depending on its physics and the environment it is dynamically embedded into. The behavior can be decomposed into a succession of low-dimensional modes that increasingly explore the behavior space. This is a promising way to avoid the curse of dimensionality which hinders learning systems to scale well.Comment: 29 pages, 12 figure

    Recognising the Clothing Categories from Free-Configuration Using Gaussian-Process-Based Interactive Perception

    Get PDF
    In this paper, we propose a Gaussian Process- based interactive perception approach for recognising highly- wrinkled clothes. We have integrated this recognition method within a clothes sorting pipeline for the pre-washing stage of an autonomous laundering process. Our approach differs from reported clothing manipulation approaches by allowing the robot to update its perception confidence via numerous interactions with the garments. The classifiers predominantly reported in clothing perception (e.g. SVM, Random Forest) studies do not provide true classification probabilities, due to their inherent structure. In contrast, probabilistic classifiers (of which the Gaussian Process is a popular example) are able to provide predictive probabilities. In our approach, we employ a multi-class Gaussian Process classification using the Laplace approximation for posterior inference and optimising hyper-parameters via marginal likelihood maximisation. Our experimental results show that our approach is able to recognise unknown garments from highly-occluded and wrinkled con- figurations and demonstrates a substantial improvement over non-interactive perception approaches

    Post-Westgate SWAT : C4ISTAR Architectural Framework for Autonomous Network Integrated Multifaceted Warfighting Solutions Version 1.0 : A Peer-Reviewed Monograph

    Full text link
    Police SWAT teams and Military Special Forces face mounting pressure and challenges from adversaries that can only be resolved by way of ever more sophisticated inputs into tactical operations. Lethal Autonomy provides constrained military/security forces with a viable option, but only if implementation has got proper empirically supported foundations. Autonomous weapon systems can be designed and developed to conduct ground, air and naval operations. This monograph offers some insights into the challenges of developing legal, reliable and ethical forms of autonomous weapons, that address the gap between Police or Law Enforcement and Military operations that is growing exponentially small. National adversaries are today in many instances hybrid threats, that manifest criminal and military traits, these often require deployment of hybrid-capability autonomous weapons imbued with the capability to taken on both Military and/or Security objectives. The Westgate Terrorist Attack of 21st September 2013 in the Westlands suburb of Nairobi, Kenya is a very clear manifestation of the hybrid combat scenario that required military response and police investigations against a fighting cell of the Somalia based globally networked Al Shabaab terrorist group.Comment: 52 pages, 6 Figures, over 40 references, reviewed by a reade

    Black-Box Data-efficient Policy Search for Robotics

    Get PDF
    The most data-efficient algorithms for reinforcement learning (RL) in robotics are based on uncertain dynamical models: after each episode, they first learn a dynamical model of the robot, then they use an optimization algorithm to find a policy that maximizes the expected return given the model and its uncertainties. It is often believed that this optimization can be tractable only if analytical, gradient-based algorithms are used; however, these algorithms require using specific families of reward functions and policies, which greatly limits the flexibility of the overall approach. In this paper, we introduce a novel model-based RL algorithm, called Black-DROPS (Black-box Data-efficient RObot Policy Search) that: (1) does not impose any constraint on the reward function or the policy (they are treated as black-boxes), (2) is as data-efficient as the state-of-the-art algorithm for data-efficient RL in robotics, and (3) is as fast (or faster) than analytical approaches when several cores are available. The key idea is to replace the gradient-based optimization algorithm with a parallel, black-box algorithm that takes into account the model uncertainties. We demonstrate the performance of our new algorithm on two standard control benchmark problems (in simulation) and a low-cost robotic manipulator (with a real robot).Comment: Accepted at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2017; Code at http://github.com/resibots/blackdrops; Video at http://youtu.be/kTEyYiIFGP
    • …
    corecore