41 research outputs found

    Effizientes und stabiles online Lernen für "Developmental Robots"

    Recent progress in robotics and cognitive science has inspired a new generation of more versatile robots, so-called developmental robots. Many learning approaches for these robots are inspired by developmental processes and learning mechanisms observed in children. It is widely accepted that developmental robots must autonomously develop, acquire their skills, and cope with unforeseen challenges in unbounded environments through lifelong learning. Continuous online adaptation and intrinsically motivated learning are thus essential capabilities for these robots. However, the high sample complexity of online learning and intrinsic motivation methods impedes their efficiency and practical feasibility for lifelong learning. Consequently, the majority of previous work has been demonstrated only in simulation. This thesis devises new methods and learning schemes to mitigate this problem and to permit direct online training on physical robots. A novel intrinsic motivation method is developed to drive the robot’s exploration so that it efficiently selects what to learn. This method combines new knowledge-based and competence-based signals to increase sample efficiency and to enable lifelong learning. While developmental robots typically acquire their skills through self-exploration, their autonomous development could be accelerated by additionally learning from humans. Yet there is hardly any research on integrating intrinsic motivation with learning from a teacher. The thesis therefore establishes a new learning scheme to integrate intrinsic motivation with learning from observation. The underlying exploration mechanism in the proposed learning schemes relies on Goal Babbling, a goal-directed method for learning direct inverse robot models online, from scratch, and in a learning-while-behaving fashion. Online learning of multiple solutions for redundant robots with this framework was previously missing. This thesis devises an incremental online associative network to enable simultaneous exploration and solution consolidation and establishes a new technique to stabilize the learning system. The proposed methods and learning schemes are demonstrated for acquiring reaching skills. Their efficiency, stability, and applicability are benchmarked in simulation and demonstrated on a physical 7-DoF Baxter robot arm.
    Recent developments in robotics and the cognitive sciences have led to a generation of versatile robots referred to as "developmental robots". Learning methods for these robots are inspired by learning mechanisms observed in children. Developmental robots must acquire skills autonomously and master unforeseen challenges in unconstrained environments through lifelong learning. Continuous adaptation and learning through intrinsic motivation are therefore important properties. However, the high cost of generating data points limits the practical usability of such methods. As a result, most of them have been demonstrated only in simulation. This thesis therefore devises new methods to overcome this problem and to enable direct online training on real robots. To this end, a new intrinsically motivated method is developed that efficiently selects what is learned during exploration of the environment. It combines new knowledge-based and competence-based signals to increase sampling efficiency and to enable lifelong learning. While developmental robots acquire skills through self-exploration, their development can be accelerated through learning by observation. Nevertheless, there is hardly any work that combines intrinsic motivation with learning from interacting teachers. This thesis develops a new learning scheme that establishes this connection. The exploration mechanism used in the proposed learning methods is based on Goal Babbling, a goal-directed method for learning inverse models that works online, requires no prior knowledge, and enables learning while movements are being executed. Online learning of multiple solutions of inverse models of redundant robots with Goal Babbling has not been investigated so far. For this purpose, this thesis develops an incrementally learning associative neural network and devises a method that stabilizes it. The network enables the simultaneous exploration and consolidation of these solutions. The proposed methods are demonstrated for reaching for objects. Their efficiency, stability, and applicability are compared in simulation and demonstrated on a robot with seven joints.

    Sensorimotor Representation Learning for an “Active Self” in Robots: A Model Survey

    Safe human-robot interactions require robots to be able to learn how to behave appropriately in spaces populated by people and thus to cope with the challenges posed by our dynamic and unstructured environment, rather than being provided a rigid set of rules for operations. In humans, these capabilities are thought to be related to our ability to perceive our body in space, sensing the location of our limbs during movement, being aware of other objects and agents, and controlling our body parts to interact with them intentionally. Toward the next generation of robots with bio-inspired capacities, in this paper, we first review the developmental processes of the mechanisms underlying these abilities: the sensory representations of body schema, peripersonal space, and the active self in humans. Second, we provide a survey of robotics models of these sensory representations and robotics models of the self, and we compare these models with their human counterparts. Finally, we analyze what is missing from these robotics models and propose a theoretical computational framework, which aims to allow the emergence of the sense of self in artificial agents by developing sensory representations through self-exploration.
    Deutsche Forschungsgemeinschaft (http://dx.doi.org/10.13039/501100001659). Projekt DEAL. Peer Reviewed.

    Final report key contents: main results accomplished by the EU-Funded project IM-CLeVeR - Intrinsically Motivated Cumulative Learning Versatile Robots

    This document presents the main scientific and technological achievements of the project IM-CLeVeR. The document is organised as follows: 1. Project executive summary: a brief overview of the project vision, objectives and keywords. 2. Beneficiaries of the project and contacts: list of Teams (partners) of the project, Team Leaders and contacts. 3. Project context and objectives: the vision of the project and its overall objectives. 4. Overview of work performed and main results achieved: a one-page overview of the main results of the project. 5. Overview of main results per partner: a bullet-point list of main results per partner. 6. Main achievements in detail, per partner: a thorough explanation of the main results per partner (including collaborative work), with reference to the main publications supporting them.

    DREAM Architecture: a Developmental Approach to Open-Ended Learning in Robotics

    Robots are still limited to controlled conditions that the robot designer knows in enough detail to endow the robot with the appropriate models or behaviors. Learning algorithms add some flexibility through the ability to discover the appropriate behavior given either some demonstrations or a reward that guides exploration with a reinforcement learning algorithm. Reinforcement learning algorithms rely on the definition of state and action spaces that define reachable behaviors. Their adaptation capability critically depends on the representations of these spaces: small and discrete spaces result in fast learning, while large and continuous spaces are challenging and either require a long training period or prevent the robot from converging to an appropriate behavior. Besides the operational cycle of policy execution and the learning cycle, which works at a slower time scale to acquire new policies, we introduce the redescription cycle, a third cycle working at an even slower time scale to generate or adapt the required representations to the robot, its environment and the task. We introduce the challenges raised by this cycle and we present DREAM (Deferred Restructuring of Experience in Autonomous Machines), a developmental cognitive architecture to bootstrap this redescription process stage by stage, build new state representations with appropriate motivations, and transfer the acquired knowledge across domains or tasks or even across robots. We describe results obtained so far with this approach and end with a discussion of the questions it raises in neuroscience.

    Affordances in Psychology, Neuroscience, and Robotics: A Survey

    The concept of affordances appeared in psychology during the late 1960s as an alternative perspective on the visual perception of the environment. It was revolutionary in the intuition that the way living beings perceive the world is deeply influenced by the actions they are able to perform. Over the last 40 years, it has influenced many applied fields, e.g., design, human-computer interaction, computer vision, and robotics. In this paper, we offer a multidisciplinary perspective on the notion of affordances. We first discuss the main definitions and formalizations of the affordance theory, then we report the most significant evidence in psychology and neuroscience that supports it, and finally we review the most relevant applications of this concept in robotics.

    A Curious Robot Learner for Interactive Goal-Babbling (Strategically Choosing What, How, When and from Whom to Learn)

    The challenges of having robots operate in everyday human environments over long periods underline the importance of their adapting to changes that may be unforeseeable at the time they are built. They must be able to know which parts to sample and which kinds of skills are worth acquiring. One way of collecting data is to decide for oneself where to explore. Another way is to refer to a mentor. We call these two ways of collecting data sampling modes. The first sampling mode corresponds to algorithms developed in the literature to automatically push the agent toward interesting parts of the environment or toward useful kinds of skills. Such algorithms are called artificial curiosity or intrinsic motivation algorithms. The second mode corresponds to social guidance or imitation, where a human partner indicates where to explore and where not to explore. We have built an intrinsically motivated algorithmic architecture for learning how to produce varied effects and consequences through its actions. It learns actively and online by collecting data that it chooses using several sampling modes. At the meta-learning level, it actively learns which sampling strategy is most efficient for improving its competence and for generalising from its experience to a wide range of effects. Through learning by interaction, it acquires multiple skills in a structured manner, discovering developmental sequences by itself.
    The challenges posed by robots operating in human environments on a daily basis and in the long term point out the importance of adaptivity to changes which can be unforeseen at design time. The robot must learn continuously in an open-ended, non-stationary and high-dimensional space. It must be able to know which parts to sample and what kind of skills are interesting to learn. One way is to decide what to explore by oneself. Another way is to refer to a mentor. We name these two ways of collecting data sampling modes. The first sampling mode corresponds to algorithms developed in the literature in order to autonomously drive the robot toward interesting parts of the environment or useful kinds of skills. Such algorithms are called artificial curiosity or intrinsic motivation algorithms. The second sampling mode corresponds to social guidance or imitation, where the teacher indicates where to explore as well as where not to explore. Starting from the study of the relationships between these two concurrent methods, we ended up building an algorithmic architecture with a hierarchical learning structure, called Socially Guided Intrinsic Motivation (SGIM). We have built an intrinsically motivated active learner which learns how its actions can produce varied consequences or outcomes. It actively learns online by sampling data which it chooses by using several sampling modes. On the meta-level, it actively learns which data collection strategy is most efficient for improving its competence and generalising from its experience to a wide variety of outcomes. The interactive learner thus learns multiple tasks in a structured manner, discovering by itself developmental sequences.
    BORDEAUX1-Bib.electronique (335229901) / Sudoc, France