
    Multi-modal Skill Memories for Online Learning of Interactive Robot Movement Generation

    Queißer J. Multi-modal Skill Memories for Online Learning of Interactive Robot Movement Generation. Bielefeld: Universität Bielefeld; 2018.
    Modern robotic applications pose complex requirements for adapting actions to the variability within a given task. Reinforcement learning can optimize for changing conditions, but relearning from scratch is hardly feasible because of the high number of required rollouts. This work proposes a parameterized skill that generalizes to new actions for changing task parameters. The actions are encoded by a meta-learner that provides parameters for task-specific dynamic motion primitives. Experimental evaluation shows that using parameterized skills to initialize the optimization process leads to more effective incremental task learning. A proposed hybrid optimization method combines a fast, coarse optimization on a manifold of policy parameters with a fine-grained parameter search in the unrestricted space of actions. The developed algorithm is shown to reduce the number of rollouts required to adapt to new task conditions. Further, this work presents a transfer learning approach for adapting learned skills to new situations. Applications to illustrative toy scenarios, a 10-DOF planar arm, a humanoid robot point-reaching task, and parameterized drumming on a pneumatic robot validate the approach. However, parameterized skills applied to complex robotic systems pose further challenges: the dynamics of the robot and its interaction with the environment introduce model inaccuracies. In particular, high-level skill acquisition on highly compliant robotic systems, such as pneumatically driven or soft actuators, is hardly feasible. Since learning the complete dynamics model is infeasible because of its high complexity, this thesis examines two alternative approaches. First, an improvement of the low-level control based on an equilibrium model of the robot.
Using an equilibrium model reduces the learning complexity, and this thesis evaluates its applicability to the control of pneumatic and industrial lightweight robots. Second, an extension of parameterized skills to generalize over forward signals of action primitives, resulting in enhanced control quality for complex robotic systems. This thesis argues for shifting the complexity from learning the full dynamics of the robot to a lower-dimensional, task-related learning problem. Because the skills generalize across the task variability, online learning becomes feasible for complex robots as well as complex scenarios. An experimental evaluation investigates the generalization capabilities of the proposed online learning system for robot motion generation. Evaluation is performed in simulation on a compliant 2-DOF arm, and scalability to a complex robotic system is demonstrated on a pneumatically driven humanoid robot with 8 DOF.
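As a rough illustration of a parameterized skill, the sketch below fits the forcing-term weights of a one-dimensional discrete movement primitive to demonstrations that vary with a task parameter `h`. It then uses an affine least-squares regressor as a stand-in meta-learner that maps `h` to primitive weights. The DMP formulation, gains, demonstration shape, and the linear meta-learner are all assumptions for illustration, not the thesis's implementation.

```python
import numpy as np

# Hypothetical DMP settings (assumptions, not taken from the thesis).
ALPHA_Z, BETA_Z, ALPHA_X = 25.0, 25.0 / 4.0, 3.0
GOAL, DT, K = 1.0, 0.001, 20
t = np.arange(0.0, 1.0, DT)
x = np.exp(-ALPHA_X * t)                          # canonical system x(t)
centers = np.exp(-ALPHA_X * np.linspace(0.0, 1.0, K))
widths = 1.0 / np.gradient(centers) ** 2          # basis widths from center spacing


def features(xs):
    """Normalized Gaussian basis functions scaled by the phase variable x."""
    psi = np.exp(-widths * (xs[:, None] - centers) ** 2)
    return psi * xs[:, None] / psi.sum(axis=1, keepdims=True)


PHI = features(x)


def demo(h):
    """Hypothetical demonstration: minimum-jerk reach to GOAL with a bump of height h."""
    s = 10 * t**3 - 15 * t**4 + 6 * t**5
    return GOAL * s + h * np.sin(np.pi * s)


def fit_weights(y):
    """Fit forcing-term weights so the transformation system reproduces y."""
    yd = np.gradient(y, DT)
    ydd = np.gradient(yd, DT)
    f_target = ydd - ALPHA_Z * (BETA_Z * (GOAL - y) - yd)
    return np.linalg.lstsq(PHI, f_target, rcond=None)[0]


def rollout(w, y0=0.0):
    """Euler integration of z' = a_z(b_z(g - y) - z) + f(x), y' = z."""
    y, z, out = y0, 0.0, np.empty_like(t)
    for k, xk in enumerate(x):
        f = features(np.array([xk]))[0] @ w
        z += DT * (ALPHA_Z * (BETA_Z * (GOAL - y) - z) + f)
        y += DT * z
        out[k] = y
    return out


# Stand-in meta-learner (assumption): affine least-squares map from h to DMP weights.
train_h = np.array([0.0, 0.1, 0.2, 0.3, 0.4])
W = np.stack([fit_weights(demo(h)) for h in train_h])
H = np.column_stack([np.ones_like(train_h), train_h])
coef = np.linalg.lstsq(H, W, rcond=None)[0]       # shape (2, K)


def parameterized_skill(h):
    """Predict primitive weights for an unseen task parameter h."""
    return np.array([1.0, h]) @ coef
```

For a new task, `rollout(parameterized_skill(0.25))` yields a trajectory without a fresh demonstration; in the thesis setting, such a prediction would serve as the initialization that reinforcement learning then refines. The affine regressor is exact here only because the toy demonstrations depend linearly on `h`; a nonlinear meta-learner would be needed in general.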

    Humanoid Robots

    For many years, humans have tried in every way to recreate the complex mechanisms that form the human body. Such a task is extremely complicated, and the results are not entirely satisfactory. However, with increasing technological advances based on theoretical and experimental research, humans have managed, to some extent, to copy or imitate some systems of the human body. This research aims not only to create humanoid robots, a large part of them autonomous systems, but also to offer deeper knowledge of the systems that form the human body, with a view to possible applications in rehabilitation technology, bringing together studies related not only to Robotics but also to Biomechanics, Biomimetics, Cybernetics, and other areas. This book presents a series of studies inspired by this ideal, carried out by various researchers worldwide, that analyze and discuss diverse subjects related to humanoid robots. The contributions explore aspects of robotic hands, learning, language, vision, and locomotion.

    Adaptive control of compliant robots with Reservoir Computing

    In modern society, robots are increasingly used to handle dangerous, repetitive and/or heavy tasks with high precision. Because these tasks are dangerous, require high precision, or are simply repetitive, robots are usually constructed with high-torque motors and sturdy materials, which makes them dangerous for humans to handle. In a car-manufacturing plant, for example, a large cage is placed around the robot's workspace to prevent humans from entering its vicinity. In the last few decades, efforts have been made to improve human-robot interaction. Robot movements are often characterized as not smooth and as clearly divisible into sub-movements, which makes them rather unpredictable for humans. There is therefore an opportunity to improve the motion generation of robots to enhance human-robot interaction. One interesting research direction is imitation learning, in which human motions are recorded and demonstrated to the robot. Although the robot is able to reproduce such movements, they cannot be generalized to other situations. Therefore, a dynamical system approach is proposed in which the recorded motions are embedded into the dynamics of the system. Shaping these nonlinear dynamics according to the recorded motions allows the dynamical system to generalize beyond the demonstrations. As a result, the robot can generate motions for situations not included in the recorded human demonstrations. In this dissertation, a Reservoir Computing approach is used to create a dynamical system in which such demonstrations are embedded. Reservoir Computing systems are Recurrent Neural Network-based approaches that are trained efficiently: only the readout connections are trained, while all other connections of the network keep their initial, randomly chosen values. Although such systems have been used before to embed periodic motions, this work extends them to embed discrete motions, or both.
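The readout-only training described above can be sketched as a small echo state network in NumPy: the recurrent and input weights are random and stay fixed, and only a linear readout is fitted by ridge regression. The reservoir size, spectral radius, phase input, and target trajectory are hypothetical choices; the dissertation's motion pattern generator also uses feedback to embed motions, which this sketch omits.

```python
import numpy as np


def make_reservoir(n=200, n_in=3, spectral_radius=0.9, seed=0):
    """Random reservoir and input weights; these are never trained."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-0.5, 0.5, (n, n))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    W_in = rng.uniform(-1.0, 1.0, (n, n_in))
    return W, W_in


def run_reservoir(W, W_in, U):
    """Collect states x_k = tanh(W x_{k-1} + W_in u_k) for every input row u_k."""
    x = np.zeros(W.shape[0])
    states = np.empty((len(U), W.shape[0]))
    for k, u in enumerate(U):
        x = np.tanh(W @ x + W_in @ u)
        states[k] = x
    return states


def train_readout(states, targets, ridge=1e-6):
    """Ridge regression for the readout weights only (with a bias column)."""
    X = np.hstack([states, np.ones((len(states), 1))])
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ targets)


def readout(states, w_out):
    return np.hstack([states, np.ones((len(states), 1))]) @ w_out


# Hypothetical task: map a phase signal (sin, cos, bias) to a periodic joint trajectory.
t = np.arange(2600) * 0.05
U = np.column_stack([np.sin(t), np.cos(t), np.ones_like(t)])
y = 0.5 * np.sin(t) + 0.2 * np.sin(2 * t + 0.5)

W, W_in = make_reservoir()
S = run_reservoir(W, W_in, U)
washout, split = 100, 2100
w_out = train_readout(S[washout:split], y[washout:split])
pred = readout(S[split:], w_out)
nrmse = np.sqrt(np.mean((pred - y[split:]) ** 2)) / np.std(y[split:])
```

Because training reduces to one linear solve, refitting the readout for a new motion is cheap, which is the efficiency argument made for Reservoir Computing in the text. The constant bias input matters: without it the tanh reservoir is an odd function of its input and cannot express the second harmonic in this target.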
This work describes how such a motion-pattern-generating system is built, investigates the nature of the underlying dynamics, and evaluates its robustness to perturbations. Additionally, a dynamical system approach to obstacle avoidance is proposed that is based on vector fields in the presence of repellers. This technique can extend the motion abilities of the robot without changing the trained Motion Pattern Generator (MPG). Therefore, this approach can be applied in real time to any system that generates a movement trajectory. Assume that the MPG system is implemented on an industrial robotic arm, similar to the ones used in a car factory. Even though the presented obstacle avoidance strategy can modify the generated motion of the robot's gripper so that it avoids obstacles, it does not guarantee that other parts of the robot will not collide with a human. To prevent this, engineers have started to use advanced control algorithms that measure the torque applied to the robot. This allows the robot to be aware of external perturbations. However, it turns out that, even with fast control loops, the adaptation to compensate for a sudden perturbation is too slow to prevent high interaction forces. To reduce such forces, researchers started to construct robots from passively compliant mechanical elements (e.g., springs) and lightweight flexible materials. Although such compliant robots are much safer and inherently energy efficient, their control becomes much harder. Most control approaches use model information about the robot (e.g., weight distribution and shape). However, when constructing a compliant robot, it is hard to determine the dynamics of these materials. Therefore, a model-free adaptive control framework is proposed that assumes no prior knowledge about the robot. By interacting with the robot, it learns an inverse robot model that is used as the controller.
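The repeller-based obstacle avoidance mentioned above can be illustrated with a minimal 2-D sketch, assuming a linear attractor toward the goal plus a Gaussian-weighted repeller around a point obstacle. The functional form and the gains `k_rep` and `sigma` are assumptions, not the dissertation's formulation. The start point is slightly offset from the goal-obstacle axis so the trajectory does not stall at the saddle point in front of the obstacle.

```python
import numpy as np


def velocity(p, goal, obstacle, k_rep=0.1, sigma=0.3):
    """Linear attractor toward the goal plus a repeller that decays with distance."""
    attract = goal - p
    d = p - obstacle
    r2 = d @ d
    repel = k_rep * d / r2 * np.exp(-r2 / sigma**2)   # magnitude k_rep/r * exp(-r^2/sigma^2)
    return attract + repel


def integrate(start, goal, obstacle, dt=0.01, steps=5000):
    """Forward-Euler integration of the modulated vector field."""
    p = np.array(start, dtype=float)
    path = [p.copy()]
    for _ in range(steps):
        p = p + dt * velocity(p, goal, obstacle)
        path.append(p.copy())
    return np.array(path)


goal, obstacle = np.array([1.0, 0.0]), np.array([0.0, 0.0])
path = integrate([-1.0, 0.05], goal, obstacle)
min_clearance = np.min(np.linalg.norm(path - obstacle, axis=1))
final_error = np.linalg.norm(path[-1] - goal)
```

The repeller is simply added to the velocity the primary system produces, so in principle the same term could be added to the output of a trained pattern generator without retraining it, matching the motivation in the text.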
The more the framework interacts with the robot, the better the control becomes. Appropriately, this framework is called the Inverse Modeling Adaptive (IMA) control framework. I have evaluated the IMA controller's tracking ability on several tasks, investigating its model independence and stability. Furthermore, I have shown its fast learning ability and its performance, comparable to that of controllers designed specifically for the task. Given both the MPG and IMA controllers, it is possible to improve the interactability of a compliant robot in a human-friendly environment. When the robot is to perform human-like motions for a large set of tasks, motion examples of all these tasks must be demonstrated. However, biological research on the motion generation of animals and humans has revealed that a limited set of motion patterns, called motion primitives, are modulated and combined to generate the advanced motor and motion skills that humans and animals exhibit. Inspired by these findings, I investigate whether a single motion primitive can indeed be modulated to achieve a desired motion behavior. A proof of concept is presented through elementary experiments in which an MPG is controlled by an IMA controller. Furthermore, a general hierarchy is introduced that describes how a robot can be controlled in a biology-inspired manner. I also investigated how motion primitives can be combined to produce a desired motion; however, I was unable to get more advanced implementations to work. The results of some simple experiments are presented in the appendix. Another approach I investigated assumes that the primitives themselves are undefined. Instead, only a high-level description is given, stating that every primitive should on average contribute equally, while still allowing a single primitive to specialize in a part of the motion generation. Without defining the behavior of a primitive, only a set of untrained IMA controllers is used, each of which will represent a single primitive.
As a result of the high-level heuristic description, the task space is tiled into sub-regions in an unsupervised manner, resulting in controllers that each indeed represent a part of the motion generation. I have applied this Modular Architecture with Control Primitives (MACOP) to an inverse kinematics learning task and investigated the emerging primitives. Thanks to the tiling of the task space, it becomes possible to control redundant systems, because redundant solutions can be spread over several control primitives. Within each sub-region of the task space, a specific control primitive is more accurate than in other regions, allowing the task complexity to be distributed over several less complex tasks. Finally, I extend the use of the IMA controller, which is a tracking controller, to the control of under-actuated systems. Using a sample-based planning algorithm, it becomes possible to explore the system dynamics and plan a path to a desired state. Afterwards, MACOP is used to incorporate feedback and to learn the control commands corresponding to the planned state-space trajectory, even if it contains errors. As a result, under-actuated control of a cart-pole system was achieved. Furthermore, I present the concept of a simulation-based control framework that allows learning of the system dynamics, planning, and feedback control iteratively and simultaneously.
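The core IMA idea, learning an inverse model online from interaction and using it directly as the controller, can be illustrated with a deliberately simple stand-in: a linear inverse model trained by recursive least squares on a hypothetical first-order plant. The dissertation's IMA controller is reservoir-based; the plant, the feature vector `[y_next, y]`, and the class name `RLSInverseModel` are illustrative assumptions.

```python
import numpy as np


class RLSInverseModel:
    """Hypothetical online inverse model u ~ theta . [y_next, y], fitted by recursive least squares."""

    def __init__(self, n_features=2, p0=1e6, forgetting=1.0):
        self.theta = np.zeros(n_features)
        self.P = p0 * np.eye(n_features)
        self.lam = forgetting

    def predict(self, phi):
        return self.theta @ phi

    def update(self, phi, u):
        Pphi = self.P @ phi
        gain = Pphi / (self.lam + phi @ Pphi)
        self.theta += gain * (u - self.theta @ phi)
        self.P = (self.P - np.outer(gain, Pphi)) / self.lam


def plant(y, u):
    """Plant unknown to the controller (assumption for the demo): y_next = 0.9 y + 0.5 u."""
    return 0.9 * y + 0.5 * u


rng = np.random.default_rng(0)
model = RLSInverseModel()
y, errors = 0.0, []
for k in range(500):
    r_next = np.sin(0.05 * (k + 1))                        # desired next output
    u = model.predict(np.array([r_next, y])) + 0.01 * rng.standard_normal()
    y_next = plant(y, u)
    model.update(np.array([y_next, y]), u)                 # learn from what actually happened
    errors.append(abs(y_next - r_next))
    y = y_next
```

The controller asks the inverse model for the command that should produce the desired next output, then trains on the output that actually occurred, so tracking improves with interaction. The small injected noise provides excitation while the model is still uninformed.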

    Bio-inspired approaches to the control and modelling of an anthropomimetic robot

    Introducing robots into human environments requires them to handle settings designed specifically for human size and morphology. However, large, conventional humanoid robots with stiff, high-powered joint actuators pose a significant danger to humans. By contrast, “anthropomimetic” robots mimic both human morphology and internal structure: skeleton, muscles, compliance, and high redundancy. Although far safer, their resulting compliant structure presents a formidable challenge to conventional control. Here we review, and seek to address, characteristic control issues of this class of robot, while exploiting their biomimetic nature by drawing upon biological motor control research. We derive a novel learning controller for discovering effective reaching actions created through sustained activation of one or more muscle synergies, an approach that draws upon strong, recent evidence from animal and human studies but is almost unexplored to date in the musculoskeletal robot literature. Since the best synergies for a given robot will be unknown, we derive a deliberately simple reinforcement learning approach intended to allow their emergence, in particular of those patterns that aid linearization of control. We also draw upon optimal control theories to encourage the emergence of smoother movement by incorporating signal-dependent noise and trial repetition. In addition, we argue for the utility of developing a detailed dynamic model of a complete robot and present a stable, physics-based model of the anthropomimetic ECCERobot, running in real time with 55 muscles and 88 degrees of freedom. Using the model, we find that effective reaching actions can be learned that employ only two sequential motor co-activation patterns, each controlled by just a single common driving signal.
Factor analysis shows that the emergent muscle co-activations can be reconstructed to significant accuracy using weighted combinations of only 13 common fragments, labelled “candidate synergies”. Using these synergies as drivable units, the same controller learns the same task both faster and better; however, performance on other reaching tasks is poorer in proportion to their dissimilarity. We therefore propose that modifications enabling the emergence of a more generic set of synergies are required. Finally, we propose a continuous controller for the robot, based on model predictive control, that incorporates our model as a predictive component for state estimation, delay compensation, and planning, including merging the robot and the sensed environment into a single model. We test the delay compensation mechanism by controlling a second copy of the model acting as a proxy for the real robot, finding that performance is significantly improved if a precise degree of compensation is applied, and show how rapidly an uncompensated controller fails as the model accuracy degrades.
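The idea of reconstructing muscle co-activations from a few weighted synergies can be sketched on synthetic data. The sketch reuses the counts from the text (55 muscles, 13 synergies), but the activation data are randomly generated, and principal component analysis via SVD is swapped in as a stand-in for the factor analysis used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
N_MUSCLES, N_SYN, N_SAMPLES = 55, 13, 500      # counts from the text; the data are synthetic

true_syn = rng.uniform(0.0, 1.0, (N_SYN, N_MUSCLES))
coeffs = rng.uniform(0.0, 1.0, (N_SAMPLES, N_SYN))
activations = coeffs @ true_syn + 0.01 * rng.standard_normal((N_SAMPLES, N_MUSCLES))


def extract_synergies(A, k):
    """Top-k principal directions of the centered activations (PCA stand-in for factor analysis)."""
    mean = A.mean(axis=0)
    _, _, vt = np.linalg.svd(A - mean, full_matrices=False)
    return mean, vt[:k]


def variance_accounted_for(A, k):
    """Fraction of activation variance reconstructed by k weighted synergies."""
    mean, syn = extract_synergies(A, k)
    centered = A - mean
    weights = centered @ syn.T                   # least-squares weights (rows of syn are orthonormal)
    residual = centered - weights @ syn
    return 1.0 - np.sum(residual**2) / np.sum(centered**2)


for k in (2, 5, 13):
    print(k, round(variance_accounted_for(activations, k), 4))
```

Plotting variance accounted for against `k` is a common way to choose how many synergies to keep; with real robot data, the curve would not saturate as cleanly as it does for this rank-13 synthetic set.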

    Models for reinforcement learning and design of a soft robot inspired by Drosophila larvae

    Robot designs are often inspired by animals, mimicking their mechanics, motions, behaviours, and learning. Drosophila, known as the fruit fly, is a well-studied model animal. In this thesis, the Drosophila larva is studied and the results are applied to robots. More specifically, part of the Drosophila larva's neural circuit for operant learning is modelled, and on this basis a synaptic plasticity model and a neural circuit model for operant learning, as well as a dynamic neural network for robot reinforcement learning, are developed; then the Drosophila larva's motor system for locomotion is studied, and a soft robot system is designed based on it. Operant learning is a concept similar to reinforcement learning in computer science, i.e., learning through reward or punishment for behaviour. Experiments have shown that a wide range of animals are capable of operant learning, including animals with comparatively few neurons, such as Drosophila. This implies that operant learning can be established without a large number of neurons. Taking this as an assumption, the structure and dynamics of synapses are investigated and a synaptic plasticity model is proposed. The model includes nonlinear dynamics of synapses, especially receptor trafficking, which affects synaptic strength. Tests show that this model can enable operant learning at the neuron level and applies to a broad range of neural networks (NNs), including feedforward, recurrent, and spiking NNs. The mushroom body is a learning centre of the insect brain that is known for, and has been modelled for, associative learning, but not yet for operant learning. To investigate whether it participates in operant learning, my collaborators studied Drosophila larvae using a transgenic tool. Based on these experiments and their results, a mushroom body model capable of operant learning is developed. The proposed neural circuit model can reproduce the operant learning of the turning behaviour of Drosophila larvae.
The synaptic plasticity model is then simplified for robot learning. With the simplified model, a recurrent neural network with internal neural dynamics can learn to control a planar bipedal robot in a benchmark reinforcement learning task from OpenAI called bipedal walker. Benefiting from the efficiency of exploration in parameter space rather than action space, it is the first known solution to this task using a reinforcement learning approach. Although existing pneumatic soft robots can have multiple muscles embedded in one component, their number is far smaller than that of the Drosophila larva, whose muscles are well organised in a tiny space. A soft robot system is developed based on the muscle pattern of the Drosophila larva to explore the possibility of embedding a high density of muscles in a limited space. Three versions of the body wall with pneumatic muscles mimicking the muscle pattern are designed. A pneumatic control system and an embedded control system are also developed to control the robot. With a bioinspired body wall containing a large number of muscles, the robot performs lifelike motions in experiments.
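To make the distinction between parameter-space and action-space exploration concrete, the sketch below uses a (1+1) evolution strategy, a generic technique swapped in for illustration and not the thesis's plasticity-based learner. Noise is added to the whole policy parameter vector once per episode rather than to each action. The toy point-mass task and the linear policy `a = -k1*x - k2*v` are assumptions.

```python
import numpy as np


def episode_return(theta, steps=100, dt=0.05):
    """Toy task (assumption): drive a 1-D point mass from x=1 to x=0 with a = -k1*x - k2*v."""
    x, v, ret = 1.0, 0.0, 0.0
    for _ in range(steps):
        a = -theta[0] * x - theta[1] * v     # deterministic policy: no per-step action noise
        v += a * dt
        x += v * dt
        ret -= x * x * dt
    return ret


def parameter_space_search(iters=200, sigma=0.5, seed=0):
    """(1+1) evolution strategy: perturb the whole parameter vector once per episode."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)
    best = episode_return(theta)
    history = [best]
    for _ in range(iters):
        candidate = theta + sigma * rng.standard_normal(2)
        r = episode_return(candidate)
        if r > best:                         # keep only improvements
            theta, best = candidate, r
        history.append(best)
    return theta, history


theta, history = parameter_space_search()
```

Because each episode runs a single consistent policy, the return directly scores one parameter vector; per-step action noise would instead need many samples to average out, which is one intuition for why exploring in parameter space can be more efficient.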

    Viability in State-Action Space: Connecting Morphology, Control, and Learning

    How can we enable robots to learn control model-free and directly on hardware? Machine learning is taking its place as a standard tool in the roboticist’s arsenal. However, there are several open questions on how to learn control for physical systems. This thesis provides two answers to this motivating question. The first is a formal means to quantify the inherent robustness of a given system design, prior to designing the controller or learning agent. This emphasizes the need to consider both the hardware and software design of a robot, which are inseparably intertwined in the system dynamics. The second is the formalization of a safety-measure, which can be learned model-free. Intuitively, this measure indicates how easily a robot can avoid failure, and enables robots to explore unknown environments while avoiding failures. The main contributions of this dissertation are based on viability theory. Viability theory provides a slightly unconventional view of dynamical systems: instead of focusing on a system’s convergence properties towards equilibria, the focus is shifted towards sets of failure states and the system’s ability to avoid these sets. 
This view is particularly well suited to studying learning control in robots, since stability in the sense of convergence can rarely be guaranteed during the learning process. The notion of viability is formally extended to state-action space, with viable sets of state-action pairs. A measure defined over these sets allows a quantified evaluation of robustness valid for the family of all failure-avoiding control policies, and also paves the way for enabling safe model-free learning. The thesis also includes two minor contributions. The first minor contribution is an empirical demonstration of shaping by exclusively modifying the system dynamics. This demonstration highlights the importance of robustness to failures for learning control: not only can failures cause damage, but they typically do not provide useful gradient information for the learning process. The second minor contribution is a study on the choice of state initializations. Counter to intuition and common practice, this study shows that it can be more reliable to occasionally initialize the system from a state that is known to be uncontrollable.
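The notions of a viability kernel, a viable set in state-action space, and a measure over it can be illustrated on a tiny discrete system. The dynamics, failure set, and the "fraction of viable actions" measure below are simplified assumptions for illustration; the thesis defines its measure for continuous systems.

```python
STATES = range(10)            # hypothetical discretized states 0..9
ACTIONS = (-2, -1, 0)         # hypothetical discrete actions


def step(x, a):
    """Toy deterministic dynamics (assumption): weak drift below 6, strong drift from 6 on."""
    drift = 1 if x < 6 else 3
    return x + drift + a


def is_failure(x):
    """Failure set: leaving the interval [0, 9]."""
    return x < 0 or x > 9


def viability_kernel():
    """Largest set of non-failure states from which some action keeps the system in the set."""
    viable = {x for x in STATES if not is_failure(x)}
    while True:
        keep = {x for x in viable if any(step(x, a) in viable for a in ACTIONS)}
        if keep == viable:
            return viable
        viable = keep


def viable_state_actions(kernel):
    """Viable set in state-action space: pairs whose successor stays in the kernel."""
    return {(x, a) for x in STATES for a in ACTIONS if step(x, a) in kernel}


def safety_measure(x, q_viable):
    """Fraction of actions at state x that are viable (a simple discrete analogue of a measure)."""
    return sum((x, a) in q_viable for a in ACTIONS) / len(ACTIONS)


kernel = viability_kernel()
q_v = viable_state_actions(kernel)
```

Here the kernel is the set of states 0 to 5. States 6 to 9 are not failures themselves, but the strong drift makes failure unavoidable from them, which is exactly the distinction viability draws between failure states and states from which failure can still be avoided. At the boundary states 0 and 5, only two of three actions are viable, so the measure drops below 1 before any failure occurs.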

    Data-Driven Methods Applied to Soft Robot Modeling and Control: A Review

    Soft robots are compliant and have, in principle, infinite degrees of freedom. Thanks to these properties, such robots can be leveraged for surgery, rehabilitation, biomimetics, exploration of unstructured environments, and industrial grippers. As a result, they attract researchers from a variety of fields. However, nonlinearity and hysteresis effects also complicate robot modeling. Moreover, because of their flexibility and adaptability, soft robot control is more challenging than rigid robot control. To model and control soft robots, a large number of data-driven methods are used, individually or in combination. This review first briefly introduces two foundations for data-driven approaches, physical models and the Jacobian matrix, and then summarizes three kinds of data-driven approaches: statistical methods, neural networks, and reinforcement learning. It compares the modeling and controller features, e.g., model dynamics, data requirements, and target task, within and among these categories. Finally, we summarize the features of each method. A discussion of the advantages and limitations of existing modeling and control approaches is presented, and we forecast the future of data-driven approaches in soft robots. A website (https://sites.google.com/view/23zcb) has been built for this review and will be updated frequently. Note to Practitioners: This work is motivated by the need for a review that introduces soft robot modeling and control methods in parallel. Modeling and control play significant roles in robot research, and they are especially challenging for soft robots. The nonlinear and complex deformation of such robots necessitates specific modeling and control approaches. We introduce state-of-the-art data-driven methods and survey three widely used approaches. This review also compares the performance of these methods, considering important features such as data requirements, control frequency, and target task.
The features of each approach are summarized, and we discuss the possible future of this area.
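The Jacobian matrix, one of the two foundations named in the review, underlies a simple data-driven control scheme. The Jacobian is estimated from small exploratory actuations and then refined online from measured input-output changes with Broyden's rank-one update, without any analytical model. The sketch below is a generic illustration under stated assumptions: a rigid 2-link planar arm stands in for the unknown soft robot, and the gain, step limit, and function names are hypothetical.

```python
import numpy as np


def forward(q):
    """Stand-in for the unknown robot (assumption): tip position of a 2-link planar arm."""
    l1, l2 = 1.0, 0.8
    return np.array([l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1]),
                     l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])])


def probe_jacobian(f, q, eps=1e-3):
    """Initial Jacobian estimate from small exploratory actuations (finite differences)."""
    y0 = f(q)
    J = np.zeros((y0.size, q.size))
    for i in range(q.size):
        dq = np.zeros_like(q)
        dq[i] = eps
        J[:, i] = (f(q + dq) - y0) / eps
    return J


def jacobian_servo(f, q, target, steps=200, gain=0.5, max_step=0.1):
    """Move toward target using only measured outputs; refine J with Broyden updates."""
    q = np.asarray(q, dtype=float)
    J = probe_jacobian(f, q)
    y = f(q)
    for _ in range(steps):
        dq = gain * (np.linalg.pinv(J) @ (target - y))
        norm = np.linalg.norm(dq)
        if norm > max_step:
            dq *= max_step / norm                            # limit actuation per step
        y_new = f(q + dq)
        if dq @ dq > 1e-12:
            J += np.outer(y_new - y - J @ dq, dq) / (dq @ dq)  # Broyden rank-one update
        q, y = q + dq, y_new
    return q, y


target = forward(np.array([0.6, 0.9]))
q_final, y_final = jacobian_servo(forward, [0.3, 0.5], target)
```

Only evaluations of `forward` are used, so the same loop applies to any system whose outputs can be measured after each actuation. The step limit keeps each move within the region where the local linear estimate is trustworthy.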