
    Bayesian Optimization in Robot Learning - Automatic Controller Tuning and Sample-Efficient Methods

    In reference to IEEE copyrighted material which is used with permission in this thesis, the IEEE does not endorse any of Eberhard Karls Universität Tübingen’s products or services. Internal or personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for advertising or promotional purposes or for creating new collective works for resale or redistribution, please go to http://www.ieee.org/publications_standards/publications/rights/rights_link.html to learn how to obtain a License from RightsLink.
The problem of designing controllers to regulate dynamical systems has been studied by engineers over the past millennia. Ever since, suboptimal performance has lingered in many closed loops as an unavoidable side effect of manually tuning the controller parameters. Today, industrial practitioners remain skeptical of data-driven methods that automatically learn controller parameters. In robotics, machine learning (ML) continues to gain influence, increasing autonomy and adaptability, for example by helping to automate controller tuning. However, data-hungry ML methods, such as standard reinforcement learning, require a large number of experimental samples, which is prohibitive in robotics because hardware can deteriorate and break. 
This raises the following question: Can manual controller tuning in robotics be automated by using data-efficient machine learning techniques? In this thesis, we tackle this question by exploring Bayesian optimization (BO), a data-efficient ML framework, to reduce the human effort and side effects of manual controller tuning while keeping the number of experimental samples low. We focus on robotic systems, providing thorough theoretical results that aim to increase data efficiency, as well as demonstrations on real robots. Specifically, we present four main contributions. We first consider using BO to replace manual tuning on robotic platforms. To this end, we parametrize the design weights of a linear quadratic regulator (LQR) and learn these parameters using an information-efficient BO algorithm. This algorithm uses Gaussian processes (GPs) to model the unknown performance objective. The GP model is used by BO to suggest controller parameters that are expected to increase the information about the optimal parameters, measured as a gain in entropy. The resulting “automatic LQR tuning” framework is demonstrated on two robotic platforms: a robot arm balancing an inverted pole and a humanoid robot performing a squatting task. In both cases, an existing controller is automatically improved in a handful of experiments without human intervention. BO compensates for data scarcity by means of the GP, a probabilistic model that encodes prior assumptions about the unknown performance objective. Usually, incorrect or uninformed assumptions have negative consequences, such as a higher number of robot experiments, poor tuning performance or reduced sample efficiency. The second to fourth contributions presented herein attempt to alleviate this issue. The second contribution proposes to include the robot simulator in the learning loop as an additional information source for automatic controller tuning. 
While a real robot experiment generally entails high costs (e.g., it requires preparation and takes time), simulations are cheaper to obtain (e.g., they can be computed faster). However, because the simulator is an imperfect model of the robot, its information is biased and could have negative repercussions on the learning performance. To address this problem, we propose “simu-vs-real”, a principled multi-fidelity BO algorithm that trades off cheap but inaccurate information from simulations against expensive and accurate physical experiments in a cost-effective manner. The resulting algorithm is demonstrated on a cart-pole system, where simulations and real experiments are alternated, thus sparing many real evaluations. The third contribution explores how to adapt the expressiveness of the probabilistic prior to the control problem at hand. To this end, the mathematical structure of LQR controllers is leveraged and embedded into the GP by means of the kernel function. Specifically, we propose two different “LQR kernel” designs that retain the flexibility of Bayesian nonparametric learning. Simulated results indicate that the LQR kernel yields better performance than uninformed kernel choices when used for controller learning with BO. Finally, the fourth contribution specifically addresses the problem of handling controller failures, which are typically unavoidable in practice when learning from data, especially if non-conservative solutions are expected. Although controller failures are generally problematic (e.g., the robot has to be emergency-stopped), they are also a rich information source about what should be avoided. We propose “failures-aware excursion search”, a novel algorithm for Bayesian optimization under black-box constraints, where failures are limited in number. Our results in numerical benchmarks indicate that allowing a limited number of failures reveals better optima than state-of-the-art methods. 
The first contribution of this thesis, “automatic LQR tuning”, is among the first to apply BO to real robots. While it demonstrated automatic controller learning from few experimental samples, it also revealed several important challenges, such as the need for higher sample efficiency, which opened relevant research directions that we addressed through several methodological contributions. In summary, we proposed “simu-vs-real”, a novel BO algorithm that includes the simulator as an additional information source; an “LQR kernel” design that learns faster than standard choices; and “failures-aware excursion search”, a new BO algorithm for constrained black-box optimization problems where the number of failures is limited.
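
To make the tuning loop described in this abstract concrete, the following is a minimal sketch of Bayesian optimization with a GP surrogate over a single controller parameter. It is not the thesis's method: it swaps the entropy-based acquisition for the simpler expected-improvement rule, and `toy_cost`, the search grid, the initial points and the kernel hyperparameters are all hypothetical stand-ins (a real setup would replace `toy_cost` with a robot experiment returning an LQR-style cost).

```python
import math
import numpy as np

def rbf_kernel(a, b, lengthscale=0.3, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP (unit prior variance)."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_train, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.clip(1.0 - np.sum(v * v, axis=0), 1e-12, None)
    return mean, np.sqrt(var)

def expected_improvement(mean, std, best):
    """Expected improvement for minimization (assumed acquisition, not entropy search)."""
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf

def toy_cost(theta):
    """Hypothetical stand-in for one experiment: cost is lowest at theta = 0.7."""
    return (theta - 0.7) ** 2

def bayes_opt(cost, n_iter=10):
    """Run BO on [0, 1]; returns best parameter, best cost, and all tried parameters."""
    grid = np.linspace(0.0, 1.0, 201)      # candidate controller parameters
    x = np.array([0.0, 0.5, 1.0])          # initial experiments
    y = np.array([cost(t) for t in x])
    for _ in range(n_iter):
        y_centered = y - y.mean()          # the GP assumes a zero mean
        mean, std = gp_posterior(x, y_centered, grid)
        ei = expected_improvement(mean, std, y_centered.min())
        nxt = grid[np.argmax(ei)]          # most promising next experiment
        x = np.append(x, nxt)
        y = np.append(y, cost(nxt))
    i = np.argmin(y)
    return x[i], y[i], x

if __name__ == "__main__":
    best_theta, best_cost, tried = bayes_opt(toy_cost)
    print(best_theta, best_cost, len(tried))
```

Each loop iteration corresponds to one "experiment", so `n_iter` plays the role of the experimental budget that the abstract aims to keep small.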

    Bio-Inspired Robotics

    Modern robotic technologies have enabled robots to operate in a variety of unstructured and dynamically changing environments, in addition to traditional structured environments. Robots have thus become an important element in our everyday lives. One key approach to developing such intelligent and autonomous robots is to draw inspiration from biological systems. Biological structures, mechanisms, and underlying principles have the potential to provide new ideas to support the improvement of conventional robotic designs and control. Such biological principles usually originate from animal or even plant models and inspire robots that can sense, think, walk, swim, crawl, jump or even fly. Thus, it is believed that these bio-inspired methods are becoming increasingly important in the face of complex applications. Bio-inspired robotics is leading to the study of innovative structures and computing with sensory–motor coordination and learning to achieve intelligence, flexibility, stability, and adaptation for emergent robotic applications, such as manipulation, learning, and control. This Special Issue invites original papers of innovative ideas and concepts, new discoveries and improvements, and novel applications and business models relevant to the selected topics of “Bio-Inspired Robotics”. Bio-inspired robotics is a broad and continually expanding field, and this Special Issue collates 30 papers that address some of its important challenges and opportunities.

    Locomoção bípede adaptativa a partir de uma única demonstração usando primitivas de movimento (Adaptive biped locomotion from a single demonstration using motion primitives)

    Doctorate in Electrical Engineering. This work addresses the problem of learning to imitate human locomotion actions through low-level trajectories encoded with motion primitives and generalizing them to new situations from a single demonstration. In this line of thought, the main objectives of this work are twofold. The first is to analyze, extract and encode human demonstrations taken from motion capture data in order to model biped locomotion tasks. However, transferring motion skills from humans to robots is not limited to simple reproduction, but requires the evaluation of their ability to adapt to new situations, as well as to deal with unexpected disturbances. Therefore, the second objective is to develop and evaluate a control framework for action shaping such that the single demonstration can be modulated to varying situations, taking into account the dynamics of the robot and its environment. The idea behind the approach is to address the problem of generalization from a single demonstration by combining two basic structures. The first structure is a pattern generator system consisting of movement primitives learned and modelled by dynamical systems (DS). This encoding approach possesses desirable properties that make it well suited for trajectory generation, namely the possibility to change parameters online, such as the amplitude and the frequency of the limit cycle, and the intrinsic robustness against small perturbations. 
The second structure, which is embedded in the previous one, consists of coupled phase oscillators that organize actions into functional coordinated units. The changing contact conditions, together with the associated impacts with the ground, lead to models with multiple phases. Instead of forcing the robot’s motion into a predefined fixed timing, the proposed pattern generator explores transitions between phases that emerge from the interaction of the robot system with the environment, triggered by sensor-driven events. The proposed approach is tested in a dynamics simulation framework, and several experiments are conducted to validate the methods and to assess the performance of a humanoid robot.
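
As a rough illustration of what coupled phase oscillators do, here is a minimal sketch, assuming a generic Kuramoto-style coupling rather than the thesis's actual formulation. Two oscillators (think of them as hypothetical left and right leg units) are pulled toward a desired phase offset of half a cycle; the function name, coupling gain, frequency and step size are all assumptions chosen for illustration.

```python
import numpy as np

def simulate_oscillators(phases0, desired_offset, omega=2.0 * np.pi,
                         coupling=2.0, dt=0.001, steps=5000):
    """Euler-integrate d(phi_i)/dt = omega + k * sum_j sin(phi_j - phi_i - D[i, j]).

    desired_offset[i, j] is the target value of phi_j - phi_i.
    """
    phases = np.array(phases0, dtype=float)
    for _ in range(steps):
        diff = phases[None, :] - phases[:, None]   # diff[i, j] = phi_j - phi_i
        dphi = omega + coupling * np.sin(diff - desired_offset).sum(axis=1)
        phases = phases + dt * dphi
    return phases

# Hypothetical gait: the second oscillator should lead the first by half a cycle.
desired = np.array([[0.0, np.pi],
                    [-np.pi, 0.0]])
final = simulate_oscillators([0.0, 0.3], desired)
offset = (final[1] - final[0]) % (2.0 * np.pi)
```

Starting from an offset of 0.3 rad, the phase difference settles at the prescribed offset regardless of the common frequency `omega`, which is the property that lets such a network keep units coordinated while timing is modulated.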

    Metastable legged-robot locomotion

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Mechanical Engineering, 2008. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 195-215). A variety of impressive approaches to legged locomotion exist; however, the science of legged robotics is still far from demonstrating a solution which performs with a level of flexibility, reliability and careful foot placement that would enable practical locomotion on the variety of rough and intermittent terrain humans negotiate with ease on a regular basis. In this thesis, we strive toward this particular goal by developing a methodology for designing control algorithms for moving a legged robot across such terrain in a qualitatively satisfying manner, without falling down very often. We feel the definition of a meaningful metric for legged locomotion is a useful goal in and of itself. Specifically, the mean first-passage time (MFPT), also called the mean time to failure (MTTF), is an intuitively practical cost function to optimize for a legged robot, and we present the reader with a systematic, mathematical process for obtaining estimates of this MFPT metric. Of particular significance, our models of walking on stochastically rough terrain generally result in dynamics with a fast mixing time, where initial conditions are largely "forgotten" within 1 to 3 steps. Additionally, we can often find a near-optimal solution for motion planning using only a short time-horizon look-ahead. 
Although we openly recognize that there are important classes of optimization problems for which long-term planning is required to avoid "running into a dead end" (or off of a cliff!), we demonstrate that many classes of rough terrain can in fact be successfully negotiated with a surprisingly high level of long-term reliability by selecting the short-sighted motion with the greatest probability of success. The methods used throughout have direct relevance to machine learning, providing a physics-based approach to reduce state space dimensionality and mathematical tools to obtain a scalar metric quantifying performance of the resulting reduced-order system. by Katie Byl. Ph.D.
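
The abstract does not spell out how the MFPT is estimated, so the following is only a sketch of one standard calculation, assuming the walker is modelled as a discrete-time Markov chain with a single absorbing "fallen" state. The two-state transition matrix `Q` is hypothetical; its rows sum to less than one, and the missing probability mass is the per-step chance of falling.

```python
import numpy as np

def mean_first_passage_time(Q):
    """Expected number of steps before absorption from each transient state.

    Q holds transition probabilities among the non-failed states only.
    The expected times m satisfy m = 1 + Q m, i.e. (I - Q) m = 1.
    """
    Q = np.asarray(Q, dtype=float)
    n = Q.shape[0]
    return np.linalg.solve(np.eye(n) - Q, np.ones(n))

# Hypothetical two-state model: a "good" posture that fails with
# probability 0.01 per step and a "poor" posture that fails with 0.10.
Q = np.array([[0.90, 0.09],
              [0.50, 0.40]])
mfpt = mean_first_passage_time(Q)
# mfpt[0] is the expected number of steps to failure from the good posture,
# mfpt[1] from the poor posture.
```

For this matrix the two expected times differ only modestly (46 versus 40 steps), a small-scale echo of the abstract's remark that initial conditions are largely forgotten within a few steps.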