Reinforcement Learning for Generative Art
Reinforcement learning (RL) is an efficient class of sequential decision-making algorithms that has achieved remarkable success in a broad range of applications, such as robotic manipulation, strategy games, and autonomous driving. The best-known example of reinforcement learning is AlphaGo, a computer program that plays the board game Go and outperforms top human Go players. Unlike the other two major machine learning categories, supervised learning and unsupervised learning, in which media artists are actively engaged, reinforcement learning has yet to result in many creative applications. Generative art is usually driven, in whole or in part, by autonomous systems that are derived from a set of rules. Interestingly, an RL policy can be seen as an autonomous system whose rules are learned by interacting with its environment. Regardless of its initial purpose, reinforcement learning has the potential to expand the boundary of generative art. However, a formal process for applying reinforcement learning to generative art does not yet exist, and current RL tools require an in-depth understanding of RL concepts. To bridge this gap, the first part of the dissertation introduces a conceptual framework for adapting reinforcement learning to generative art. The framework proposes the term "RL-based generative art" to denote a novel form of generative art in which the use of RL agents is the key element. The framework discusses the creative process of RL-based generative art and the emergent behaviors it may produce. This leads to a discussion of several of the author's related practices in generative art, deep-learning art, and reinforcement learning. Those practices are critical for understanding the conceptual and technical details of each component needed to construct the framework. The second part introduces RL5, a JavaScript library for rapidly prototyping RL environments and training RL policies in web browsers.
The library combines RL algorithms and RL environments into one framework and is fully compatible with p5.js. RL5 is developed with a particular focus on simplicity, to favor the (re)usability of RL algorithms and the development of RL environments. Specifically, the library implements three RL algorithms (tabular Q-learning, REINFORCE, and DDPG) to cover all three families of model-free RL, and nine RL environments, six of which address autonomous agents with steering behaviors and can be used as building blocks for complex systems. Finally, the author demonstrates four use cases showing how to apply RL5 in pedagogical and creative applications.
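As an illustration of the simplest of the three algorithms named above, the following is a minimal sketch of tabular Q-learning in Python. It is not RL5 code and does not use the RL5 API: the five-cell corridor environment, the function names (`step`, `q_update`, `train`, `greedy_policy`), and the hyperparameters are all assumptions chosen for the example.

```python
import random

N_STATES = 5          # hypothetical corridor with cells 0..4; cell 4 is terminal
ACTIONS = (0, 1)      # 0 = step left, 1 = step right


def step(state, action):
    """Toy dynamics (an assumption): reward 1.0 only on reaching the last cell."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_update(q, state, action, reward, next_state, done, alpha=0.5, gamma=0.9):
    """One tabular Q-learning update: move Q(s, a) toward the bootstrapped target."""
    best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])


def train(episodes=500, seed=0):
    """Learn Q-values from a uniformly random behavior policy (off-policy)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(1000):  # step cap so an episode always ends
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            q_update(q, state, action, reward, next_state, done)
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best action in every non-terminal cell according to the learned table."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
```

Because Q-learning is off-policy, the table can be learned from purely random actions here; calling `greedy_policy(train())` then reads out the learned behavior, which in this toy corridor is to step right in every cell.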
Minimizing Human Assistance: Augmenting a Single Demonstration for Deep Reinforcement Learning
The use of human demonstrations in reinforcement learning has proven to
significantly improve agent performance. However, any requirement for a human
to manually 'teach' the model is somewhat antithetical to the goals of
reinforcement learning. This paper attempts to minimize human involvement in
the learning process while still retaining the performance advantages by using
a single human example collected through a simple-to-use virtual reality
simulation to assist with RL training. Our method augments a single
demonstration to generate numerous human-like demonstrations that, when
combined with Deep Deterministic Policy Gradients and Hindsight Experience
Replay (DDPG + HER), significantly improve training time on simple tasks and
allow the agent to solve a complex task (block stacking) that DDPG + HER alone
cannot solve. The model achieves this significant training advantage using a
single human example, requiring less than a minute of human input.
Comment: 7 pages, 11 figures
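For readers unfamiliar with Hindsight Experience Replay, the sketch below shows its core relabeling idea using the common "final" strategy: a failed episode is stored again as if the goal had been the state it actually reached. This is not the paper's demonstration-augmentation method, whose details the abstract does not give. The `Transition` fields, the tolerance, and the 0 / -1 sparse reward are assumptions made for the example.

```python
import math
from dataclasses import dataclass, replace
from typing import List, Tuple

Vector = Tuple[float, ...]


@dataclass(frozen=True)
class Transition:
    """One step of a goal-conditioned episode (field names are hypothetical)."""
    state: Vector
    action: Vector
    achieved_goal: Vector   # goal-space position reached after the action
    desired_goal: Vector    # goal the agent was asked to reach
    reward: float


def sparse_reward(achieved: Vector, desired: Vector, tol: float = 0.05) -> float:
    """Assumed sparse reward: 0.0 within `tol` of the goal, otherwise -1.0."""
    return 0.0 if math.dist(achieved, desired) <= tol else -1.0


def relabel_with_final_goal(episode: List[Transition]) -> List[Transition]:
    """Return a copy of the episode whose goal is the finally achieved state."""
    final_goal = episode[-1].achieved_goal
    return [
        replace(
            t,
            desired_goal=final_goal,
            reward=sparse_reward(t.achieved_goal, final_goal),
        )
        for t in episode
    ]
```

In a typical setup both the original and the relabeled transitions would be added to the replay buffer, so that an off-policy learner such as DDPG sees some successful outcomes even when the original goal was never reached.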
Learning and generalising object extraction skill for contact-rich disassembly tasks: an introductory study
Remanufacturing automation must be flexible and robust enough to cope with uncertainty, varying product conditions, and the complexity of planning and operating the processes. Machine learning methods, in particular reinforcement learning, have been presented as techniques to learn, improve, and generalise the automation of many robotic manipulation tasks (most of them related to grasping, picking, or assembly). However, they have been little exploited in remanufacturing, in particular in disassembly tasks. This work presents the state of the art of contact-rich disassembly using reinforcement learning algorithms, together with a study of how object extraction skills generalise when applied to contact-rich disassembly tasks. The generalisation capabilities of two state-of-the-art reinforcement learning agents (trained in simulation) are tested and evaluated in simulation and in the real world while performing a disassembly task. Results show that at least one of the agents can generalise the contact-rich extraction skill. In addition, this work identifies key concepts and gaps in the research and application of reinforcement learning algorithms for disassembly tasks.
Bridging the Sim-to-Real Gap with Dynamic Compliance Tuning for Industrial Insertion
Contact-rich manipulation tasks often exhibit a large sim-to-real gap. For
instance, industrial assembly tasks frequently involve tight insertions where
the clearance is less than mm and can even be negative when dealing
with a deformable receptacle. This narrow clearance leads to complex contact
dynamics that are difficult to model accurately in simulation, making it
challenging to transfer simulation-learned policies to real-world robots. In
this paper, we propose a novel framework for robustly learning manipulation
skills for real-world tasks using only the simulated data. Our framework
consists of two main components: the "Force Planner" and the "Gain Tuner".
The Force Planner is responsible for planning both the robot motion and desired
contact forces, while the Gain Tuner dynamically adjusts the compliance control
gains to accurately track the desired contact forces during task execution. The
key insight of this work is that by adaptively adjusting the robot's compliance
control gains during task execution, we can modulate contact forces in the new
environment, thereby generating trajectories similar to those trained in
simulation and narrowing the sim-to-real gap. Experimental results show that our
method, trained in simulation on a generic square peg-and-hole task, can
generalize to a variety of real-world insertion tasks involving narrow or even
negative clearances, all without requiring any fine-tuning.
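The key insight above, adjusting compliance gains online so that the measured contact force tracks a desired force, can be illustrated with a toy example. The sketch below is not the paper's Gain Tuner. It assumes a one-dimensional contact modeled as two springs in series (robot stiffness and environment stiffness) and a simple integral-style update of the robot stiffness. All names and numeric values are hypothetical.

```python
def contact_force(k_robot: float, k_env: float, offset: float) -> float:
    """Steady-state force (N) of two series springs compressed by `offset` metres."""
    return offset * k_robot * k_env / (k_robot + k_env)


def tune_stiffness(
    desired_force: float,
    k_env: float,
    offset: float,
    k_init: float = 200.0,
    rate: float = 100.0,
    steps: int = 200,
) -> float:
    """Adapt the robot stiffness (N/m) until the contact force tracks the target.

    The tuner only uses the force error, as a real controller would use a
    force/torque sensor reading; `k_env` is needed here only to simulate contact.
    """
    k_robot = k_init
    for _ in range(steps):
        error = desired_force - contact_force(k_robot, k_env, offset)
        # Raise stiffness when the force is too low, lower it when too high.
        k_robot = max(1e-6, k_robot + rate * error)
    return k_robot
```

With an assumed environment stiffness of 1000 N/m, a commanded offset of 0.01 m, and a target of 5 N, `tune_stiffness(5.0, 1000.0, 0.01)` settles near the stiffness at which the series-spring model produces the target force. The same commanded motion then yields the same contact force even when the environment differs from the one assumed at design time.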
Efficient and Stable Online Learning for "Developmental Robots"
Recent progress in robotics and cognitive science has inspired a new generation of more versatile robots, so-called developmental robots. Many learning approaches for these robots are inspired by developmental processes and learning mechanisms observed in children. It is widely accepted that developmental robots must autonomously develop, acquire their skills, and cope with unforeseen challenges in unbounded environments through lifelong learning. Continuous online adaptation and intrinsically motivated learning are thus essential capabilities for these robots. However, the high sample complexity of online learning and intrinsic motivation methods impedes their efficiency and practical feasibility for lifelong learning. Consequently, the majority of previous work has been demonstrated only in simulation. This thesis devises new methods and learning schemes to mitigate this problem and to permit direct online training on physical robots. A novel intrinsic motivation method is developed to drive the robot's exploration so that it efficiently selects what to learn. This method combines new knowledge-based and competence-based signals to increase sample efficiency and to enable lifelong learning. While developmental robots typically acquire their skills through self-exploration, their autonomous development could be accelerated by additionally learning from humans. Yet there is hardly any research on integrating intrinsic motivation with learning from a teacher. The thesis therefore establishes a new learning scheme that integrates intrinsic motivation with learning from observation. The underlying exploration mechanism in the proposed learning schemes relies on Goal Babbling, a goal-directed method for learning direct inverse robot models online, from scratch, and in a learning-while-behaving fashion. Online learning of multiple solutions for redundant robots within this framework has so far been missing.
This thesis devises an incremental online associative network to enable simultaneous exploration and solution consolidation and establishes a new technique to stabilize the learning system. The proposed methods and learning schemes are demonstrated for acquiring reaching skills. Their efficiency, stability, and applicability are benchmarked in simulation and demonstrated on a physical 7-DoF Baxter robot arm.