A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning
We present a tutorial on Bayesian optimization, a method of finding the
maximum of expensive cost functions. Bayesian optimization employs the Bayesian
technique of setting a prior over the objective function and combining it with
evidence to get a posterior function. This permits a utility-based selection of
the next observation to make on the objective function, which must take into
account both exploration (sampling from areas of high uncertainty) and
exploitation (sampling areas likely to offer improvement over the current best
observation). We also present two detailed extensions of Bayesian optimization,
with experiments---active user modelling with preferences, and hierarchical
reinforcement learning---and a discussion of the pros and cons of Bayesian
optimization based on our experiences.
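As a rough illustration of the loop described above (a prior over the objective, a posterior update from evidence, and a utility-based choice of the next observation balancing exploration and exploitation), the following numpy-only sketch runs Bayesian optimization with a Gaussian-process surrogate and the expected-improvement acquisition. The kernel length-scale, candidate grid, and toy objective are invented for illustration, not taken from the tutorial.

```python
import numpy as np
from math import erf

def rbf(a, b, ls=0.2):
    # squared-exponential (RBF) kernel between two 1-D point sets
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(xs, ys, xq, noise=1e-6):
    # GP posterior mean and std at query points xq, given data (xs, ys)
    K = rbf(xs, xs) + noise * np.eye(len(xs))
    Ks = rbf(xs, xq)
    sol = np.linalg.solve(K, Ks)
    mu = sol.T @ ys
    var = np.clip(1.0 - np.sum(Ks * sol, axis=0), 1e-12, None)
    return mu, np.sqrt(var)

def expected_improvement(mu, sigma, best):
    # EI trades off exploitation (mu - best) against exploration (sigma)
    z = (mu - best) / sigma
    Phi = 0.5 * (1.0 + np.array([erf(v / np.sqrt(2)) for v in z]))
    phi = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return (mu - best) * Phi + sigma * phi

def bayes_opt(f, iters=15, seed=0):
    rng = np.random.default_rng(seed)
    xq = np.linspace(0.0, 1.0, 200)   # candidate grid on [0, 1]
    xs = rng.uniform(0.0, 1.0, 3)     # small initial design
    ys = np.array([f(x) for x in xs])
    for _ in range(iters):
        mu, sigma = gp_posterior(xs, ys, xq)
        x_next = xq[np.argmax(expected_improvement(mu, sigma, ys.max()))]
        xs, ys = np.append(xs, x_next), np.append(ys, f(x_next))
    i = int(np.argmax(ys))
    return xs[i], ys[i]

# Toy "expensive" objective with its maximum at x = 0.7.
f = lambda x: -(x - 0.7) ** 2
x_best, y_best = bayes_opt(f)
```

Each iteration refits the surrogate and samples where expected improvement is highest, so queries concentrate near the optimum without exhaustively evaluating the objective.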
Nonstrict hierarchical reinforcement learning for interactive systems and robots
Conversational systems and robots that use reinforcement learning for policy optimization in large domains often face the problem of limited scalability. This problem has been addressed either by using function approximation techniques that approximate the true value function of a policy or by using a hierarchical decomposition of a learning task into subtasks. We present a novel approach for dialogue policy optimization that combines the benefits of both hierarchical control and function approximation and that allows flexible transitions between dialogue subtasks to give human users more control over the dialogue. To this end, each reinforcement learning agent in the hierarchy is extended with a subtask transition function and a dynamic state space to allow flexible switching between subdialogues. In addition, the subtask policies are represented with linear function approximation in order to generalize the decision making to situations unseen in training. Our proposed approach is evaluated in an interactive conversational robot that learns to play quiz games. Experimental results, using simulation and real users, provide evidence that our proposed approach can lead to more flexible (natural) interactions than strict hierarchical control and that it is preferred by human users.
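The combination described above (one learner per subtask with linear function approximation, plus a subtask transition function that permits switching between subdialogues) might be sketched roughly as follows. The feature encoding, subtask names, and switching rule are invented for illustration and are not the paper's actual implementation.

```python
import numpy as np

class SubtaskAgent:
    """One RL agent per subdialogue; Q(s, a) = w[a] . phi(s)."""
    def __init__(self, n_features, actions, alpha=0.1, gamma=0.95):
        self.w = {a: np.zeros(n_features) for a in actions}
        self.alpha, self.gamma = alpha, gamma

    def q(self, phi, a):
        return float(self.w[a] @ phi)

    def act(self, phi, rng, eps=0.1):
        # epsilon-greedy over the linear Q-values
        if rng.random() < eps:
            return rng.choice(list(self.w))
        return max(self.w, key=lambda a: self.q(phi, a))

    def update(self, phi, a, r, phi2, a2):
        # SARSA temporal-difference update on the linear weights
        td = r + self.gamma * self.q(phi2, a2) - self.q(phi, a)
        self.w[a] += self.alpha * td * phi

def transition(current, state_features):
    # Toy subtask transition function: feature 0 stands in for a
    # "user wants to switch subdialogue" signal.
    if current == "quiz" and state_features[0] > 0.5:
        return "chat"
    return current
```

The linear representation lets each subtask policy generalize to states unseen in training, while the transition function gives the user, rather than a strict hierarchy, control over when subdialogues change.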
Reinforcement Learning Approaches in Social Robotics
This article surveys reinforcement learning approaches in social robotics.
Reinforcement learning is a framework for decision-making problems in which an
agent interacts through trial-and-error with its environment to discover an
optimal behavior. Since interaction is a key component in both reinforcement
learning and social robotics, it can be a well-suited approach for real-world
interactions with physically embodied social robots. The scope of the paper is
focused particularly on studies that include social physical robots and
real-world human-robot interactions with users. We present a thorough analysis
of reinforcement learning approaches in social robotics. In addition to a
survey, we categorize existing reinforcement learning approaches based on
the method used and the design of the reward mechanisms. Moreover, since
communication capability is a prominent feature of social robots, we discuss
and group the papers based on the communication medium used for reward
formulation. Considering the importance of designing the reward function, we
also provide a categorization of the papers based on the nature of the reward.
This categorization includes three major themes: interactive reinforcement
learning, intrinsically motivated methods, and task performance-driven methods.
The paper also discusses the benefits and challenges of reinforcement
learning in social robotics; how the surveyed papers are evaluated,
distinguishing subjective from algorithmic measures; real-world
reinforcement learning challenges and proposed solutions; and the points
that remain to be explored, including approaches that have thus far
received less attention. Thus, this paper aims to serve as a starting
point for researchers interested in applying reinforcement learning
methods in this particular research field.
Reinforcement Learning Approach for Inspect/Correct Tasks
In this research, we focus on the application of reinforcement learning (RL) in automated agent tasks involving considerable target variability (i.e., characterized by stochastic distributions); in particular, learning of inspect/correct tasks. Examples include automated identification and correction of rivet failures in airplane maintenance procedures, and automated cleaning of surgical instruments in a hospital sterilization processing department. The location of defects and the corrective action to be taken for each varies from episode to episode. What needs to be learned are optimal stochastic strategies, rather than an optimization for any one single defect type and location. RL has been widely applied in robotics and autonomous agents research, but primarily for problems with relatively low variability compared to the task requirements overall.
We characterize the performance of RL at varying levels of variability in a grid world environment at different task complexity levels, and analyze RL performance problems seen during the experiments. The experiments revealed that higher variability in the stochastic environments significantly reduces the RL agent's performance due to forgetting (or overwriting) effects, as the most recent observation from the stochastic environment unduly influences learned behavior. Furthermore, we characterize the impact of variability on hyperparameter selection.
To help mitigate the impact of variability on RL performance, we developed a chain-of-Q-tables approach aimed at reducing the impact of subtask variability on other subtasks within a training episode. The performance of the chain-of-Q-tables approach was assessed against the original SARSA RL and the double SARSA approach. In high and very high variability cases, the chain-of-Q-tables approach outperforms the others in terms of efficiency, accumulated reward, number of steps, and computational time.
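The abstract does not give implementation details, but the core idea of keeping one tabular SARSA learner per subtask, so that updates for one subtask cannot overwrite what was learned for another, might look roughly like this; the toy subtask environment is invented for illustration.

```python
import numpy as np

def eps_greedy(q, s, eps, rng):
    # epsilon-greedy action selection from one subtask's table
    if rng.random() < eps:
        return int(rng.integers(q.shape[1]))
    return int(np.argmax(q[s]))

def sarsa_chain(env_step, n_subtasks, n_states, n_actions,
                episodes=200, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    # one independent Q-table per subtask in the chain
    Q = [np.zeros((n_states, n_actions)) for _ in range(n_subtasks)]
    for _ in range(episodes):
        for k in range(n_subtasks):        # subtasks run as a chain
            s = 0
            a = eps_greedy(Q[k], s, eps, rng)
            done = False
            while not done:
                s2, r, done = env_step(k, s, a, rng)
                a2 = eps_greedy(Q[k], s2, eps, rng)
                target = r + gamma * Q[k][s2, a2] * (not done)
                Q[k][s, a] += alpha * (target - Q[k][s, a])
                s, a = s2, a2
    return Q

# Toy subtask environment: action 1 moves right; reaching state 2 ends
# the subtask with reward 1 (details invented for illustration).
def env_step(k, s, a, rng):
    s2 = min(s + 1, 2) if a == 1 else s
    done = s2 == 2
    return s2, float(done), done

Q = sarsa_chain(env_step, n_subtasks=2, n_states=3, n_actions=2)
```

Because each subtask writes only to its own table, high variability in one subtask cannot overwrite values learned for the others within the same episode.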
An adaptive hyperparameter setting method was developed based on a sample variability metric. The approach quickly estimates the environmental variability and automatically sets appropriate hyperparameter values.
Learning Complex Tasks from Demonstration and One's Own Experience
Today's industrial production would not be possible without the invention of robots that carry out constantly repeating tasks efficiently and precisely. At the same time, industrial manufacturing is so far the only area in which robots are deployed at large scale. Yet there are tasks in other areas of everyday life in which robots could usefully support humans. Developing service robots for these new fields of application raises a number of challenges. For instance, factory-time programming that anticipates every variant of the task and its operating conditions is no longer practicable. This talk therefore presents methods with which robots can learn the required skills in an intuitive way and, when needed, adapt and extend them to new situations. As a prerequisite for learning actions, a method is first presented for segmenting and classifying motion trajectories on the one hand and for generating generalized movements between arbitrary endpoints on the other. By using an iterative segmentation and classification algorithm together with a shared probabilistic action model, systematic segmentation errors are avoided. Building on this, learning methods are presented that combine reinforcement learning and learning from demonstration to teach robots to solve complex tasks through a targeted combination of simple skills. Sequential tasks are considered first, in which the heterogeneous composition of the state and action space and the variable length of the action sequences to be learned pose particular challenges. The approach presented addresses these with a probabilistic approximation of the value function over state-action pairs, using a specially developed combined kernel.
This approximation provides the basis for a Bayesian exploration strategy based on optimizing the expected change, enabling efficient reinforcement learning. To integrate reinforcement learning with expert knowledge from demonstrations as well as possible, a multi-stage decision system is used that determines in every situation which of the two learning modules is the more suitable, enabling safe yet efficient learning of movement sequences. Finally, to solve complex tasks efficiently, a hierarchical learning method is presented that exploits opportunities for abstraction to offer improved scalability. Here, the MAXQ method for hierarchical reinforcement learning is extended for use in continuous state spaces. Using a Gaussian process approximation of the MAXQ decomposition for each subtask, probabilistic estimates of the Q-values are computed recursively along the task hierarchy. In this way, the Bayesian exploration criterion already used successfully for learning action sequences can also be applied to the efficient learning of MAXQ hierarchies. Moreover, the method exploits the hierarchical task structure to request demonstrations only for those parts of the task in which they are actually needed, avoiding unnecessary redundant demonstrations. The presented methods were evaluated through experiments in a simulated environment and on a humanoid robot.
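A minimal sketch of the MAXQ value decomposition referred to above, Q(i, s, a) = V(a, s) + C(i, s, a), evaluated recursively over a toy hierarchy; the hierarchy, rewards, and completion values are invented for illustration, and the thesis replaces such tables with Gaussian-process estimates over continuous states.

```python
# V(i, s): value of task i in state s; for a primitive action it is the
# expected immediate reward, otherwise the best Q over child subtasks.
def V(task, s, children, R, C):
    if not children.get(task):            # primitive action
        return R[(task, s)]
    return max(Q(task, s, a, children, R, C) for a in children[task])

# Q(i, s, a) = V(a, s) + C(i, s, a): value of doing subtask a inside
# task i, plus the completion value of finishing i afterwards.
def Q(task, s, a, children, R, C):
    return V(a, s, children, R, C) + C[(task, s, a)]

# Toy hierarchy: root chooses between two primitive actions.
children = {"root": ["left", "right"], "left": [], "right": []}
R = {("left", 0): 0.0, ("right", 0): 1.0}     # primitive rewards
C = {("root", 0, "left"): 0.5, ("root", 0, "right"): 0.0}
best = V("root", 0, children, R, C)
```

The recursion is what lets probabilistic Q-value estimates propagate along the task hierarchy when each V and C is replaced by a Gaussian-process model.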
Policy space abstraction for a lifelong learning agent
This thesis is concerned with policy space abstractions that concisely encode alternative
ways of making decisions; dealing with discovery, learning, adaptation and use of these
abstractions. This work is motivated by the problem faced by autonomous agents that
operate within a domain for long periods of time, hence having to learn to solve many
different task instances that share some structural attributes. An example of such a
domain is an autonomous robot in a dynamic domestic environment. Such environments
raise the need for transfer of knowledge, so as to eliminate the need for long learning
trials after deployment.
Typically, these tasks would be modelled as sequential decision making problems,
including path optimisation for navigation tasks, or Markov Decision Process models for
more general tasks. Learning within such models often takes the form of online learning
or reinforcement learning. However, handling issues such as knowledge transfer and
multiple task instances requires notions of structure and hierarchy, and that raises several
questions that form the topic of this thesis – (a) can an agent acquire such hierarchies in
policies in an online, incremental manner, (b) can we devise mathematically rigorous
ways to abstract policies based on qualitative attributes, (c) when it is inconvenient to
employ prolonged trial and error learning, can we devise alternate algorithmic methods
for decision making in a lifelong setting?
The first contribution of this thesis is an algorithmic method for incrementally
acquiring hierarchical policies. Working with the framework of options - temporally
extended actions - in reinforcement learning, we present a method for discovering
persistent subtasks that define useful options for a particular domain. Our algorithm
builds on a probabilistic mixture model in state space to define a generalised and
persistent form of ‘bottlenecks’, and suggests suitable policy fragments to make options.
In order to continuously update this hierarchy, we devise an incremental process which
runs in the background and takes care of proposing and forgetting options. We evaluate
this framework in simulated worlds, including the RoboCup 2D simulation league
domain.
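For reference, the options framework used above represents a temporally extended action by an initiation set, an internal policy, and a termination condition; a minimal sketch follows, with an invented corridor domain whose "bottleneck" is a doorway state.

```python
from dataclasses import dataclass
from typing import Callable, Set

@dataclass
class Option:
    initiation: Set[int]               # states where the option may start
    policy: Callable[[int], int]       # maps state -> primitive action
    terminates: Callable[[int], bool]  # beta: stop condition

def run_option(opt, state, step, max_len=50):
    """Execute an option until its termination condition fires."""
    t = 0
    while not opt.terminates(state) and t < max_len:
        state = step(state, opt.policy(state))
        t += 1
    return state, t

# Toy corridor: states 0..5; the doorway (bottleneck) is state 3.
step = lambda s, a: min(5, max(0, s + a))
to_door = Option(initiation={0, 1, 2},
                 policy=lambda s: 1,          # always move right
                 terminates=lambda s: s == 3)
final, dur = run_option(to_door, 0, step)
```

An option discovered at a bottleneck, like `to_door`, lets a higher-level learner treat "get through the doorway" as a single action.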
The second contribution of this thesis is in defining abstractions in terms of equivalence
classes of trajectories. Utilising recently developed techniques from computational
topology, in particular the concept of persistent homology, we show that a library of
feasible trajectories could be retracted to representative paths that may be sufficient for
reasoning about plans at the abstract level. We present a complete framework, starting
from a novel construction of a simplicial complex that describes higher-order connectivity
properties of a spatial domain, to methods for computing the homology of this
complex at varying resolutions. The resulting abstractions are motion primitives that
may be used as topological options, contributing a novel criterion for option discovery.
This is validated by experiments in simulated 2D robot navigation, and in manipulation
using a physical robot platform.
Finally, we develop techniques for solving a family of related, but different, problem
instances through policy reuse of a finite policy library acquired over the agent’s lifetime.
This represents an alternative approach when traditional methods such as hierarchical
reinforcement learning are not computationally feasible. We abstract the policy space
using a non-parametric model of performance of policies in multiple task instances, so
that decision making is posed as a Bayesian choice regarding what to reuse. This is
one approach to transfer learning that is motivated by the needs of practical long-lived
systems. We show the merits of such Bayesian policy reuse in simulated real-time
interactive systems, including online personalisation and surveillance.
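A hedged sketch of the Bayesian reuse decision described above: maintain a belief over latent task types, pick the library policy with the highest expected performance under that belief, and update the belief from the observed return. The performance model and the Gaussian observation model are invented for illustration.

```python
import numpy as np

def select_policy(belief, perf):
    # Choose the library policy maximising expected performance,
    # perf[pi, tau] being the known return of policy pi on task type tau.
    return int(np.argmax(perf @ belief))

def update_belief(belief, policy, reward, perf, sigma=1.0):
    # Bayes update with a Gaussian likelihood around the predicted return
    like = np.exp(-0.5 * ((reward - perf[policy]) / sigma) ** 2)
    post = belief * like
    return post / post.sum()

# Two task types, two library policies (performance model is invented).
perf = np.array([[1.0, 0.0],    # policy 0 is good on type 0
                 [0.0, 1.0]])   # policy 1 is good on type 1
belief = np.array([0.5, 0.5])   # uniform prior over task types
pi = select_policy(belief, perf)
belief = update_belief(belief, pi, reward=0.9, perf=perf)
```

After observing a return close to what policy 0 achieves on type 0, the belief shifts toward type 0, so subsequent reuse decisions concentrate on the matching library policy.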