82 research outputs found
Policy space abstraction for a lifelong learning agent
This thesis is concerned with policy space abstractions that concisely encode alternative
ways of making decisions; dealing with discovery, learning, adaptation and use of these
abstractions. This work is motivated by the problem faced by autonomous agents that
operate within a domain for long periods of time, hence having to learn to solve many
different task instances that share some structural attributes. An example of such a
domain is an autonomous robot in a dynamic domestic environment. Such environments
raise the need for transfer of knowledge, so as to eliminate the need for long learning
trials after deployment.
Typically, these tasks would be modelled as sequential decision making problems,
including path optimisation for navigation tasks, or Markov Decision Process models for
more general tasks. Learning within such models often takes the form of online learning
or reinforcement learning. However, handling issues such as knowledge transfer and
multiple task instances requires notions of structure and hierarchy, and that raises several
questions that form the topic of this thesis – (a) can an agent acquire such hierarchies in
policies in an online, incremental manner, (b) can we devise mathematically rigorous
ways to abstract policies based on qualitative attributes, (c) when it is inconvenient to
employ prolonged trial and error learning, can we devise alternate algorithmic methods
for decision making in a lifelong setting?
The first contribution of this thesis is an algorithmic method for incrementally
acquiring hierarchical policies. Working with the framework of options (temporally
extended actions) in reinforcement learning, we present a method for discovering
persistent subtasks that define useful options for a particular domain. Our algorithm
builds on a probabilistic mixture model in state space to define a generalised and
persistent form of ‘bottlenecks’, and suggests suitable policy fragments to make options.
In order to continuously update this hierarchy, we devise an incremental process which
runs in the background and takes care of proposing and forgetting options. We evaluate
this framework in simulated worlds, including the RoboCup 2D simulation league
domain.
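As an illustration of what an option is, the sketch below follows the standard definition from the reinforcement learning literature: an initiation condition, an internal policy, and a termination probability. It is not the thesis's discovery algorithm. The corridor world, the doorway at state 5, and the names Option, run_option, and corridor_step are all hypothetical, chosen only to make the example runnable.

```python
import random
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Option:
    """A temporally extended action (hypothetical minimal representation)."""
    can_start: Callable[[int], bool]    # initiation set: states where the option may begin
    policy: Callable[[int], int]        # primitive action chosen in each state
    stop_prob: Callable[[int], float]   # probability of terminating in each state


def run_option(option: Option, state: int,
               step: Callable[[int, int], int],
               max_steps: int = 100) -> Tuple[int, int]:
    """Follow the option's policy until it terminates; return (final state, steps taken)."""
    if not option.can_start(state):
        raise ValueError("option cannot be initiated in this state")
    steps = 0
    while steps < max_steps:
        state = step(state, option.policy(state))
        steps += 1
        if random.random() < option.stop_prob(state):
            break
    return state, steps


def corridor_step(state: int, action: int) -> int:
    """Toy environment (assumption): a corridor of states 0..9, actions are -1 or +1."""
    return min(9, max(0, state + action))


# An option that walks right until it reaches the assumed 'bottleneck' at state 5.
go_to_door = Option(
    can_start=lambda s: s < 5,
    policy=lambda s: 1,
    stop_prob=lambda s: 1.0 if s == 5 else 0.0,
)

final_state, n_steps = run_option(go_to_door, 0, corridor_step)
```

Here the termination state is fixed by hand; in the work described above, such subgoals would instead be proposed from a mixture model over state space.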
The second contribution of this thesis is in defining abstractions in terms of equivalence
classes of trajectories. Utilising recently developed techniques from computational
topology, in particular the concept of persistent homology, we show that a library of
feasible trajectories could be retracted to representative paths that may be sufficient for
reasoning about plans at the abstract level. We present a complete framework, starting
from a novel construction of a simplicial complex that describes higher-order connectivity
properties of a spatial domain, to methods for computing the homology of this
complex at varying resolutions. The resulting abstractions are motion primitives that
may be used as topological options, contributing a novel criterion for option discovery.
This is validated by experiments in simulated 2D robot navigation, and in manipulation
using a physical robot platform.
Finally, we develop techniques for solving a family of related, but different, problem
instances through policy reuse of a finite policy library acquired over the agent’s lifetime.
This represents an alternative approach when traditional methods such as hierarchical
reinforcement learning are not computationally feasible. We abstract the policy space
using a non-parametric model of performance of policies in multiple task instances, so
that decision making is posed as a Bayesian choice regarding what to reuse. This is
one approach to transfer learning that is motivated by the needs of practical long-lived
systems. We show the merits of such Bayesian policy reuse in simulated real-time
interactive systems, including online personalisation and surveillance.
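The idea of posing policy reuse as a Bayesian choice can be sketched roughly as follows. This is a simplified illustration, not the thesis's method: it assumes a small table of mean returns per (task type, policy) with Gaussian noise in place of the non-parametric performance model, and it selects policies greedily. The names means, select_policy, and update_belief, and all numbers, are hypothetical.

```python
import math
from typing import List


def select_policy(belief: List[float], means: List[List[float]]) -> int:
    """Pick the library policy with the highest expected return under the belief."""
    n_policies = len(means[0])
    expected = [
        sum(b * row[p] for b, row in zip(belief, means))
        for p in range(n_policies)
    ]
    return max(range(n_policies), key=expected.__getitem__)


def update_belief(belief: List[float], means: List[List[float]],
                  policy: int, observed_return: float,
                  sigma: float = 1.0) -> List[float]:
    """Bayes update of the belief over task types after trying one policy."""
    # Assumed Gaussian likelihood of the observed return under each task type.
    likelihood = [
        math.exp(-0.5 * ((observed_return - row[policy]) / sigma) ** 2)
        for row in means
    ]
    unnormalised = [b * l for b, l in zip(belief, likelihood)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Hypothetical performance model: means[t][p] is the mean return of
# library policy p on previously seen task type t.
means = [
    [1.0, 0.0, 0.2],
    [0.1, 0.9, 0.3],
    [0.0, 0.2, 0.8],
]
belief = [1.0 / 3.0] * 3            # uniform prior over task types

first_choice = select_policy(belief, means)
belief = update_belief(belief, means, first_choice, observed_return=0.8)
```

Each trial sharpens the belief about which known task type the new instance resembles, so the agent can settle on a suitable policy from the library after only a few trials rather than learning from scratch.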
Efficient Learning and Inference for High-dimensional Lagrangian Systems
Learning the nature of a physical system is a problem that presents many challenges and opportunities owing to the unique structure associated with such systems. Many physical systems of practical interest in engineering are high-dimensional, which prohibits the application of standard learning methods to such problems. The first part of this work therefore proposes to solve learning problems associated with physical systems by identifying their low-dimensional Lagrangian structure. Algorithms are given to learn this structure in the case that it is obscured by a change of coordinates. The associated inference problem corresponds to solving a high-dimensional minimum-cost path problem, which can be solved by exploiting the symmetry of the problem. These techniques are demonstrated via an application to learning from high-dimensional human motion capture data. The second part of this work is concerned with the application of these methods to high-dimensional motion planning. Algorithms are given to learn and exploit the structure of holonomic motion planning problems effectively via spectral analysis and iterative dynamic programming, admitting solutions to problems of unprecedented dimension compared to known methods for optimal motion planning. The quality of solutions found is also demonstrated to be much superior in practice to those obtained via sampling-based planning and smoothing, in both simulated problems and experiments with a robot arm. This work therefore provides strong validation of the idea that learning low-dimensional structure is the key to future advances in this field.
Hierarchical structure discovery and transfer in sequential decision problems
Acting intelligently to efficiently solve sequential decision problems requires the ability to extract hierarchical structure from the underlying domain dynamics, exploit it for optimal or near-optimal decision-making, and transfer it to related problems instead of solving every problem in isolation. This dissertation makes three contributions toward this goal.
The first contribution is the introduction of two frameworks for the transfer of hierarchical structure in sequential decision problems. The MASH framework facilitates transfer among multiple agents coordinating within a domain. The VRHRL framework allows an agent to transfer its knowledge across a family of domains that share the same transition dynamics but have differing reward dynamics. Both MASH and VRHRL are validated empirically in large domains and the results demonstrate significant speedup in the solutions due to transfer.
The second contribution is a new approach to the discovery of hierarchical structure in sequential decision problems. HI-MAT leverages action models to analyze the relevant dependencies in a hierarchically-generated trajectory and it discovers hierarchical structure that transfers to all problems whose actions share the same relevant dependencies as the single source problem. HierGen advances HI-MAT by learning simple action models, leveraging these models to analyze non-hierarchically-generated trajectories from multiple source problems in a robust causal fashion, and discovering hierarchical structure that transfers to all problems whose actions share the same causal dependencies as those in the source problems. Empirical evaluations in multiple domains demonstrate that the discovered hierarchical structures are comparable to manually-designed structures in quality and performance.
Action models are essential to hierarchical structure discovery and other aspects of intelligent behavior. The third contribution of this dissertation is the introduction of two general frameworks for learning action models in sequential decision problems. In the MBP framework, learning is user-driven; in the PLEX framework, the learner generates its own problems. The frameworks are formally analyzed and reduced to concept learning with one-sided error. A general action-modeling language is shown to be efficiently learnable in both frameworks.
Curriculum learning in reinforcement learning
In recent years, reinforcement learning (RL) has been increasingly successful at solving complex tasks. Despite these successes, one of the fundamental challenges is that many RL methods require large amounts of experience, and thus can be slow to train in practice. Transfer learning is a recent area of research that has been shown to speed up learning on a complex task by transferring knowledge from one or more easier source tasks. Most existing transfer learning methods treat this transfer of knowledge as a one-step process, where knowledge from all the sources is directly transferred to the target. However, for complex tasks, it may be more beneficial (and even necessary) to gradually acquire skills over multiple tasks in sequence, where each subsequent task requires and builds upon knowledge gained in a previous task. This idea is pervasive throughout human learning, where people learn complex skills gradually by training via a curriculum.
The goal of this thesis is to explore whether autonomous reinforcement learning agents can also benefit by training via a curriculum, and whether such curricula can be designed fully autonomously. In order to answer these questions, this thesis first formalizes the concept of a curriculum, and the methodology of curriculum learning in reinforcement learning. Curriculum learning consists of three main elements: 1) task generation, which creates a suitable set of source tasks; 2) sequencing, which focuses on how to order these tasks into a curriculum; and 3) transfer learning, which considers how to transfer knowledge between tasks in the curriculum. This thesis introduces several methods to both create suitable source tasks and automatically sequence them into a curriculum. We show that these methods produce curricula that are tailored to the individual sensing and action capabilities of different agents, and show how the curricula learned can be adapted for new, but related target tasks. Together, these methods form the components of an autonomous curriculum design agent that can suggest a training curriculum customized to both the unique abilities of each agent and the task in question. We expect this research on the curriculum learning approach will increase the applicability and scalability of RL methods by providing a faster way of training reinforcement learning agents, compared to learning tabula rasa.
A Survey on Causal Reinforcement Learning
While Reinforcement Learning (RL) achieves tremendous success in sequential
decision-making problems of many domains, it still faces key challenges of data
inefficiency and the lack of interpretability. Interestingly, many researchers
have leveraged insights from the causality literature recently, bringing forth
flourishing works to unify the merits of causality and address well the
challenges from RL. As such, it is of great necessity and significance to
collate these Causal Reinforcement Learning (CRL) works, offer a review of CRL
methods, and investigate the potential functionality from causality toward RL.
In particular, we divide existing CRL approaches into two categories according
to whether their causality-based information is given in advance or not. We
further analyze each category in terms of the formalization of different
models, ranging from the Markov Decision Process (MDP), Partially Observed
Markov Decision Process (POMDP), Multi-Arm Bandits (MAB), and Dynamic Treatment
Regime (DTR). Moreover, we summarize the evaluation metrics and open sources
while we discuss emerging applications, along with promising prospects for the
future development of CRL. Comment: 29 pages, 20 figures
Structures for Sophisticated Behaviour: Feudal Hierarchies and World Models
This thesis explores structured, reward-based behaviour in artificial agents and in animals. In Part I we investigate how reinforcement learning agents can learn to cooperate. Drawing inspiration from the hierarchical organisation of human societies, we propose the framework of Feudal Multi-agent Hierarchies (FMH), in which coordination of many agents is facilitated by a manager agent. We outline the structure of FMH and demonstrate its potential for decentralised learning and control. We show that, given an adequate set of subgoals from which to choose, FMH performs, and particularly scales, substantially better than cooperative approaches that use shared rewards. We next investigate training FMH in simulation to solve a complex information gathering task. Our approach introduces a ‘Centralised Policy Actor-Critic’ (CPAC) and an alteration to the conventional multi-agent policy gradient, which allows one multi-agent system to advise the training of another. We further exploit this idea for communicating agents with shared rewards and demonstrate its efficacy. In Part II we examine how animals discover and exploit underlying statistical structure in their environments, even when such structure is difficult to learn and use. By analysing behavioural data from an extended experiment with rats, we show that such hidden structure can indeed be learned, but also that subjects suffer from imperfections in their ability to infer their current state. We account for their behaviour using a Hidden Markov Model, in which recent observations are integrated imperfectly with evidence from the past. We find that over the course of training, subjects learn to track their progress through the task more accurately, a change that our model largely attributes to the more reliable integration of past evidence.
Design of an UAV swarm
This master's thesis gives an overview of the general aspects involved in the design of a UAV swarm. UAV swarms are continuously gaining popularity among researchers and UAV manufacturers, since they allow greater success rates in task accomplishment with reduced times. Apart from this, multiple UAVs cooperating with one another open a new field of missions that can only be carried out in this way. The topics covered in this master's thesis span the elements involved in the design of a UAV swarm, from the communication protocols between UAVs to navigation and trajectory analysis and task allocation.