
    Policy space abstraction for a lifelong learning agent

    This thesis is concerned with policy space abstractions that concisely encode alternative ways of making decisions, covering the discovery, learning, adaptation, and use of these abstractions. The work is motivated by the problem faced by autonomous agents that operate within a domain for long periods of time and must therefore learn to solve many different task instances that share some structural attributes. An example of such a domain is an autonomous robot in a dynamic domestic environment. Such environments raise the need for transfer of knowledge, so as to eliminate the need for long learning trials after deployment. Typically, these tasks would be modelled as sequential decision making problems, including path optimisation for navigation tasks, or Markov Decision Process models for more general tasks. Learning within such models often takes the form of online learning or reinforcement learning. However, handling issues such as knowledge transfer and multiple task instances requires notions of structure and hierarchy, and that raises several questions that form the topic of this thesis: (a) can an agent acquire such hierarchies in policies in an online, incremental manner; (b) can we devise mathematically rigorous ways to abstract policies based on qualitative attributes; (c) when it is inconvenient to employ prolonged trial-and-error learning, can we devise alternative algorithmic methods for decision making in a lifelong setting?

    The first contribution of this thesis is an algorithmic method for incrementally acquiring hierarchical policies. Working within the framework of options (temporally extended actions) in reinforcement learning, we present a method for discovering persistent subtasks that define useful options for a particular domain. Our algorithm builds on a probabilistic mixture model in state space to define a generalised and persistent form of 'bottlenecks', and suggests suitable policy fragments from which to make options. In order to continuously update this hierarchy, we devise an incremental process that runs in the background and takes care of proposing and forgetting options. We evaluate this framework in simulated worlds, including the RoboCup 2D simulation league domain.

    The second contribution of this thesis is in defining abstractions in terms of equivalence classes of trajectories. Utilising recently developed techniques from computational topology, in particular the concept of persistent homology, we show that a library of feasible trajectories can be retracted to representative paths that may be sufficient for reasoning about plans at the abstract level. We present a complete framework, starting from a novel construction of a simplicial complex that describes higher-order connectivity properties of a spatial domain, to methods for computing the homology of this complex at varying resolutions. The resulting abstractions are motion primitives that may be used as topological options, contributing a novel criterion for option discovery. This is validated by experiments in simulated 2D robot navigation, and in manipulation using a physical robot platform.

    Finally, we develop techniques for solving a family of related, but different, problem instances through policy reuse from a finite policy library acquired over the agent's lifetime. This represents an alternative approach when traditional methods such as hierarchical reinforcement learning are not computationally feasible. We abstract the policy space using a non-parametric model of the performance of policies across multiple task instances, so that decision making is posed as a Bayesian choice regarding what to reuse. This is one approach to transfer learning that is motivated by the needs of practical long-lived systems. We show the merits of such Bayesian policy reuse in simulated real-time interactive systems, including online personalisation and surveillance.
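    As a rough illustration of the first contribution, the sketch below fits a mixture model to visited states and scores each component by how many distinct trajectories pass through it relative to the time spent there, a crude proxy for a generalised 'bottleneck' around which an option could be proposed. The scoring rule and all names here are assumptions for illustration, not the thesis's algorithm.

```python
# Illustrative sketch: fit a GMM to visited states and score components as
# bottleneck-like if many trajectories pass through them while little total
# time is spent there. The scoring rule is an assumption for illustration.
import numpy as np
from sklearn.mixture import GaussianMixture

def bottleneck_scores(trajectories, n_components=8, seed=0):
    """trajectories: list of (T_i, d) arrays of visited states."""
    states = np.vstack(trajectories)
    gmm = GaussianMixture(n_components=n_components, random_state=seed).fit(states)

    occupancy = np.zeros(n_components)   # total time share per component
    transit = np.zeros(n_components)     # fraction of trajectories visiting it
    for traj in trajectories:
        labels = gmm.predict(traj)
        for k in np.unique(labels):
            transit[k] += 1.0
        occupancy += np.bincount(labels, minlength=n_components)

    occupancy /= occupancy.sum()
    transit /= len(trajectories)
    # Bottleneck-like: visited by many trajectories, occupying little time.
    return transit / (occupancy + 1e-8), gmm
```

    An incremental variant could refit the model and rescore components as new trajectories arrive, loosely mirroring a background process that proposes and forgets options over time.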
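    The third contribution, posing reuse as a Bayesian choice, can likewise be sketched in a few lines: keep a belief over latent task types, update it from the returns the chosen policy achieves, and reuse whichever library policy maximises expected return under that belief. The Gaussian performance model and the class below are simplifying assumptions, not the thesis's formulation.

```python
# Minimal sketch of Bayesian policy reuse over a finite policy library,
# with a Gaussian performance model standing in for the non-parametric
# model described in the abstract.
import numpy as np

class BayesianPolicyReuse:
    def __init__(self, perf_mean, perf_std):
        # perf_mean[t, p]: expected return of policy p on task type t
        self.mu, self.sigma = perf_mean, perf_std
        self.belief = np.full(perf_mean.shape[0], 1.0 / perf_mean.shape[0])

    def select_policy(self):
        # Bayesian choice: maximise expected return under the current belief.
        return int(np.argmax(self.belief @ self.mu))

    def update(self, policy, observed_return):
        # Posterior over task types given the return the policy achieved.
        z = (observed_return - self.mu[:, policy]) / self.sigma[:, policy]
        likelihood = np.exp(-0.5 * z**2) / self.sigma[:, policy]
        self.belief = self.belief * likelihood
        self.belief /= self.belief.sum()
```

    Each episode the agent calls select_policy(), runs the chosen policy, and feeds the observed return back through update(), so the belief concentrates on the task types consistent with what it observes.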

    Efficient Learning and Inference for High-dimensional Lagrangian Systems

    Learning the nature of a physical system is a problem that presents many challenges and opportunities owing to the unique structure associated with such systems. Many physical systems of practical interest in engineering are high-dimensional, which prohibits the application of standard learning methods to such problems. The first part of this work therefore proposes to solve learning problems associated with physical systems by identifying their low-dimensional Lagrangian structure. Algorithms are given to learn this structure in the case that it is obscured by a change of coordinates. The associated inference problem corresponds to solving a high-dimensional minimum-cost path problem, which can be solved by exploiting the symmetry of the problem. These techniques are demonstrated via an application to learning from high-dimensional human motion capture data.

    The second part of this work is concerned with the application of these methods to high-dimensional motion planning. Algorithms are given to learn and exploit the structure of holonomic motion planning problems effectively via spectral analysis and iterative dynamic programming, admitting solutions to problems of unprecedented dimension compared to known methods for optimal motion planning. The quality of the solutions found is also demonstrated to be much superior in practice to those obtained via sampling-based planning and smoothing, in both simulated problems and experiments with a robot arm. This work therefore provides strong validation of the idea that learning low-dimensional structure is the key to future advances in this field.
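    The overall recipe can be sketched with generic stand-ins for both stages: recover low-dimensional structure from high-dimensional samples by spectral analysis, then solve a minimum-cost path problem over that structure (here with Dijkstra's algorithm on a k-nearest-neighbour graph, rather than the symmetry-exploiting dynamic programming of the thesis).

```python
# Sketch: spectral embedding of high-dimensional samples, then a
# minimum-cost path through a k-NN graph in the learned space.
# Both stages are generic stand-ins, not the thesis's methods.
import numpy as np
from sklearn.manifold import SpectralEmbedding
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import dijkstra

def low_dim_min_cost_path(X, start, goal, dim=3, k=10):
    """X: (n, D) high-dimensional samples; start/goal: row indices."""
    Y = SpectralEmbedding(n_components=dim).fit_transform(X)  # learn structure
    G = kneighbors_graph(Y, n_neighbors=k, mode='distance')   # local edge costs
    dist, pred = dijkstra(G, directed=False, indices=start,
                          return_predecessors=True)
    # Backtrack the predecessor chain from goal to start.
    path, node = [], goal
    while node != start and node >= 0:
        path.append(node)
        node = pred[node]
    path.append(start)
    return Y, path[::-1], dist[goal]
```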

    A Survey on Causal Reinforcement Learning

    While Reinforcement Learning (RL) has achieved tremendous success in sequential decision-making problems across many domains, it still faces key challenges of data inefficiency and lack of interpretability. Interestingly, many researchers have recently leveraged insights from the causality literature, producing a flourishing body of work that unifies the merits of causality with RL to address these challenges. It is therefore timely and significant to collate these Causal Reinforcement Learning (CRL) works, offer a review of CRL methods, and investigate the potential contributions of causality to RL. In particular, we divide existing CRL approaches into two categories according to whether their causality-based information is given in advance or not. We further analyze each category in terms of the formalization of different models, including the Markov Decision Process (MDP), the Partially Observable Markov Decision Process (POMDP), Multi-Armed Bandits (MAB), and the Dynamic Treatment Regime (DTR). Moreover, we summarize the evaluation metrics and open-source resources, and we discuss emerging applications along with promising prospects for the future development of CRL.

    Comment: 29 pages, 20 figures
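    A toy example of the first category (causality-based information given in advance): if the agent is told that a context variable is a cause of reward, it can estimate arm values per context rather than pooling observations, and reaches better decisions from the same data. The environment and all names below are invented for this sketch and are not drawn from the survey.

```python
# Toy contrast: an epsilon-greedy bandit that knows a binary context C
# causes the reward (and conditions on it) versus one that pools data.
import numpy as np

rng = np.random.default_rng(0)
P_REWARD = np.array([[0.9, 0.1],    # context C=0: arm 0 is good
                     [0.2, 0.8]])   # context C=1: arm 1 is good

def run(causal_aware, steps=5000, eps=0.1):
    # Value table: per-context if the causal structure is known, pooled otherwise.
    shape = (2, 2) if causal_aware else (1, 2)
    q, n = np.zeros(shape), np.zeros(shape)
    total = 0.0
    for _ in range(steps):
        c = rng.integers(2)                      # context, sampled by nature
        row = c if causal_aware else 0
        arm = rng.integers(2) if rng.random() < eps else int(np.argmax(q[row]))
        r = float(rng.random() < P_REWARD[c, arm])
        n[row, arm] += 1
        q[row, arm] += (r - q[row, arm]) / n[row, arm]  # incremental mean
        total += r
    return total / steps

print("causal-aware:", run(True), " pooled:", run(False))
```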

    Structures for Sophisticated Behaviour: Feudal Hierarchies and World Models

    This thesis explores structured, reward-based behaviour in artificial agents and in animals. In Part I we investigate how reinforcement learning agents can learn to cooperate. Drawing inspiration from the hierarchical organisation of human societies, we propose the framework of Feudal Multi-agent Hierarchies (FMH), in which coordination of many agents is facilitated by a manager agent. We outline the structure of FMH and demonstrate its potential for decentralised learning and control. We show that, given an adequate set of subgoals from which to choose, FMH performs, and particularly scales, substantially better than cooperative approaches that use shared rewards. We next investigate training FMH in simulation to solve a complex information-gathering task. Our approach introduces a 'Centralised Policy Actor-Critic' (CPAC) and an alteration to the conventional multi-agent policy gradient, which allows one multi-agent system to advise the training of another. We further exploit this idea for communicating agents with shared rewards and demonstrate its efficacy.

    In Part II we examine how animals discover and exploit underlying statistical structure in their environments, even when such structure is difficult to learn and use. By analysing behavioural data from an extended experiment with rats, we show that such hidden structure can indeed be learned, but also that subjects suffer from imperfections in their ability to infer their current state. We account for their behaviour using a Hidden Markov Model in which recent observations are integrated imperfectly with evidence from the past. We find that over the course of training, subjects learn to track their progress through the task more accurately, a change that our model largely attributes to more reliable integration of past evidence.
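    The imperfect evidence integration described in Part II can be sketched as a standard HMM forward update in which the new observation's likelihood is tempered by a weight below one, so that recent evidence counts for less than it should. The tempering scheme is an assumption for illustration; the thesis's exact parameterisation may differ.

```python
# Sketch of a belief filter with imperfect evidence integration: a normal
# HMM forward step, but the observation likelihood is tempered by w <= 1.
import numpy as np

def imperfect_filter_step(belief, T, O, obs, w=0.7):
    """belief: (S,) prior over states; T: (S, S) with T[i, j] = P(s'=j | s=i);
    O: (S, K) observation probabilities; obs: observed symbol index."""
    predicted = T.T @ belief          # propagate belief through the dynamics
    likelihood = O[:, obs] ** w       # w < 1 under-weights the new observation
    posterior = predicted * likelihood
    return posterior / posterior.sum()
```

    With w = 1 this reduces to exact Bayesian filtering; fitting w to behavioural data would quantify how imperfectly a subject integrates new observations with past evidence.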

    Design of an UAV swarm

    This master's thesis gives an overview of the general aspects involved in the design of a UAV swarm. UAV swarms are continuously gaining popularity among researchers and UAV manufacturers, since they achieve higher task-completion rates in less time. Beyond this, multiple UAVs cooperating with one another open up a class of missions that can only be carried out this way. The thesis covers all the agents involved in the design of a UAV swarm, from the communication protocols between them to navigation, trajectory analysis, and task allocation.
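    One of the design topics listed, task allocation, has a classical baseline that is easy to sketch: assign each UAV to a task so that the total travel distance is minimised, via the Hungarian algorithm. This is a generic baseline for illustration, not the allocation scheme developed in the thesis.

```python
# Baseline task allocation: optimal one-to-one matching of UAVs to tasks
# minimising total travel distance (Hungarian algorithm). Positions are
# invented example data.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

uav_positions = np.array([[0.0, 0.0], [5.0, 1.0], [2.0, 6.0]])
task_positions = np.array([[4.0, 0.0], [0.0, 5.0], [6.0, 6.0]])

cost = cdist(uav_positions, task_positions)   # pairwise UAV-to-task distances
rows, cols = linear_sum_assignment(cost)      # minimum-cost assignment
for u, t in zip(rows, cols):
    print(f"UAV {u} -> task {t} (distance {cost[u, t]:.2f})")
```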