Search CORE

5,430 research outputs found

Recommended from our members

End-to-end deep reinforcement learning in computer systems

Author: Schaarschmidt Michael
Publication venue: University of Cambridge
Publication date: 13/04/2020
Field of study

Abstract The growing complexity of data processing systems has long led systems designers to imagine systems (e.g. databases, schedulers) which can self-configure and adapt based on environmental cues. In this context, reinforcement learning (RL) methods have since their inception appealed to systems developers. They promise to acquire complex decision policies from raw feedback signals. Despite their conceptual popularity, RL methods are scarcely found in real-world data processing systems. Recently, RL has seen explosive growth in interest due to high profile successes when utilising large neural networks (deep reinforcement learning). Newly emerging machine learning frameworks and powerful hardware accelerators have given rise to a plethora of new potential applications. In this dissertation, I first argue that in order to design and execute deep RL algorithms efficiently, novel software abstractions are required which can accommodate the distinct computational patterns of communication-intensive and fast-evolving algorithms. I propose an architecture which decouples logical algorithm construction from local and distributed execution semantics. I further present RLgraph, my proof-of-concept implementation of this architecture. In RLgraph, algorithm developers can explore novel designs by constructing a high-level data flow graph through combination of logical components. This dataflow graph is independent of specific backend frameworks or notions of execution, and is only later mapped to execution semantics via a staged build process. RLgraph enables high-performing algorithm implementations while maintaining flexibility for rapid prototyping. Second, I investigate reasons for the scarcity of RL applications in systems themselves. I argue that progress in applied RL is hindered by a lack of tools for task model design which bridge the gap between systems and algorithms, and also by missing shared standards for evaluation of model capabilities. I introduce Wield, a first-of-its-kind tool for incremental model design in applied RL. Wield provides a small set of primitives which decouple systems interfaces and deployment-specific configuration from representation. Core to Wield is a novel instructive experiment protocol called progressive randomisation which helps practitioners to incrementally evaluate different dimensions of non-determinism. I demonstrate how Wield and progressive randomisation can be used to reproduce and assess prior work, and to guide implementation of novel RL applications

Apollo (Cambridge)

Modular Design Patterns for Hybrid Learning and Reasoning Systems: a taxonomy, patterns and use cases

Author: de Boer Maaike
Meyer-Vitali André
Teije Annette ten
van Bekkum Michael
van Harmelen Frank
Publication venue
Publication date: 25/03/2021
Field of study

The unification of statistical (data-driven) and symbolic (knowledge-driven) methods is widely recognised as one of the key challenges of modern AI. Recent years have seen large number of publications on such hybrid neuro-symbolic AI systems. That rapidly growing literature is highly diverse and mostly empirical, and is lacking a unifying view of the large variety of these hybrid systems. In this paper we analyse a large body of recent literature and we propose a set of modular design patterns for such hybrid, neuro-symbolic systems. We are able to describe the architecture of a very large number of hybrid systems by composing only a small set of elementary patterns as building blocks. The main contributions of this paper are: 1) a taxonomically organised vocabulary to describe both processes and data structures used in hybrid systems; 2) a set of 15+ design patterns for hybrid AI systems, organised in a set of elementary patterns and a set of compositional patterns; 3) an application of these design patterns in two realistic use-cases for hybrid AI systems. Our patterns reveal similarities between systems that were not recognised until now. Finally, our design patterns extend and refine Kautz' earlier attempt at categorising neuro-symbolic architectures.Comment: 20 pages, 22 figures, accepted for publication in the International Journal of Applied Intelligenc

arXiv.org e-Print Archive

VU Research Portal

Noisy Symbolic Abstractions for Deep RL: A case study with Reward Machines

Author: Chen Zizhao
Icarte Rodrigo Toro
Klassen Toryn Q.
Li Andrew C.
McIlraith Sheila A.
Vaezipoor Pashootan
Publication venue
Publication date: 23/11/2022
Field of study

Natural and formal languages provide an effective mechanism for humans to specify instructions and reward functions. We investigate how to generate policies via RL when reward functions are specified in a symbolic language captured by Reward Machines, an increasingly popular automaton-inspired structure. We are interested in the case where the mapping of environment state to a symbolic (here, Reward Machine) vocabulary -- commonly known as the labelling function -- is uncertain from the perspective of the agent. We formulate the problem of policy learning in Reward Machines with noisy symbolic abstractions as a special class of POMDP optimization problem, and investigate several methods to address the problem, building on existing and new techniques, the latter focused on predicting Reward Machine state, rather than on grounding of individual symbols. We analyze these methods and evaluate them experimentally under varying degrees of uncertainty in the correct interpretation of the symbolic vocabulary. We verify the strength of our approach and the limitation of existing methods via an empirical investigation on both illustrative, toy domains and partially observable, deep RL domains.Comment: NeurIPS Deep Reinforcement Learning Workshop 202

arXiv.org e-Print Archive

Enablingmarkovian representations under imperfect information

Author: Belardinelli F
Leon BG
Malvone V
Publication venue: 'Scitepress'
Publication date: 01/02/2022
Field of study

Markovian systems are widely used in reinforcement learning (RL), when the successful completion of a task depends exclusively on the last interaction between an autonomous agent and its environment. Unfortunately, real-world instructions are typically complex and often better described as non-Markovian. In this paper we present an extension method that allows solving partially-observable non-Markovian reward decision processes (PONMRDPs) by solving equivalent Markovian models. This potentially facilitates Markovian-based state-of-the-art techniques, including RL, to find optimal behaviours for problems best described as PONMRDP. We provide formal optimality guarantees of our extension methods together with a counterexample illustrating that naive extensions from existing techniques in fully-observable environments cannot provide such guarantees

HAL Evry

HAL Descartes

Spiral - Imperial College Digital Repository

Hal-Diderot

Grounding Spatio-Temporal Language with Transformers

Author: Hofmann Katja
Karch Tristan
Moulin-Frier Clément
Oudeyer Pierre-Yves
Teodorescu Laetitia
Publication venue
Publication date: 16/06/2021
Field of study

Language is an interface to the outside world. In order for embodied agents to use it, language must be grounded in other, sensorimotor modalities. While there is an extended literature studying how machines can learn grounded language, the topic of how to learn spatio-temporal linguistic concepts is still largely uncharted. To make progress in this direction, we here introduce a novel spatio-temporal language grounding task where the goal is to learn the meaning of spatio-temporal descriptions of behavioral traces of an embodied agent. This is achieved by training a truth function that predicts if a description matches a given history of observations. The descriptions involve time-extended predicates in past and present tense as well as spatio-temporal references to objects in the scene. To study the role of architectural biases in this task, we train several models including multimodal Transformer architectures; the latter implement different attention computations between words and objects across space and time. We test models on two classes of generalization: 1) generalization to randomly held-out sentences; 2) generalization to grammar primitives. We observe that maintaining object identity in the attention computation of our Transformers is instrumental to achieving good performance on generalization overall, and that summarizing object traces in a single token has little influence on performance. We then discuss how this opens new perspectives for language-guided autonomous embodied agents. We also release our code under open-source license as well as pretrained models and datasets to encourage the wider community to build upon and extend our work in the future.Comment: Contains main article and supplementarie

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Analysing Deep Reinforcement Learning Agents Trained with Domain Randomisation

Author: Arulkumaran Kai
Behbahani Feryal
Bharath Anil Anthony
Dai Tianhong
Gerbert Tamara
Tukra Samyakh
Publication venue: 'Elsevier BV'
Publication date: 07/07/2022
Field of study

Peer reviewedPublisher PD

Aberdeen University Research

Reinforcement Learning for Argumentation

Author: ALAHMARI SULTAN
Publication venue: University of York
Publication date: 01/10/2019
Field of study

Argumentation as a logical reasoning approach plays an important role in improving communication, increasing agree-ability, and resolving conflicts in multi-agent-systems (MAS). The present research aims to explore the effectiveness of argumentation in reinforcement learning of intelligent agents in terms of, outperforming baseline agents, learning transfer between argument graphs, and improving relevance and coherence of dialogue quality. This research developed `ARGUMENTO+' to encourage a reinforcement learning agent (RL agent) playing abstract argument game for improving performance against different baseline agents by using a newly proposed state representation in order to make each state unique. When attempting to generalise this approach to other argumentation graphs, the RL agent was not able to effectively identify the argument patterns that are transferable to other domains. In order to improve the effectiveness of the RL agent to recognise argument patterns, this research adopted a logic-based dialogue game approach with richer argument representations. In the DE dialogue game, the RL agent played against hard-coded heuristic agents and showed improved performance compared to the baseline agents by using a reward function that encourages the RL agent to win the game with minimum number of moves. This also allowed the RL agent to adopt its own strategy, make moves, and learn to argue. This thesis also presents a new reward function that makes the RL agent's dialogue more coherent and relevant than its opponents. The RL agent was designed to recognise argument patterns, i.e. argumentation schemes and evidence support sources, which can be related to different domains. The RL agent used a transfer learning method to generalise and transfer experiences and speed up learning

White Rose E-theses Online

Sample efficiency, transfer learning and interpretability for deep reinforcement learning

Author: Arulkumaran Kailash
Publication venue: Bioengineering, Imperial College London
Publication date: 01/06/2020
Field of study

Deep learning has revolutionised artificial intelligence, where the application of increased compute to train neural networks on large datasets has resulted in improvements in real-world applications such as object detection, text-to-speech synthesis and machine translation. Deep reinforcement learning (DRL) has similarly shown impressive results in board and video games, but less so in real-world applications such as robotic control. To address this, I have investigated three factors prohibiting further deployment of DRL: sample efficiency, transfer learning, and interpretability. To decrease the amount of data needed to train DRL systems, I have explored various storage strategies and exploration policies for episodic control (EC) algorithms, resulting in the application of online clustering to improve the memory efficiency of EC algorithms, and the maximum entropy mellowmax policy for improving the sample efficiency and final performance of the same EC algorithms. To improve performance during transfer learning, I have shown that a multi-headed neural network architecture trained using hierarchical reinforcement learning can retain the benefits of positive transfer between tasks while mitigating the interference effects of negative transfer. I additionally investigated the use of multi-headed architectures to reduce catastrophic forgetting under the continual learning setting. While the use of multiple heads worked well within a simple environment, it was of limited use within a more complex domain, indicating that this strategy does not scale well. Finally, I applied a wide range of quantitative and qualitative techniques to better interpret trained DRL agents. In particular, I compared the effects of training DRL agents both with and without visual domain randomisation (DR), a popular technique to achieve simulation-to-real transfer, providing a series of tests that can be applied before real-world deployment. One of the major findings is that DR produces more entangled representations within trained DRL agents, indicating quantitatively that they are invariant to nuisance factors associated with the DR process. Additionally, while my environment allowed agents trained without DR to succeed without requiring complex recurrent processing, all agents trained with DR appear to integrate information over time, as evidenced through ablations on the recurrent state.Open Acces

Spiral - Imperial College Digital Repository