191 research outputs found

Concept-modulated model-based offline reinforcement learning for rapid generalization

Authors: Ketz Nicholas A.; Pilly Praveen K.
Publication date: 07/09/2022

The robustness of any machine learning solution is fundamentally bound by the data it was trained on. One way to generalize beyond the original training is through human-informed augmentation of the original dataset; however, it is impossible to specify all possible failure cases that can occur during deployment. To address this limitation, we combine model-based reinforcement learning and model-interpretability methods to propose a solution that self-generates simulated scenarios constrained by environmental concepts and dynamics learned in an unsupervised manner. In particular, an internal model of the agent's environment is conditioned on low-dimensional concept representations of the input space that are sensitive to the agent's actions. We demonstrate this method within a standard realistic driving simulator in a simple point-to-point navigation task, where we show dramatic improvements in one-shot generalization to different instances of specified failure cases, as well as zero-shot generalization to similar variations, compared to model-based and model-free approaches.

arXiv.org e-Print Archive

Adversarial recovery of agent rewards from latent spaces of the limit order book

Authors: Gal Yarin; Mison Virgile; Roa-Vicens Jacobo; Silva Ricardo; Wang Yuanbo
Publication date: 09/12/2019

Inverse reinforcement learning has proved its ability to explain state-action trajectories of expert agents by recovering their underlying reward functions in increasingly challenging environments. Recent advances in adversarial learning have allowed extending inverse RL to applications with non-stationary environment dynamics unknown to the agents, arbitrary structures of reward functions, and improved handling of the ambiguities inherent to the ill-posed nature of inverse RL. This is particularly relevant in real-time applications in stochastic environments involving risk, like volatile financial markets. Moreover, recent work on simulation of complex environments enables learning algorithms to engage with real market data through simulations of its latent space representations, avoiding a costly exploration of the original environment. In this paper, we explore whether adversarial inverse RL algorithms can be adapted and trained within such latent space simulations from real market data, while maintaining their ability to recover agent rewards robust to variations in the underlying dynamics, and transfer them to new regimes of the original environment.

Comment: Published as a workshop paper at the NeurIPS 2019 Workshop on Robust AI in Financial Services, 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

arXiv.org e-Print Archive

UCL Discovery

Archivo Digital UPM