Concept-modulated model-based offline reinforcement learning for rapid generalization
The robustness of any machine learning solution is fundamentally bound by the
data it was trained on. One way to generalize beyond the original training data
is through human-informed augmentation of the dataset; however, it is
impossible to specify every failure case that can occur during
deployment. To address this limitation, we combine model-based reinforcement
learning and model-interpretability methods to propose a solution that
self-generates simulated scenarios constrained by environmental concepts and
dynamics learned in an unsupervised manner. In particular, an internal model of
the agent's environment is conditioned on low-dimensional concept
representations of the input space that are sensitive to the agent's actions.
We demonstrate this method in a simple point-to-point navigation task within a
standard realistic driving simulator, where it shows dramatic improvements over
model-based and model-free approaches in one-shot generalization to different
instances of specified failure cases, as well as in zero-shot generalization to
similar variations.
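The abstract describes an internal environment model conditioned on low-dimensional concept representations and used to self-generate simulated scenarios. The toy sketch below illustrates only that general idea and is not the authors' method. It assumes a fixed linear projection as a stand-in for the unsupervised concept encoder, a hand-written rule in place of the learned dynamics model, and Gaussian noise on the concept vector as the way variations are generated. All function names are hypothetical.

```python
import random
from typing import List

Vector = List[float]


def encode_concepts(obs: Vector, projection: List[Vector]) -> Vector:
    """Hypothetical concept encoder: a fixed linear projection of the
    observation onto a few concept coordinates. The paper learns these
    representations without supervision; here the projection is given."""
    return [sum(w * x for w, x in zip(row, obs)) for row in projection]


def dynamics(state: Vector, action: float, concepts: Vector) -> Vector:
    """Hypothetical internal model: the next state depends on the current
    state, the action, and the concept vector conditioning the model.
    A real model would be a learned network; this rule is a toy stand-in."""
    drift = sum(concepts) / len(concepts)
    return [s + action + 0.1 * drift for s in state]


def imagine_rollout(state: Vector, actions: List[float], concepts: Vector) -> List[Vector]:
    """Roll the internal model forward without touching the real environment."""
    trajectory = [state]
    for action in actions:
        state = dynamics(state, action, concepts)
        trajectory.append(state)
    return trajectory


def self_generate_scenarios(state: Vector, actions: List[float], concepts: Vector,
                            n: int, scale: float, seed: int = 0) -> List[List[Vector]]:
    """Perturb the concept vector with Gaussian noise (an assumed scheme) to
    synthesize n variations of one scenario under the same internal model."""
    rng = random.Random(seed)
    scenarios = []
    for _ in range(n):
        perturbed = [c + rng.gauss(0.0, scale) for c in concepts]
        scenarios.append(imagine_rollout(state, actions, perturbed))
    return scenarios


# Usage: a 3-dimensional observation compressed to 2 concepts, then 4 imagined variations.
obs = [1.0, 2.0, 3.0]
projection = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
concepts = encode_concepts(obs, projection)
scenarios = self_generate_scenarios([0.0], [1.0, 1.0], concepts, n=4, scale=0.5)
```

Because every variation is produced by the same conditioned model, the generated scenarios stay tied to the learned dynamics rather than to hand-specified failure cases. That constraint is the property the abstract emphasizes.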
Adversarial recovery of agent rewards from latent spaces of the limit order book
Inverse reinforcement learning has proved its ability to explain state-action
trajectories of expert agents by recovering their underlying reward functions
in increasingly challenging environments. Recent advances in adversarial
learning have made it possible to extend inverse RL to applications with
non-stationary environment dynamics unknown to the agents and with arbitrary
reward-function structures, and have improved the handling of the ambiguities
inherent to the ill-posed nature of inverse RL. This is particularly relevant
to real-time applications in stochastic environments involving risk, such as
volatile financial markets. Moreover, recent work on simulating complex
environments enables learning algorithms to engage with real market data
through simulations of its latent space representations, avoiding costly
exploration of the original
environment. In this paper, we explore whether adversarial inverse RL
algorithms can be adapted and trained within such latent space simulations from
real market data, while maintaining their ability to recover agent rewards that
are robust to variations in the underlying dynamics, and to transfer them to new
regimes of the original environment.
Comment: Published as a workshop paper at the NeurIPS 2019 Workshop on Robust AI
in Financial Services. 33rd Conference on Neural Information Processing
Systems (NeurIPS 2019), Vancouver, Canada.
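The abstract does not name the adversarial inverse RL variant it adapts. The sketch below assumes AIRL (Fu, Luo and Levine, 2018) and shows only that method's generic discriminator structure. It is not the authors' latent-space implementation. In AIRL the discriminator logit is f(s, a, s') = g(s) + gamma*h(s') - h(s), where g approximates the reward and h is a shaping potential. AIRL keeps g state-only when it aims for rewards that remain valid under changed dynamics. Here g, h and the policy log-probability are plain scalars standing in for learned networks, and the function names are illustrative.

```python
import math


def airl_f(g_s: float, h_s: float, h_next: float, gamma: float) -> float:
    """Discriminator logit term f(s, a, s') = g(s) + gamma * h(s') - h(s).
    g_s is the state-only reward estimate; h_s and h_next are the shaping
    potential at s and s'. All are scalars here rather than networks."""
    return g_s + gamma * h_next - h_s


def airl_discriminator(f: float, log_pi: float) -> float:
    """D(s, a, s') = exp(f) / (exp(f) + pi(a|s)), written equivalently
    as sigmoid(f - log pi(a|s))."""
    return 1.0 / (1.0 + math.exp(-(f - log_pi)))


def airl_policy_reward(f: float, log_pi: float) -> float:
    """Reward passed to the policy learner: log D - log(1 - D),
    which simplifies algebraically to f - log pi(a|s)."""
    return f - log_pi


# Usage with illustrative numbers (not data from the paper).
f = airl_f(g_s=1.0, h_s=0.5, h_next=0.5, gamma=0.9)
d = airl_discriminator(f, log_pi=math.log(0.25))
r = airl_policy_reward(f, log_pi=math.log(0.25))
```

After training, the recovered reward is read off g. The potential h absorbs shaping terms that depend on the dynamics.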