Adversarial recovery of agent rewards from latent spaces of the limit order book
Inverse reinforcement learning has proved its ability to explain state-action
trajectories of expert agents by recovering their underlying reward functions
in increasingly challenging environments. Recent advances in adversarial
learning have allowed extending inverse RL to applications with non-stationary
environment dynamics unknown to the agents, arbitrary structures of reward
functions and improved handling of the ambiguities inherent to the ill-posed
nature of inverse RL. This is particularly relevant in real-time applications
in stochastic environments involving risk, such as volatile financial markets.
Moreover, recent work on simulation of complex environments enables learning
algorithms to engage with real market data through simulations of its latent
space representations, avoiding a costly exploration of the original
environment. In this paper, we explore whether adversarial inverse RL
algorithms can be adapted and trained within such latent space simulations from
real market data, while maintaining their ability to recover agent rewards
robust to variations in the underlying dynamics, and transfer them to new
regimes of the original environment.
Comment: Published as a workshop paper at the NeurIPS 2019 Workshop on Robust AI
in Financial Services, 33rd Conference on Neural Information Processing
Systems (NeurIPS 2019), Vancouver, Canada.
CnGAN: Generative Adversarial Networks for Cross-network user preference generation for non-overlapped users
A major drawback of cross-network recommender solutions is that they can only
be applied to users that are overlapped across networks. Thus, the
non-overlapped users, who form the majority of users, are ignored. As a
solution, we propose CnGAN, a novel multi-task-learning-based
encoder-GAN-recommender architecture. The proposed model synthetically
generates source network user preferences for non-overlapped users by learning
the mapping from target to source network preference manifolds. The resultant
user preferences are used in a Siamese-network-based neural recommender
architecture. Furthermore, we propose a novel user-based pairwise loss function
for recommendations using implicit interactions to better guide the generation
process in the multi-task learning environment. We illustrate our solution by
generating user preferences on the Twitter source network for recommendations
on the YouTube target network. Extensive experiments show that the generated
preferences can be used to improve recommendations for non-overlapped users.
The resultant recommendations achieve superior performance compared to the
state-of-the-art cross-network recommender solutions in terms of accuracy,
novelty and diversity.