Reinforcement learning algorithms that assimilate and accommodate skills with multiple tasks
Children are capable of acquiring a large repertoire of motor skills and of efficiently adapting them to novel conditions. In previous work we proposed a hierarchical modular reinforcement learning model (RANK) that can learn multiple motor skills in continuous action and state spaces. The model is based on a mixture-of-experts architecture adapted to work with reinforcement learning. In particular, the model uses a high-level gating network to assign responsibilities for acting and for learning to a set of low-level expert networks. The model was also designed to exploit the Piagetian mechanisms of assimilation and accommodation to support the learning of multiple tasks. This paper proposes a new model (TERL - Transfer Expert Reinforcement Learning) that substantially improves on RANK. The key difference with respect to the previous model is the decoupling of the mechanisms that generate the experts' responsibility signals for learning and for control, which makes it possible to satisfy different constraints for functioning and for learning. We test both the TERL and the RANK models with a two-DOF dynamic arm engaged in solving multiple reaching tasks, and compare them with a simple, flat reinforcement learning model. The results show that both models can exploit assimilation and accommodation processes to transfer knowledge between similar tasks while avoiding catastrophic interference. Furthermore, the TERL model significantly outperforms the RANK model thanks to its faster and more stable specialization of experts.
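The decoupling described in the abstract can be made concrete with a minimal sketch: a gating network produces two separate responsibility distributions over experts, one used to mix the experts' actions (control) and one used to weight their parameter updates (learning). The linear experts, the gradient-like update rule, and all identifiers below are illustrative assumptions, not the authors' implementation.

```python
# Sketch of decoupled responsibility signals in a mixture-of-experts
# RL setup: one gate decides who acts, a second gate decides who learns.
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, N_EXPERTS = 4, 2, 3

# Linear experts: action = W @ state (hypothetical simplification).
experts = [rng.normal(scale=0.1, size=(ACTION_DIM, STATE_DIM))
           for _ in range(N_EXPERTS)]

# Separate gating parameters for control and for learning.
gate_control = rng.normal(scale=0.1, size=(N_EXPERTS, STATE_DIM))
gate_learning = rng.normal(scale=0.1, size=(N_EXPERTS, STATE_DIM))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def responsibilities(state):
    """Two responsibility signals over the same experts for one state."""
    r_act = softmax(gate_control @ state)     # who acts
    r_learn = softmax(gate_learning @ state)  # who learns
    return r_act, r_learn

def act(state):
    """Control: mix expert actions with the acting responsibilities."""
    r_act, _ = responsibilities(state)
    actions = np.stack([W @ state for W in experts])
    return (r_act[:, None] * actions).sum(axis=0)

def update(state, td_error, lr=1e-2):
    """Learning: each expert's update is scaled by its own learning
    responsibility, so specialization need not track the control mix."""
    _, r_learn = responsibilities(state)
    for k, W in enumerate(experts):
        # Crude gradient-like step driven by a scalar TD error (sketch).
        W += lr * r_learn[k] * td_error * np.outer(np.ones(ACTION_DIM), state)

state = rng.normal(size=STATE_DIM)
print("action:", act(state))
update(state, td_error=0.5)
```

Because the two softmax gates have independent parameters, an expert can receive most of the learning signal for a task before it dominates control, which is one way the faster, more stable specialization claimed for TERL could arise.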
Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective
Off-policy Learning to Rank (LTR) aims to optimize a ranker from data collected by a deployed logging policy. However, existing off-policy learning-to-rank methods often make strong assumptions about how users generate the click data, i.e., the click model, and hence need to tailor their methods specifically to different click models. In this paper, we unify the ranking process under general stochastic click models as a Markov Decision Process (MDP), so that the optimal ranking can be learned directly with offline reinforcement learning (RL). Building upon this, we leverage offline RL techniques for off-policy LTR and propose the Click Model-Agnostic Unified Off-policy Learning to Rank (CUOLR) method, which can be easily applied to a wide range of click models. Through a dedicated formulation of the MDP, we show that offline RL algorithms can adapt to various click models without complex debiasing techniques or prior knowledge of the model. Results on various large-scale datasets demonstrate that CUOLR consistently outperforms state-of-the-art off-policy learning-to-rank algorithms while maintaining consistency and robustness under different click models.
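To make the MDP framing concrete, here is a minimal sketch in which a state is the list prefix ranked so far, an action places one of the remaining documents at the next position, and the logged click supplies the reward. The tabular Q-learning update and every identifier below are hypothetical simplifications for illustration, not the paper's CUOLR method.

```python
# Ranking as an MDP trained offline from logged click episodes:
# state = prefix of the list built so far, action = next document placed.
import random
from collections import defaultdict

GAMMA, LR = 0.9, 0.1
Q = defaultdict(float)  # Q[(state, action)] -> estimated value

def q_update(logged_episode):
    """One offline Q-learning pass over a logged ranking episode,
    given as (prefix, doc_placed, click, remaining_docs) steps."""
    for prefix, doc, click, remaining in logged_episode:
        state = tuple(prefix)
        next_state = tuple(prefix + [doc])
        # Bootstrap from the best remaining placement (0 at list end).
        best_next = max((Q[(next_state, d)] for d in remaining), default=0.0)
        target = click + GAMMA * best_next
        Q[(state, doc)] += LR * (target - Q[(state, doc)])

def rank(query_docs):
    """Greedy policy: repeatedly place the highest-value document."""
    prefix, remaining = [], list(query_docs)
    while remaining:
        doc = max(remaining, key=lambda d: Q[(tuple(prefix), d)])
        prefix.append(doc)
        remaining.remove(doc)
    return prefix

# Hypothetical logged session over documents {"a", "b", "c"}:
episode = [([], "b", 1, ["a", "c"]),
           (["b"], "a", 0, ["c"]),
           (["b", "a"], "c", 0, [])]
q_update(episode)
print(rank(["a", "b", "c"]))
```

The point of the formulation is that nothing in the update refers to a particular click model: whatever bias generated the logged clicks is absorbed into the reward signal, so any offline RL algorithm can be substituted for the tabular update shown here.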