2 research outputs found

Compositional generalization in multi-armed bandits

Authors: Eric S; Saanum T; Speekenbrink M
Publication venue: 43rd Annual Meeting of the Cognitive Science Society
Publication date: 01/07/2021

To what extent do human reward learning and decision-making rely on the ability to represent and generate richly structured relationships between options? We provide evidence that structure learning and the principle of compositionality play crucial roles in human reinforcement learning. In a new multi-armed bandit paradigm, we found evidence that participants are able to learn representations of different reward structures and combine them to make correct generalizations about options in novel contexts. Moreover, we found substantial evidence that participants transferred knowledge of simpler reward structures to make compositional generalizations about rewards in complex contexts. This allowed participants to accumulate more rewards earlier, and to explore less whenever such knowledge transfer was possible. We also provide a computational model which is able to generalize and compose knowledge for complex reward structures. This model describes participant behaviour in the compositional generalization task better than various other models of decision-making and transfer learning.

UCL Discovery

Divide, Conquer, and Combine: a New Inference Strategy for Probabilistic Programs with Stochastic Support

Authors: Rainforth Tom; Teh Yee Whye; Yang Hongseok; Zhou Yuan
Publication venue: 37th International Conference on Machine Learning (ICML 2020)
Publication date: 11/07/2020

Universal probabilistic programming systems (PPSs) provide a powerful framework for specifying rich probabilistic models. They further attempt to automate the process of drawing inferences from these models, but doing this successfully is severely hampered by the wide range of non-standard models they can express. As a result, although one can specify complex models in a universal PPS, the provided inference engines often fall far short of what is required. In particular, we show that they produce surprisingly unsatisfactory performance for models where the support varies between executions, often doing no better than importance sampling from the prior. To address this, we introduce a new inference framework: Divide, Conquer, and Combine, which remains efficient for such models, and show how it can be implemented as an automated and generic PPS inference engine. We empirically demonstrate substantial performance improvements over existing approaches on three examples. Comment: Published at the 37th International Conference on Machine Learning (ICML 2020).

arXiv.org e-Print Archive

Oxford University Research Archive