2 research outputs found
Compositional generalization in multi-armed bandits
To what extent do human reward learning and decision-making rely on the ability to represent and generate richly structured relationships between options? We provide evidence that structure learning and the principle of compositionality play crucial roles in human reinforcement learning. In a new multi-armed bandit paradigm, we found evidence that participants are able to learn representations of different reward structures and combine them to make correct generalizations about options in novel contexts. Moreover, we found substantial evidence that participants transferred knowledge of simpler reward structures to make compositional generalizations about rewards in complex contexts. This allowed participants to accumulate more rewards earlier, and to explore less whenever such knowledge transfer was possible. We also provide a computational model which is able to generalize and compose knowledge for complex reward structures. This model describes participant behaviour in the compositional generalization task better than various other models of decision-making and transfer learning
Divide, Conquer, and Combine: a New Inference Strategy for Probabilistic Programs with Stochastic Support
Universal probabilistic programming systems (PPSs) provide a powerful
framework for specifying rich probabilistic models. They further attempt to
automate the process of drawing inferences from these models, but doing this
successfully is severely hampered by the wide range of non--standard models
they can express. As a result, although one can specify complex models in a
universal PPS, the provided inference engines often fall far short of what is
required. In particular, we show that they produce surprisingly unsatisfactory
performance for models where the support varies between executions, often doing
no better than importance sampling from the prior. To address this, we
introduce a new inference framework: Divide, Conquer, and Combine, which
remains efficient for such models, and show how it can be implemented as an
automated and generic PPS inference engine. We empirically demonstrate
substantial performance improvements over existing approaches on three
examples.Comment: Published at the 37th International Conference on Machine Learning
(ICML 2020