
    Receding Horizon Curiosity

    Sample-efficient exploration is crucial not only for discovering rewarding experiences but also for adapting to environment changes in a task-agnostic fashion. A principled treatment of the problem of optimal input synthesis for system identification is provided within the framework of sequential Bayesian experimental design. In this paper, we present an effective trajectory-optimization-based approximate solution to this otherwise intractable problem, which models optimal exploration in an unknown Markov decision process (MDP). By interleaving episodic exploration with Bayesian nonlinear system identification, our algorithm takes advantage of the inductive bias to explore in a directed manner, without assuming prior knowledge of the MDP. Empirical evaluations indicate a clear advantage of the proposed algorithm, in terms of both the rate of convergence and the final model fidelity, over intrinsic-motivation-based algorithms that use exploration bonuses such as prediction error and information gain. Moreover, our method maintains a computational advantage over the recent model-based active exploration (MAX) algorithm by focusing on the information gain along trajectories instead of seeking a global exploration policy. A reference implementation of our algorithm and the conducted experiments is publicly available.
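    The idea of scoring candidate trajectories by the information they would yield about an uncertain dynamics model can be sketched as follows. This is not the authors' reference implementation: it assumes a scalar system, a hypothetical feature map `features(x, u)`, a Bayesian linear regression model in place of the paper's nonlinear system identification, and random shooting as a stand-in for trajectory optimization. For a Gaussian weight posterior with covariance Sigma, the information gain from noisy observations at feature rows Phi is 0.5 * logdet(I + Phi Sigma Phi^T / noise_var).

```python
import numpy as np

def features(x, u):
    # Hypothetical feature map phi(x, u) for a scalar system (an assumption).
    return np.array([x, u, np.sin(x)])

class BayesLinReg:
    """Bayesian linear regression x_next = w^T phi + eps, eps ~ N(0, noise_var)."""

    def __init__(self, dim, prior_var=1.0, noise_var=0.01):
        self.mean = np.zeros(dim)
        self.cov = prior_var * np.eye(dim)
        self.noise_var = noise_var

    def update(self, phi, y):
        # Rank-one conjugate Gaussian update of the weight posterior.
        s = self.cov @ phi
        gain = s / (self.noise_var + phi @ s)
        self.mean = self.mean + gain * (y - phi @ self.mean)
        self.cov = self.cov - np.outer(gain, s)

    def info_gain(self, Phi):
        # Mutual information between weights and noisy observations at rows of Phi.
        K = Phi @ self.cov @ Phi.T / self.noise_var
        _, logdet = np.linalg.slogdet(np.eye(len(Phi)) + K)
        return 0.5 * logdet

def plan(model, x0, horizon=10, n_candidates=256, u_max=1.0, rng=None):
    # Random shooting: sample action sequences, roll them out with the
    # posterior-mean model, keep the one with the largest information gain.
    rng = np.random.default_rng() if rng is None else rng
    best_us, best_gain = None, -np.inf
    for _ in range(n_candidates):
        us = rng.uniform(-u_max, u_max, size=horizon)
        x, rows = x0, []
        for u in us:
            phi = features(x, u)
            rows.append(phi)
            x = phi @ model.mean
        gain = model.info_gain(np.array(rows))
        if gain > best_gain:
            best_us, best_gain = us, gain
    return best_us, best_gain

# Receding-horizon loop on hypothetical "unknown" dynamics.
rng = np.random.default_rng(0)
w_true = np.array([0.5, 0.5, 0.3])
model = BayesLinReg(dim=3)
x = 0.0
for t in range(20):
    us, _ = plan(model, x, rng=rng)
    phi = features(x, us[0])  # apply only the first planned action
    x = phi @ w_true + rng.normal(scale=0.1)
    model.update(phi, x)
```

Replanning after every step and executing only the first action is what makes the loop "receding horizon"; the model is refit between plans, so later trajectories target whatever remains uncertain.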

    Towards a unifying theory of generalization

    How do humans generalize from observed to unobserved data? How does generalization support inference, prediction, and decision making? I propose that a large part of human generalization can be explained by a powerful mechanism of function learning. I put forward and assess Gaussian Process regression as a model of human function learning that can unify several psychological theories of generalization. Across 14 experiments and using extensive computational modeling, I show that this model generates testable predictions about human preferences over different levels of complexity, provides a window into compositional inductive biases, and, combined with an optimistic yet efficient sampling strategy, guides human decision making through complex spaces. Chapters 1 and 2 propose that, from a psychological and mathematical perspective, function learning and generalization are close kin. Chapter 3 derives and tests theoretical predictions of participants' preferences over functions of differing complexity. Chapter 4 develops a compositional theory of generalization and extensively probes this theory using 8 experimental paradigms. In the second half of the thesis, I investigate how function learning guides decision making in complex decision-making tasks. In particular, Chapter 5 looks at how people search for rewards in various grid worlds where a spatial correlation of rewards provides a context that supports generalization and decision making. Chapter 6 gauges human behavior in contextual multi-armed bandit problems where a function maps features onto expected rewards. In both Chapter 5 and Chapter 6, I find that the vast majority of subjects are best predicted by a Gaussian Process function learning model combined with an upper confidence bound sampling strategy. Chapter 7 formally assesses the adaptiveness of human generalization in complex decision-making tasks using mismatched Bayesian optimization simulations, and finds that the empirically observed phenomenon of undergeneralization might be a feature rather than a bug of human behavior. Finally, in Chapter 8, I summarize the empirical and theoretical lessons learned and lay out a road map for future research on generalization.
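    A Gaussian Process model combined with upper confidence bound (UCB) sampling, as described for the grid-search and bandit tasks, can be sketched as below. This is an illustrative sketch, not the thesis's fitted model: the 1-D grid of 30 options, the squared-exponential kernel, its `length_scale`, the exploration weight `beta`, and the reward function are all assumptions. At each step the GP posterior mean mu and standard deviation sigma are computed over all options, and the option maximizing mu + beta * sigma is sampled next.

```python
import numpy as np

def rbf(a, b, length_scale=1.0):
    # Squared-exponential kernel on 1-D inputs (an assumed kernel choice).
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_grid, length_scale=1.0, noise_var=0.01):
    # Zero-mean GP regression via a Cholesky factorization.
    K = rbf(x_obs, x_obs, length_scale) + noise_var * np.eye(len(x_obs))
    K_s = rbf(x_obs, x_grid, length_scale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mu = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = 1.0 - np.sum(v ** 2, axis=0)  # prior variance k(x, x) = 1 for this kernel
    return mu, np.sqrt(np.maximum(var, 0.0))

def ucb_choice(x_obs, y_obs, x_grid, beta=2.0, **gp_kwargs):
    # Optimistic choice: highest upper confidence bound mu + beta * sigma.
    mu, sigma = gp_posterior(x_obs, y_obs, x_grid, **gp_kwargs)
    return int(np.argmax(mu + beta * sigma))

# Hypothetical spatially correlated rewards peaking at option 20.
grid = np.arange(30, dtype=float)
reward = np.exp(-0.5 * ((grid - 20.0) / 3.0) ** 2)
x_obs, y_obs = [0.0], [reward[0]]
for _ in range(20):
    i = ucb_choice(np.array(x_obs), np.array(y_obs), grid, length_scale=3.0)
    x_obs.append(grid[i])
    y_obs.append(reward[i])
```

Larger `beta` values favor options whose value is still uncertain; `beta = 0` reduces the strategy to greedily choosing the highest predicted mean.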

    Dual Control for Approximate Bayesian Reinforcement Learning

    Control of non-episodic, finite-horizon dynamical systems with uncertain dynamics poses a tough and elementary case of the exploration-exploitation trade-off. Bayesian reinforcement learning, which reasons about the effect of actions and future observations, offers a principled solution but is intractable. We review and then extend an old approximate approach from control theory (where the problem is known as dual control) in the context of modern regression methods, specifically generalized linear regression. Experiments on simulated systems show that this framework offers a useful approximation to the intractable aspects of Bayesian RL, producing structured exploration strategies that differ from standard RL approaches. We provide simple examples of the use of this framework in (approximate) Gaussian process regression and feedforward neural networks for the control of exploration. Comment: 30 pages, 7 figures
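    The tension that dual control addresses shows up already in a scalar system x_next = a x + b u + eps with an unknown input gain b ~ N(b_mean, b_var). The sketch below is a standard textbook illustration of cautious control, not the paper's extended method; the parameter values are hypothetical. Minimizing the expected one-step cost E[x_next^2] = (a x + b_mean u)^2 + b_var u^2 + noise_var gives u* = -b_mean a x / (b_mean^2 + b_var), which shrinks the control when the gain is uncertain. Because small inputs carry little information about b, purely cautious control can learn slowly; adding deliberate probing to counter this is the role of the dual-control approximation.

```python
import numpy as np

def cautious_control(x, a, b_mean, b_var):
    # Minimizes E[(a x + b u)^2] over u when b ~ N(b_mean, b_var).
    return -b_mean * a * x / (b_mean ** 2 + b_var)

def certainty_equivalent_control(x, a, b_mean):
    # Treats the posterior mean as the true gain and ignores its uncertainty.
    return -a * x / b_mean

def update_gain(b_mean, b_var, u, residual, noise_var):
    # Conjugate Gaussian update for residual = b u + eps, eps ~ N(0, noise_var).
    precision = 1.0 / b_var + u ** 2 / noise_var
    new_var = 1.0 / precision
    new_mean = new_var * (b_mean / b_var + u * residual / noise_var)
    return new_mean, new_var

# Hypothetical system with a poorly known input gain.
rng = np.random.default_rng(0)
a, b_true, noise_var = 1.0, 0.5, 0.01
x, b_mean, b_var = 1.0, 0.2, 1.0
for t in range(10):
    u = cautious_control(x, a, b_mean, b_var)
    x_next = a * x + b_true * u + rng.normal(scale=np.sqrt(noise_var))
    b_mean, b_var = update_gain(b_mean, b_var, u, x_next - a * x, noise_var)
    x = x_next
```

With the prior above and x = 1, the certainty-equivalent input is -5 while the cautious input is about -0.19, which illustrates how strongly caution damps the control when b_var is large relative to b_mean^2.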