11,632 research outputs found

    A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning

    Full text link
    We present a tutorial on Bayesian optimization, a method of finding the maximum of expensive cost functions. Bayesian optimization employs the Bayesian technique of setting a prior over the objective function and combining it with evidence to get a posterior function. This permits a utility-based selection of the next observation to make on the objective function, which must take into account both exploration (sampling from areas of high uncertainty) and exploitation (sampling areas likely to offer improvement over the current best observation). We also present two detailed extensions of Bayesian optimization, with experiments (active user modelling with preferences, and hierarchical reinforcement learning) and a discussion of the pros and cons of Bayesian optimization based on our experiences.
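    The selection step the abstract describes can be made concrete with a short sketch. The following is a minimal illustration, not the tutorial's code: it fits a Gaussian-process posterior to the observations gathered so far and scores candidate points with Expected Improvement, which trades exploitation (high posterior mean) against exploration (high posterior uncertainty). The helper names (`expected_improvement`, `propose_next`), the toy objective, and the candidate grid are assumptions made for the example.

```python
# Minimal Bayesian-optimization step: fit a GP posterior to the data seen so
# far, then pick the next query by maximizing Expected Improvement (EI).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(candidates, gp, best_y, xi=0.01):
    """EI acquisition for maximization; candidates has shape (n, d)."""
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)            # avoid division by zero
    z = (mu - best_y - xi) / sigma
    return (mu - best_y - xi) * norm.cdf(z) + sigma * norm.pdf(z)

def propose_next(X_obs, y_obs, candidates):
    """Combine prior and evidence into a posterior, then select the next query."""
    gp = GaussianProcessRegressor(normalize_y=True).fit(X_obs, y_obs)
    ei = expected_improvement(candidates, gp, best_y=y_obs.max())
    return candidates[np.argmax(ei)]

if __name__ == "__main__":
    f = lambda x: -(x - 0.3) ** 2                      # stand-in "expensive" objective
    X = np.array([[0.0], [1.0]]); y = f(X).ravel()     # two initial observations
    grid = np.linspace(0.0, 1.0, 101).reshape(-1, 1)   # candidate points
    print(propose_next(X, y, grid))
```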

    Argumentation accelerated reinforcement learning

    Get PDF
    Reinforcement Learning (RL) is a popular statistical Artificial Intelligence (AI) technique for building autonomous agents, but it suffers from the curse of dimensionality: the computational requirement for obtaining optimal policies grows exponentially with the size of the state space. Integrating heuristics into RL has proven to be an effective way to combat this curse, but deriving high-quality heuristics from people's (typically conflicting) domain knowledge is challenging and has received little research attention. Argumentation theory is a logic-based AI technique well known for its conflict-resolution capability and intuitive appeal. In this thesis, we investigate the integration of argumentation frameworks into RL algorithms so as to improve their convergence speed. In particular, we propose a variant of the Value-based Argumentation Framework (VAF) to represent domain knowledge and to derive heuristics from this knowledge. We prove that the heuristics derived from this framework can effectively instruct individual learning agents as well as multiple cooperative learning agents. In addition, we propose the Argumentation Accelerated RL (AARL) framework to integrate these heuristics into different RL algorithms via Potential Based Reward Shaping (PBRS) techniques: we use classical PBRS techniques for AARL based on flat RL (e.g. SARSA(λ)), and propose a novel PBRS technique for MAXQ-0, a hierarchical RL (HRL) algorithm, so as to implement HRL-based AARL. We empirically test two AARL implementations, SARSA(λ)-based AARL and MAXQ-based AARL, in multiple application domains, including single-agent and multi-agent learning problems. The empirical results indicate that AARL can improve the convergence speed of RL and can be used easily by people who have little background in argumentation or RL.
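    To make the shaping step concrete, here is a minimal sketch of potential-based reward shaping inside tabular SARSA(λ); it is not the thesis's AARL code. A heuristic, such as one derived from an argumentation framework, is abstracted into a potential function phi(s), and the shaping term gamma*phi(s') - phi(s) is added to the environment reward. The environment interface (n_states, n_actions, reset, step) and phi are hypothetical placeholders.

```python
# Tabular SARSA(lambda) with potential-based reward shaping (PBRS):
# shaped reward = r + gamma * phi(s') - phi(s), which preserves optimal policies.
import numpy as np

def epsilon_greedy(Q, s, epsilon):
    if np.random.rand() < epsilon:
        return np.random.randint(Q.shape[1])
    return int(np.argmax(Q[s]))

def sarsa_lambda_pbrs(env, phi, episodes=500, alpha=0.1, gamma=0.99,
                      lam=0.9, epsilon=0.1):
    Q = np.zeros((env.n_states, env.n_actions))
    for _ in range(episodes):
        E = np.zeros_like(Q)                      # eligibility traces
        s = env.reset()
        a = epsilon_greedy(Q, s, epsilon)
        done = False
        while not done:
            s_next, r, done = env.step(a)
            a_next = epsilon_greedy(Q, s_next, epsilon)
            shaped_r = r + gamma * phi(s_next) - phi(s)   # PBRS term from the heuristic
            td_error = shaped_r + gamma * Q[s_next, a_next] * (not done) - Q[s, a]
            E[s, a] += 1.0
            Q += alpha * td_error * E
            E *= gamma * lam
            s, a = s_next, a_next
    return Q
```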

    Neurobiological Foundations Of Stability And Flexibility

    Get PDF
    In order to adapt to changing and uncertain environments, humans and other organisms must balance stability and flexibility in learning and behavior. Stability is necessary to learn environmental regularities and support ongoing behavior, while flexibility is necessary when beliefs need to be revised or behavioral strategies need to be changed. Adjusting the balance between stability and flexibility must often be based on endogenously generated decisions that are informed by the environment but not dictated by it explicitly. This dissertation examines the neurobiological bases of such endogenous flexibility, focusing in particular on the role of prefrontally mediated cognitive control processes and the neuromodulatory actions of the dopaminergic and noradrenergic systems. In the first study (Chapter 2), we examined the role of frontostriatal circuits in instructed reinforcement learning. In this paradigm, inaccurate instructions are given prior to trial-and-error learning, leading to bias in learning and choice; abandoning the instructions thus necessitates flexibility. We used transcranial direct current stimulation over dorsolateral prefrontal cortex to try to establish a causal role for this area in the bias. We also assayed two genetic variants, the COMT Val158Met polymorphism and the DAT1/SLC6A3 variable number tandem repeat, which affect prefrontal and striatal dopamine, respectively. The results support a role for prefrontal cortex in biasing learning and provide further evidence that individual differences in the balance between prefrontal and striatal dopamine may be particularly important in the tradeoff between stability and flexibility. In the second study (Chapter 3), we assessed the neurobiological mechanisms of stability and flexibility in the context of exploration, using fMRI to examine dynamic changes in functional brain networks associated with exploratory choices. We then related those changes to changes in norepinephrine activity, as measured indirectly via pupil diameter. We found tentative support for the hypothesis that increased norepinephrine activity around exploration facilitates the reorganization of functional brain networks, potentially providing a substrate for flexible exploratory states. Together, this work provides further support for the framework that stability and flexibility entail both costs and benefits, and that optimizing the balance between the two involves interactions of learning and cognitive control systems under the influence of catecholamines.

    ASPiRe: Adaptive Skill Priors for Reinforcement Learning

    Full text link
    We introduce ASPiRe (Adaptive Skill Prior for RL), a new approach that leverages prior experience to accelerate reinforcement learning. Unlike existing methods that learn a single skill prior from a large and diverse dataset, our framework learns a library of distinct skill priors (i.e., behavior priors) from a collection of specialized datasets, and learns how to combine them to solve a new task. This formulation allows the algorithm to acquire a set of specialized skill priors that are more reusable for downstream tasks; however, it also raises the additional challenge of how to effectively combine these unstructured sets of skill priors to form a new prior for new tasks. Specifically, it requires the agent not only to identify which skill prior(s) to use but also to decide how to combine them (either sequentially or concurrently) to form a new prior. To achieve this goal, ASPiRe includes an Adaptive Weight Module (AWM) that learns to infer an adaptive weight assignment between different skill priors and uses these weights to guide policy learning for downstream tasks via weighted Kullback-Leibler divergences. Our experiments demonstrate that ASPiRe can significantly accelerate the learning of new downstream tasks in the presence of multiple priors and that it improves over competitive baselines.
    Comment: 36th Conference on Neural Information Processing Systems (NeurIPS 2022).
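    As a rough illustration of the weighted-KL idea, the sketch below combines diagonal-Gaussian skill priors by weighting the KL divergence between the policy and each prior. This is an assumption about the form of the regularizer, not the ASPiRe implementation; the softmax weights simply stand in for the output of the Adaptive Weight Module, and all names and distributions are illustrative.

```python
# Weighted-KL regularizer over a library of skill priors (diagonal Gaussians).
import numpy as np

def kl_diag_gauss(mu0, std0, mu1, std1):
    """KL( N(mu0, std0^2) || N(mu1, std1^2) ), summed over action dimensions."""
    return np.sum(np.log(std1 / std0)
                  + (std0 ** 2 + (mu0 - mu1) ** 2) / (2.0 * std1 ** 2)
                  - 0.5)

def weighted_prior_kl(policy, priors, weight_logits):
    """Weighted sum of KLs; weight_logits stand in for the AWM output."""
    weights = np.exp(weight_logits) / np.exp(weight_logits).sum()   # softmax
    kls = [kl_diag_gauss(policy["mu"], policy["std"], p["mu"], p["std"])
           for p in priors]
    return float(np.dot(weights, kls))

# toy usage with two skill priors over a 2-D action space
policy = {"mu": np.array([0.1, 0.0]), "std": np.array([0.5, 0.5])}
priors = [{"mu": np.zeros(2), "std": np.ones(2)},
          {"mu": np.array([1.0, 1.0]), "std": np.ones(2) * 0.3}]
print(weighted_prior_kl(policy, priors, weight_logits=np.array([2.0, 0.0])))
```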