A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning

Author: Brochu Eric
Cora Vlad M.
de Freitas Nando
Publication date: 01/01/2009

We present a tutorial on Bayesian optimization, a method of finding the maximum of expensive cost functions. Bayesian optimization employs the Bayesian technique of setting a prior over the objective function and combining it with evidence to get a posterior function. This permits a utility-based selection of the next observation to make on the objective function, which must take into account both exploration (sampling from areas of high uncertainty) and exploitation (sampling areas likely to offer improvement over the current best observation). We also present two detailed extensions of Bayesian optimization, with experiments (active user modelling with preferences, and hierarchical reinforcement learning), and a discussion of the pros and cons of Bayesian optimization based on our experiences.

arXiv.org e-Print Archive

CiteSeerX

Oxford University Research Archive
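
The utility-based selection described in the abstract above can be illustrated with expected improvement, one common choice of utility for maximization. This is a minimal sketch, not the authors' implementation: it assumes the posterior at each candidate point is Gaussian with a known mean and standard deviation, and the names `expected_improvement` and `pick_next` are hypothetical.

```python
import math


def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """Expected improvement over `best` for a Gaussian posterior N(mu, sigma^2).

    Assumption: we are maximizing, and `best` is the highest value observed so far.
    """
    if sigma <= 0.0:
        # No uncertainty left: the improvement is deterministic.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    # First term rewards a high mean (exploitation),
    # second term rewards high uncertainty (exploration).
    return (mu - best) * normal_cdf(z) + sigma * normal_pdf(z)


def pick_next(candidates, best):
    """Return the label of the candidate (label, mu, sigma) with the highest EI."""
    return max(candidates, key=lambda c: expected_improvement(c[1], c[2], best))[0]


# Hypothetical posterior summaries at three candidate points.
candidates = [("a", 0.5, 0.0), ("b", 0.4, 1.0), ("c", 0.6, 0.01)]
next_point = pick_next(candidates, best=0.5)
```

In this toy example the uncertain point "b" is selected even though "c" has a higher posterior mean, which shows how the utility trades exploitation against exploration.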

Discriminative conditional restricted Boltzmann machine for discrete choice and latent variable modelling

Author: Bilodeau Guillaume-Alexandre
Farooq Bilal
Wong Melvin
Publication venue: 'Elsevier BV'
Publication date: 01/06/2017

Conventional methods of estimating latent behaviour generally use attitudinal questions, which are subjective, and these survey questions may not always be available. We hypothesize that an alternative approach can be used for latent variable estimation through undirected graphical models, for instance non-parametric artificial neural networks. In this study, we explore the use of generative non-parametric modelling methods to estimate latent variables from prior choice distribution without the conventional use of measurement indicators. A restricted Boltzmann machine is used to represent latent behaviour factors by analyzing the relationship information between the observed choices and explanatory variables. The algorithm is adapted for latent behaviour analysis in a discrete choice scenario, and we use a graphical approach to evaluate and understand the semantic meaning of the estimated parameter vector values. We illustrate our methodology on a financial instrument choice dataset and perform statistical analysis on parameter sensitivity and stability. Our findings show, through non-parametric statistical tests, that machine learning methods can extract useful information on the behaviour of latent constructs, and that these constructs have a strong and significant influence on the choice process. Furthermore, our modelling framework shows robustness to input variability through sampling and validation.

arXiv.org e-Print Archive

Crossref

PolyPublie

Role of dorsomedial striatum neuronal ensembles in incubation of methamphetamine craving after voluntary abstinence

Author: Bossert Jennifer M.
Caprioli Daniele
Hope Bruce T.
Shaham Yavin
Venniro Marco
Warren Brandon L.
Zhang Michelle
Publication venue: 'Society for Neuroscience'
Publication date: 01/01/2017

We recently developed a rat model of incubation of methamphetamine craving after choice-based voluntary abstinence. Here, we studied the role of dorsolateral striatum (DLS) and dorsomedial striatum (DMS) in this incubation. We trained rats to self-administer palatable food pellets (6 d, 6 h/d) and methamphetamine (12 d, 6 h/d). We then assessed relapse to methamphetamine seeking under extinction conditions after 1 and 21 abstinence days. Between tests, the rats underwent voluntary abstinence (using a discrete choice procedure between methamphetamine and food; 20 trials/d) for 19 d. We used in situ hybridization to measure the colabeling of the activity marker Fos with Drd1 and Drd2 in DMS and DLS after the tests. Based on the in situ hybridization colabeling results, we tested the causal role of DMS D1 and D2 family receptors, and DMS neuronal ensembles in "incubated" methamphetamine seeking, using selective dopamine receptor antagonists (SCH39166 or raclopride) and the Daun02 chemogenetic inactivation procedure, respectively. Methamphetamine seeking was higher after 21 d of voluntary abstinence than after 1 d (incubation of methamphetamine craving). The incubated response was associated with increased Fos expression in DMS but not in DLS; Fos was colabeled with both Drd1 and Drd2. DMS injections of SCH39166 or raclopride selectively decreased methamphetamine seeking after 21 abstinence days. In Fos-lacZ transgenic rats, selective inactivation of relapse test-activated Fos neurons in DMS on abstinence day 18 decreased incubated methamphetamine seeking on day 21. Results demonstrate a role of DMS dopamine D1 and D2 receptors in the incubation of methamphetamine craving after voluntary abstinence and that DMS neuronal ensembles mediate this incubation.
SIGNIFICANCE STATEMENT: In human addicts, abstinence is often self-imposed and relapse can be triggered by exposure to drug-associated cues that induce drug craving. We recently developed a rat model of incubation of methamphetamine craving after choice-based voluntary abstinence. Here, we used classical pharmacology, in situ hybridization, immunohistochemistry, and the Daun02 inactivation procedure to demonstrate a critical role of dorsomedial striatum neuronal ensembles in this new form of incubation of drug craving.

Crossref

PubMed Central

Archivio della ricerca- Università di Roma La Sapienza

Active Inverse Reward Design

Author: Gleave Adam
Hadfield-Menell Dylan
Mindermann Sören
Shah Rohin
Publication date: 06/11/2019

Designers of AI agents often iterate on the reward function in a trial-and-error process until they get the desired behavior, but this only guarantees good behavior in the training environment. We propose structuring this process as a series of queries asking the user to compare different reward functions. Thus we can actively select queries for maximum informativeness about the true reward. In contrast to approaches asking the designer for optimal behavior, this allows us to gather additional information by eliciting preferences between suboptimal behaviors. After each query, we need to update the posterior over the true reward function from observing the proxy reward function chosen by the designer. The recently proposed Inverse Reward Design (IRD) enables this. Our approach substantially outperforms IRD in test environments. In particular, it can query the designer about interpretable, linear reward functions and still infer non-linear ones.

arXiv.org e-Print Archive
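
The posterior update mentioned in the Active Inverse Reward Design abstract above can be sketched for a small discrete hypothesis set. This is an illustrative sketch under stated assumptions, not the paper's implementation: it assumes linear true rewards, a designer who picks among the queried proxy rewards with Boltzmann (softmax) rationality, and precomputed feature counts of the behavior each proxy induces. The function name `update_posterior` and the parameter `beta` are hypothetical.

```python
import math


def update_posterior(prior, query_features, chosen, beta=1.0):
    """Bayesian update over candidate true reward weights after one query.

    prior          -- list of (weights, probability) pairs
    query_features -- one feature-count vector per proxy reward in the query,
                      describing the behavior that proxy would induce (assumed given)
    chosen         -- index of the proxy reward the designer selected
    beta           -- assumed rationality of the designer (higher = less noisy)
    """
    unnormalized = []
    for weights, prob in prior:
        # True-reward value of each proxy's induced behavior under this hypothesis.
        scores = [beta * sum(w * f for w, f in zip(weights, feats))
                  for feats in query_features]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        likelihood = exps[chosen] / sum(exps)
        unnormalized.append((weights, prob * likelihood))
    total = sum(p for _, p in unnormalized)
    return [(w, p / total) for w, p in unnormalized]


# Two hypothetical candidate true rewards and a two-option query.
prior = [((1.0, 0.0), 0.5), ((0.0, 1.0), 0.5)]
query_features = [(1.0, 0.0), (0.0, 1.0)]
posterior = update_posterior(prior, query_features, chosen=0)
```

Here the designer's choice of the first proxy shifts probability mass toward the hypothesis that values the first feature, without ruling out the other.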