A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning
We present a tutorial on Bayesian optimization, a method of finding the
maximum of expensive cost functions. Bayesian optimization employs the Bayesian
technique of setting a prior over the objective function and combining it with
evidence to get a posterior function. This permits a utility-based selection of
the next observation to make on the objective function, which must take into
account both exploration (sampling from areas of high uncertainty) and
exploitation (sampling areas likely to offer improvement over the current best
observation). We also present two detailed extensions of Bayesian optimization,
with experiments---active user modelling with preferences, and hierarchical
reinforcement learning---and a discussion of the pros and cons of Bayesian
optimization based on our experiences.
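The loop described in this abstract (prior over the objective, posterior from evidence, utility-based choice of the next observation) can be sketched as follows. This is a minimal illustration, not the tutorial's own code: the zero-mean Gaussian process prior with a squared-exponential kernel, the expected-improvement utility, the 1-D grid search, and all names (`rbf_kernel`, `gp_posterior`, `expected_improvement`, `bayes_opt`, `toy_objective`) and parameter values are assumptions chosen for brevity.

```python
import math
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two 1-D point sets (assumed prior)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_oq = rbf_kernel(x_obs, x_query)
    solved = np.linalg.solve(k_oo, k_oq)           # K^-1 k*
    mean = solved.T @ y_obs
    var = 1.0 - np.sum(k_oq * solved, axis=0)      # prior variance is 1
    return mean, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mean, std, best):
    """Utility that trades off exploitation (high mean) and exploration (high std)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf

def bayes_opt(objective, n_init=3, n_iter=12, seed=0):
    """Maximize `objective` on [0, 1] using a fixed candidate grid."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)
    x_obs = rng.uniform(0.0, 1.0, size=n_init)
    y_obs = np.array([objective(x) for x in x_obs])
    for _ in range(n_iter):
        mean, std = gp_posterior(x_obs, y_obs, grid)
        ei = expected_improvement(mean, std, y_obs.max())
        x_next = grid[np.argmax(ei)]               # next observation to make
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, objective(x_next))
    best = np.argmax(y_obs)
    return x_obs[best], y_obs[best]

def toy_objective(x):
    """Hypothetical stand-in for an expensive function; its maximum is at x = 0.3."""
    return -(x - 0.3) ** 2

best_x, best_y = bayes_opt(toy_objective)
print(best_x, best_y)
```

In a real application the grid search would normally be replaced by a proper optimizer over the utility, and the kernel hyperparameters would be fitted rather than fixed.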
Discriminative conditional restricted Boltzmann machine for discrete choice and latent variable modelling
Conventional methods of estimating latent behaviour generally use attitudinal
questions which are subjective and these survey questions may not always be
available. We hypothesize that an alternative approach can be used for latent
variable estimation through undirected graphical models, for instance
non-parametric artificial neural networks. In this study, we explore the use of
generative non-parametric modelling methods to estimate latent variables from
prior choice distribution without the conventional use of measurement
indicators. A restricted Boltzmann machine is used to represent latent
behaviour factors by analyzing the relationship information between the
observed choices and explanatory variables. The algorithm is adapted for latent
behaviour analysis in a discrete choice scenario and we use a graphical approach
to evaluate and understand the semantic meaning of the estimated parameter vector
values. We illustrate our methodology on a financial instrument choice dataset
and perform statistical analysis on parameter sensitivity and stability. Our
findings show, through non-parametric statistical tests, that machine learning
methods can extract useful information on the behaviour of latent constructs,
and that these constructs have a strong and significant influence on the choice
process. Furthermore, our modelling framework shows robustness to input
variability through sampling and validation.
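As a rough illustration of how a restricted Boltzmann machine maps observed inputs to latent factors, the sketch below computes hidden-unit activation probabilities and the free energy of a plain binary RBM. It is not the discriminative conditional variant from the paper; the function names, the layout of the visible vector, and the random weights are assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_probabilities(visible, weights, hidden_bias):
    """P(h_j = 1 | v) for each hidden (latent) unit of a binary RBM."""
    return sigmoid(visible @ weights + hidden_bias)

def free_energy(visible, weights, visible_bias, hidden_bias):
    """F(v) = -a.v - sum_j log(1 + exp(c_j + v.W_j)); lower means more probable."""
    pre_activation = visible @ weights + hidden_bias
    return -(visible @ visible_bias) - np.sum(np.logaddexp(0.0, pre_activation), axis=-1)

# Hypothetical visible vector: a one-hot observed choice followed by two
# binary explanatory variables (this layout is an assumption).
visible = np.array([1.0, 0.0, 0.0, 1.0, 0.0])
rng = np.random.default_rng(0)
weights = rng.normal(scale=0.1, size=(5, 2))   # 5 visible units, 2 latent factors
visible_bias = np.zeros(5)
hidden_bias = np.zeros(2)

print(hidden_probabilities(visible, weights, hidden_bias))
print(free_energy(visible, weights, visible_bias, hidden_bias))
```

In a trained model, the columns of `weights` are the parameter vectors one would inspect to interpret each latent factor.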
Role of dorsomedial striatum neuronal ensembles in incubation of methamphetamine craving after voluntary abstinence
Abstract
We recently developed a rat model of incubation of methamphetamine craving after choice-based voluntary abstinence. Here, we studied the role of dorsolateral striatum (DLS) and dorsomedial striatum (DMS) in this incubation. We trained rats to self-administer palatable food pellets (6 d, 6 h/d) and methamphetamine (12 d, 6 h/d). We then assessed relapse to methamphetamine seeking under extinction conditions after 1 and 21 abstinence days. Between tests, the rats underwent voluntary abstinence (using a discrete choice procedure between methamphetamine and food; 20 trials/d) for 19 d. We used in situ hybridization to measure the colabeling of the activity marker Fos with Drd1 and Drd2 in DMS and DLS after the tests. Based on the in situ hybridization colabeling results, we tested the causal role of DMS D1 and D2 family receptors, and DMS neuronal ensembles in "incubated" methamphetamine seeking, using selective dopamine receptor antagonists (SCH39166 or raclopride) and the Daun02 chemogenetic inactivation procedure, respectively. Methamphetamine seeking was higher after 21 d of voluntary abstinence than after 1 d (incubation of methamphetamine craving). The incubated response was associated with increased Fos expression in DMS but not in DLS; Fos was colabeled with both Drd1 and Drd2. DMS injections of SCH39166 or raclopride selectively decreased methamphetamine seeking after 21 abstinence days. In Fos-lacZ transgenic rats, selective inactivation of relapse test-activated Fos neurons in DMS on abstinence day 18 decreased incubated methamphetamine seeking on day 21. Results demonstrate a role of DMS dopamine D1 and D2 receptors in the incubation of methamphetamine craving after voluntary abstinence and that DMS neuronal ensembles mediate this incubation.
SIGNIFICANCE STATEMENT:
In human addicts, abstinence is often self-imposed and relapse can be triggered by exposure to drug-associated cues that induce drug craving. We recently developed a rat model of incubation of methamphetamine craving after choice-based voluntary abstinence. Here, we used classical pharmacology, in situ hybridization, immunohistochemistry, and the Daun02 inactivation procedure to demonstrate a critical role of dorsomedial striatum neuronal ensembles in this new form of incubation of drug craving.
Active Inverse Reward Design
Designers of AI agents often iterate on the reward function in a
trial-and-error process until they get the desired behavior, but this only
guarantees good behavior in the training environment. We propose structuring
this process as a series of queries asking the user to compare between
different reward functions. Thus we can actively select queries for maximum
informativeness about the true reward. In contrast to approaches asking the
designer for optimal behavior, this allows us to gather additional information
by eliciting preferences between suboptimal behaviors. After each query, we
need to update the posterior over the true reward function from observing the
proxy reward function chosen by the designer. The recently proposed Inverse
Reward Design (IRD) enables this. Our approach substantially outperforms IRD in
test environments. In particular, it can query the designer about
interpretable, linear reward functions and still infer non-linear ones.
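The idea of picking the most informative query and then updating the posterior over the true reward can be sketched with a small discrete example. This is a simplified stand-in, not the paper's algorithm: it assumes a finite set of candidate true reward weights, assumes each proxy reward option in a query has already been reduced to the feature counts of the behavior it induces, and assumes a Boltzmann-rational designer with a made-up rationality parameter `beta`. All names and numbers are hypothetical.

```python
import numpy as np

def choice_likelihood(true_weights, option_features, beta=5.0):
    """P(designer picks each option | each candidate true reward).

    Returns an array of shape (n_hypotheses, n_options).
    """
    scores = beta * true_weights @ option_features.T
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    probs = np.exp(scores)
    return probs / probs.sum(axis=1, keepdims=True)

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def expected_info_gain(prior, likelihood):
    """Prior entropy minus the expected posterior entropy over possible answers."""
    answer_probs = prior @ likelihood
    gain = entropy(prior)
    for answer, p_answer in enumerate(answer_probs):
        if p_answer > 0:
            posterior = prior * likelihood[:, answer] / p_answer
            gain -= p_answer * entropy(posterior)
    return gain

def update_posterior(prior, likelihood, answer):
    """Bayes update after observing which option the designer chose."""
    posterior = prior * likelihood[:, answer]
    return posterior / posterior.sum()

# Three candidate true rewards over two features, with a uniform prior.
hypotheses = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
prior = np.full(3, 1.0 / 3.0)

# Two candidate queries; each row holds the feature counts induced by one option.
queries = [
    np.array([[1.0, 0.0], [0.0, 1.0]]),   # options differ, so the answer is informative
    np.array([[1.0, 0.0], [1.0, 0.0]]),   # identical options, so nothing is learned
]

gains = [expected_info_gain(prior, choice_likelihood(hypotheses, q)) for q in queries]
best_query = int(np.argmax(gains))

# Suppose the designer picks option 0 of the selected query.
posterior = update_posterior(prior, choice_likelihood(hypotheses, queries[best_query]), 0)
print(best_query, posterior)
```

Because the second query offers two identical options, its expected information gain is zero, so the first query is selected.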