The Dopaminergic Midbrain Encodes the Expected Certainty about Desired Outcomes
Dopamine plays a key role in learning; however, its exact function in decision making and choice remains unclear. Recently, we proposed a generic model based on active (Bayesian) inference wherein dopamine encodes the precision of beliefs about optimal policies. Put simply, dopamine discharges reflect the confidence that a chosen policy will lead to desired outcomes. We designed a novel task to test this hypothesis, where subjects played a "limited offer" game in a functional magnetic resonance imaging experiment. Subjects had to decide how long to wait for a high offer before accepting a low offer, with the risk of losing everything if they waited too long. Bayesian model comparison showed that behavior strongly supported active inference, based on surprise minimization, over classical utility maximization schemes. Furthermore, midbrain activity, encompassing dopamine projection neurons, was accurately predicted by trial-by-trial variations in model-based estimates of precision. Our findings demonstrate that human subjects infer both optimal policies and the precision of those inferences, and thus support the notion that humans perform hierarchical probabilistic Bayesian inference. In other words, subjects have to infer both what they should do and how confident they are in their choices, where confidence may be encoded by dopaminergic firing.
Markov Decision Processes with Applications in Wireless Sensor Networks: A Survey
Wireless sensor networks (WSNs) consist of autonomous and resource-limited
devices. The devices cooperate to monitor one or more physical phenomena within
an area of interest. WSNs operate as stochastic systems because of randomness
in the monitored environments. For long service time and low maintenance cost,
WSNs require adaptive and robust methods to address data exchange, topology
formulation, resource and power optimization, sensing coverage and object
detection, and security challenges. In these problems, sensor nodes are to make
optimized decisions from a set of accessible strategies to achieve design
goals. This survey reviews numerous applications of the Markov decision process
(MDP) framework, a powerful decision-making tool to develop adaptive algorithms
and protocols for WSNs. Furthermore, various solution methods are discussed and
compared to serve as a guide for using MDPs in WSNs.
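As a concrete illustration of the MDP framework the survey reviews, the sketch below runs value iteration on a toy two-state sensor-node problem. The states, actions, transition probabilities, and rewards are invented for illustration and do not come from the survey; only the solution method (value iteration) is standard.

```python
# Hypothetical toy MDP: a sensor node chooses to "sense" (earn reward,
# drain battery) or "sleep" (no reward, recharge). All numbers invented.
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    "high_battery": {
        "sense": [(0.5, "high_battery", 2.0), (0.5, "low_battery", 2.0)],
        "sleep": [(1.0, "high_battery", 0.0)],
    },
    "low_battery": {
        "sense": [(1.0, "low_battery", 0.5)],   # degraded sensing
        "sleep": [(1.0, "high_battery", 0.0)],  # recharge
    },
}

def value_iteration(P, gamma=0.9, tol=1e-6):
    """Standard value iteration: sweep Bellman optimality backups
    until the largest value change falls below tol."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a in P[s]
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {
        s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2])
                                       for p, s2, r in P[s][a]))
        for s in P
    }
    return V, policy

V, policy = value_iteration(P)
print(policy)  # -> sense when battery is high, sleep to recharge when low
```

With these made-up numbers the optimal policy trades immediate reward for future capacity: sense on a high battery, sleep on a low one, which mirrors the resource/power-optimization decisions the survey discusses.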
The dual process account of reasoning: historical roots, problems and perspectives.
Despite the great effort dedicated to redefining expected utility theory on the grounds of new assumptions, modifying or moderating some axioms, none of the alternative theories propounded so far has received statistical confirmation over the full domain of applicability. Moreover, the discrepancy between prescriptions and behaviors is not limited to expected utility theory. In two other fundamental fields, probability and logic, substantial evidence shows that human activities deviate from the prescriptions of the theoretical models. The paper suggests that the discrepancy cannot be ascribed to an imperfect axiomatic description of human choice, but to some more general features of human reasoning, and takes the "dual-process account of reasoning" as a promising explanatory key. This line of thought is based on the distinction between the process of deliberate reasoning and that of intuition, where in a first approximation "intuition" denotes a mental activity that is largely automatized and inaccessible to conscious mental activity. The analysis of the interactions between these two processes provides the basis for explaining the persistence of the gap between normative and behavioral patterns. This view will be explored in the following pages: central consideration will be given to the problem of the interactions between rationality and intuition, and the correlated "modularity" of thought.
Guiding Robot Exploration in Reinforcement Learning via Automated Planning
Reinforcement learning (RL) enables an agent to learn from trial-and-error
experiences toward achieving long-term goals; automated planning aims to
compute plans for accomplishing tasks using action knowledge. Despite their
shared goal of completing complex tasks, the development of RL and automated
planning has been largely isolated due to their different computational
modalities. Focusing on improving RL agents' learning efficiency, we develop
Guided Dyna-Q (GDQ) to enable RL agents to reason with action knowledge to
avoid exploring less-relevant states. The action knowledge is used for
generating artificial experiences from an optimistic simulation. GDQ has been
evaluated in simulation and using a mobile robot conducting navigation tasks in
a multi-room office environment. Compared with competitive baselines, GDQ
significantly reduces the effort in exploration while improving the quality of
learned policies.
Comment: Accepted at the International Conference on Automated Planning and Scheduling (ICAPS-21).
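The abstract builds on the Dyna-Q idea of supplementing real experience with simulated transitions from a learned model. The sketch below is a generic tabular Dyna-Q on an invented corridor task, with a hypothetical `relevant` set standing in for GDQ's planner-derived guidance (the actual GDQ mechanism is not specified in the abstract); all task details and hyperparameters are made up.

```python
# Hedged sketch of tabular Dyna-Q with a hypothetical relevance filter.
# Task: a 1-D corridor of states 0..N-1 with the goal at the right end.
import random

random.seed(0)
N = 5
actions = [+1, -1]  # move right / left (right first so ties break toward the goal)
Q = {(s, a): 0.0 for s in range(N) for a in actions}
model = {}          # learned model: (s, a) -> (reward, next_state)

def step(s, a):
    """Deterministic environment: reward 1 on reaching the goal state."""
    s2 = min(max(s + a, 0), N - 1)
    return (1.0 if s2 == N - 1 else 0.0), s2

# Hypothetical action-knowledge guidance: in GDQ this would come from an
# automated planner; here every state is simply marked relevant.
relevant = set(range(N))

alpha, gamma, eps = 0.5, 0.95, 0.1
for episode in range(50):
    s = 0
    while s != N - 1:
        # Epsilon-greedy action selection on the current Q-table.
        a = random.choice(actions) if random.random() < eps else \
            max(actions, key=lambda a: Q[(s, a)])
        r, s2 = step(s, a)
        # Direct RL update from the real transition.
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in actions)
                              - Q[(s, a)])
        model[(s, a)] = (r, s2)
        # Dyna planning: replay simulated experience, restricted to states
        # the (hypothetical) action knowledge marks as relevant.
        for _ in range(10):
            ps, pa = random.choice([k for k in model if k[0] in relevant])
            pr, ps2 = model[(ps, pa)]
            Q[(ps, pa)] += alpha * (pr + gamma * max(Q[(ps2, b)] for b in actions)
                                    - Q[(ps, pa)])
        s = s2

greedy = {s: max(actions, key=lambda a: Q[(s, a)]) for s in range(N - 1)}
print(greedy)
```

Shrinking the `relevant` set is where guidance pays off: simulated updates are spent only on states the planner considers useful, which is the exploration-reduction effect the abstract reports.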
Optimal Feedback Control Rules Sensitive to Controlled Endogenous Risk-Aversion
The objective of this paper is to correct and improve the results obtained by Van der Ploeg (1984a, 1984b) and utilized in the literature on feedback stochastic optimal control sensitive to constant exogenous risk-aversion (Karp 1987; Whittle 1989, 1990; Chow 1993, amongst others). More realistically, the proposed approach deals with endogenous risks that are under the control of the decision-maker. This has strong implications for the policy decisions adopted by the decision-maker over the entire planning horizon.
Keywords: controlled stochastic environment, rational decision-maker, adaptive control, optimal path, feedback optimal strategy, endogenous risk-aversion, dynamic active learning.