
    The Dopaminergic Midbrain Encodes the Expected Certainty about Desired Outcomes

    Dopamine plays a key role in learning; however, its exact function in decision making and choice remains unclear. Recently, we proposed a generic model based on active (Bayesian) inference wherein dopamine encodes the precision of beliefs about optimal policies. Put simply, dopamine discharges reflect the confidence that a chosen policy will lead to desired outcomes. We designed a novel task to test this hypothesis, in which subjects played a "limited offer" game in a functional magnetic resonance imaging experiment. Subjects had to decide how long to wait for a high offer before accepting a low offer, at the risk of losing everything if they waited too long. Bayesian model comparison showed that behavior strongly supported active inference, based on surprise minimization, over classical utility-maximization schemes. Furthermore, midbrain activity, encompassing dopamine projection neurons, was accurately predicted by trial-by-trial variations in model-based estimates of precision. Our findings demonstrate that human subjects infer both optimal policies and the precision of those inferences, and thus support the notion that humans perform hierarchical probabilistic Bayesian inference. In other words, subjects have to infer both what they should do and how confident they are in their choices, where confidence may be encoded by dopaminergic firing.
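
    A minimal sketch of the core computation described above, assuming nothing beyond the abstract: policies are scored by an expected-free-energy-like cost G, and a precision parameter gamma sharpens the softmax over policies. The names, costs, and gamma values are illustrative, not the authors' implementation.

        import numpy as np

        def policy_posterior(G, gamma):
            """Belief over policies: q(pi) proportional to exp(-gamma * G(pi)).

            G     -- cost (expected free energy) of each policy; lower is better
            gamma -- precision: confidence that the chosen policy yields the
                     desired outcomes (the quantity hypothesized to be
                     reflected in dopaminergic firing)
            """
            logits = -gamma * np.asarray(G, dtype=float)
            logits -= logits.max()            # numerical stability
            q = np.exp(logits)
            return q / q.sum()

        G = [1.2, 0.4, 2.0]                   # illustrative policy costs
        for gamma in (0.5, 4.0):              # low vs. high precision
            print(gamma, policy_posterior(G, gamma).round(3))

    Higher gamma concentrates the belief on the lowest-cost policy, mirroring the "confidence that a chosen policy will lead to desired outcomes" reading of the dopamine signal.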

    Markov Decision Processes with Applications in Wireless Sensor Networks: A Survey

    Wireless sensor networks (WSNs) consist of autonomous and resource-limited devices. The devices cooperate to monitor one or more physical phenomena within an area of interest. WSNs operate as stochastic systems because of randomness in the monitored environments. For long service time and low maintenance cost, WSNs require adaptive and robust methods to address data exchange, topology formulation, resource and power optimization, sensing coverage and object detection, and security challenges. In these problems, sensor nodes must make optimized decisions from a set of accessible strategies to achieve design goals. This survey reviews numerous applications of the Markov decision process (MDP) framework, a powerful decision-making tool for developing adaptive algorithms and protocols for WSNs. Furthermore, various solution methods are discussed and compared to serve as a guide for using MDPs in WSNs.
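
    As a concrete reference point for the MDP framework the survey covers, here is a minimal value-iteration sketch for a generic finite MDP. The transition tensor P and reward matrix R are illustrative placeholders (a toy duty-cycling choice), not taken from any protocol in the survey.

        import numpy as np

        def value_iteration(P, R, gamma=0.95, tol=1e-6):
            """Solve a finite MDP by value iteration.

            P[a, s, s'] -- transition probabilities under action a
            R[s, a]     -- expected immediate reward
            Returns the optimal value function and the greedy policy.
            """
            n_actions, n_states, _ = P.shape
            V = np.zeros(n_states)
            while True:
                # Q[s, a] = R[s, a] + gamma * sum_s' P[a, s, s'] * V[s']
                Q = R + gamma * np.einsum("ast,t->sa", P, V)
                V_new = Q.max(axis=1)
                if np.max(np.abs(V_new - V)) < tol:
                    return V_new, Q.argmax(axis=1)
                V = V_new

        # Toy 2-state, 2-action example (e.g. "sleep" vs. "sense" duty cycling)
        P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                      [[0.5, 0.5], [0.1, 0.9]]])
        R = np.array([[0.0, 1.0], [0.5, 2.0]])
        V, policy = value_iteration(P, R)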

    The dual process account of reasoning: historical roots, problems and perspectives.

    Despite the great effort that has been dedicated to redefining expected utility theory on new assumptions, modifying or moderating some axioms, none of the alternative theories propounded so far has received statistical confirmation over its full domain of applicability. Moreover, the discrepancy between prescriptions and behaviors is not limited to expected utility theory. In two other fundamental fields, probability and logic, substantial evidence shows that human activities deviate from the prescriptions of the theoretical models. The paper argues that the discrepancy cannot be ascribed to an imperfect axiomatic description of human choice, but to some more general features of human reasoning, and proposes the “dual-process account of reasoning” as a promising explanatory key. This line of thought is based on the distinction between the process of deliberate reasoning and that of intuition, where, to a first approximation, “intuition” denotes a mental activity that is largely automatized and inaccessible to conscious mental activity. The analysis of the interactions between these two processes provides the basis for explaining the persistence of the gap between normative and behavioral patterns. This view is explored in the following pages: central consideration is given to the problem of the interactions between rationality and intuition, and to the correlated “modularity” of thought.

    Guiding Robot Exploration in Reinforcement Learning via Automated Planning

    Reinforcement learning (RL) enables an agent to learn from trial-and-error experiences toward achieving long-term goals; automated planning aims to compute plans for accomplishing tasks using action knowledge. Despite their shared goal of completing complex tasks, the development of RL and automated planning has been largely isolated due to their different computational modalities. Focusing on improving RL agents' learning efficiency, we develop Guided Dyna-Q (GDQ) to enable RL agents to reason with action knowledge and avoid exploring less-relevant states. The action knowledge is used to generate artificial experiences from an optimistic simulation. GDQ has been evaluated in simulation and on a mobile robot conducting navigation tasks in a multi-room office environment. Compared with competitive baselines, GDQ significantly reduces the exploration effort while improving the quality of the learned policies. Comment: Accepted at the International Conference on Automated Planning and Scheduling (ICAPS-21).
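
    For orientation, a minimal tabular Dyna-Q sketch follows, since GDQ builds on this scheme. The environment interface (reset, step, actions) is an assumption made for illustration, and a comment marks where GDQ's planner-derived artificial experiences would enter; this is not the authors' code.

        import random
        from collections import defaultdict

        def dyna_q(env, episodes=200, n_planning=10,
                   alpha=0.1, gamma=0.95, epsilon=0.1):
            """Tabular Dyna-Q. `env` is assumed to expose reset(),
            step(a) -> (next_state, reward, done), and a list `actions`.
            In Guided Dyna-Q the planning loop below would instead draw
            artificial experiences from an optimistic simulation built
            from planner action knowledge, steering exploration away
            from less-relevant states."""
            Q = defaultdict(float)      # Q[(state, action)]
            model = {}                  # learned model: (s, a) -> (r, s', done)
            for _ in range(episodes):
                s, done = env.reset(), False
                while not done:
                    # epsilon-greedy behavior policy
                    a = (random.choice(env.actions) if random.random() < epsilon
                         else max(env.actions, key=lambda b: Q[(s, b)]))
                    s2, r, done = env.step(a)
                    # direct Q-learning update from the real transition
                    target = r + gamma * (0 if done else
                                          max(Q[(s2, b)] for b in env.actions))
                    Q[(s, a)] += alpha * (target - Q[(s, a)])
                    model[(s, a)] = (r, s2, done)
                    # planning: extra updates replayed from the learned model
                    for _ in range(n_planning):
                        (ps, pa), (pr, ps2, pdone) = random.choice(list(model.items()))
                        ptarget = pr + gamma * (0 if pdone else
                                                max(Q[(ps2, b)] for b in env.actions))
                        Q[(ps, pa)] += alpha * (ptarget - Q[(ps, pa)])
                    s = s2
            return Q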

    Optimal Feedback Control Rules Sensitive to Controlled Endogenous Risk-Aversion

    The objective of this paper is to correct and improve the results obtained by Van der Ploeg (1984a, 1984b) and utilized in the literature on feedback stochastic optimal control sensitive to constant exogenous risk-aversion (Karp 1987; Whittle 1989, 1990; Chow 1993, amongst others). More realistically, the proposed approach deals with endogenous risks that are under the control of the decision-maker. This has strong implications for the policy decisions adopted by the decision-maker over the entire planning horizon. Keywords: controlled stochastic environment, rational decision-maker, adaptive control, optimal path, feedback optimal strategy, endogenous risk-aversion, dynamic active learning.
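
    As a baseline for the feedback rules discussed above, here is a minimal sketch of a risk-neutral discrete-time linear-quadratic feedback law u = -Kx obtained by Riccati iteration; risk-sensitive schemes of the kind treated in this literature (e.g. Whittle-style) modify this recursion with a risk parameter, which is omitted here. The system matrices are illustrative, not drawn from the paper.

        import numpy as np

        def lqr_gain(A, B, Q, R, iters=500):
            """Risk-neutral baseline: discrete-time LQR feedback u = -K x,
            found by iterating the Riccati recursion to a fixed point."""
            P = Q.copy()
            for _ in range(iters):
                K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
                P = Q + A.T @ P @ (A - B @ K)
            return K

        # Illustrative scalar system: x' = 0.9 x + 0.5 u
        A = np.array([[0.9]])
        B = np.array([[0.5]])
        Q = np.array([[1.0]])   # state cost
        R = np.array([[0.1]])   # control cost
        K = lqr_gain(A, B, Q, R)   # state-feedback gain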