
    Reinforcement learning in continuous state and action spaces

    Many traditional reinforcement-learning algorithms have been designed for problems with small finite state and action spaces. Learning in such discrete problems can be difficult, due to noise and delayed reinforcements. However, many real-world problems have continuous state or action spaces, which can make learning a good decision policy even more involved. In this chapter we discuss how to automatically find good decision policies in continuous domains. Because analytically computing a good policy from a continuous model can be infeasible, we mainly focus on methods that explicitly update a representation of a value function, a policy, or both. We discuss considerations in choosing an appropriate representation for these functions, as well as gradient-based and gradient-free ways to update their parameters. We show how to apply these methods to reinforcement-learning problems and discuss many specific algorithms, including gradient-based temporal-difference learning, evolutionary strategies, policy-gradient algorithms, and actor-critic methods. We discuss the advantages of the different approaches and empirically compare the performance of a state-of-the-art actor-critic method and a state-of-the-art evolutionary strategy.
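    As a concrete illustration of the gradient-based value-function updates mentioned in this abstract, the sketch below implements semi-gradient TD(0) with a linear approximation over radial-basis features of a continuous state. The feature map, step sizes, and environment interface are illustrative assumptions, not the chapter's exact setup.

```python
# Minimal sketch: semi-gradient TD(0) with linear function approximation
# over a continuous (scalar) state. Hyperparameters and env_step are assumptions.
import numpy as np

def rbf_features(state, centers, width=0.5):
    """Radial-basis features for a scalar continuous state."""
    return np.exp(-((state - centers) ** 2) / (2.0 * width ** 2))

def td0_linear(env_step, s0, centers, alpha=0.05, gamma=0.99, steps=10000):
    """Semi-gradient TD(0): w <- w + alpha * delta * grad_w V(s).

    env_step(s) is assumed to return (reward, next_state, done).
    """
    w = np.zeros_like(centers, dtype=float)
    s = s0
    for _ in range(steps):
        r, s_next, done = env_step(s)
        phi = rbf_features(s, centers)
        phi_next = rbf_features(s_next, centers)
        target = r if done else r + gamma * (w @ phi_next)
        delta = target - w @ phi      # TD error
        w += alpha * delta * phi      # grad_w V(s) = phi for a linear V
        s = s0 if done else s_next
    return w
```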

    Double Q-learning

    In some stochastic environments the well-known reinforcement learning algorithm Q-learning performs very poorly. This poor performance is caused by large overestimations of action values, which result from a positive bias introduced because Q-learning uses the maximum action value as an approximation for the maximum expected action value. We introduce an alternative way to approximate the maximum expected value of any set of random variables. The resulting double estimator method is shown to sometimes underestimate rather than overestimate the maximum expected value. We apply the double estimator to Q-learning to construct Double Q-learning, a new off-policy reinforcement learning algorithm. We show that the new algorithm converges to the optimal policy and that it performs well in some settings in which Q-learning performs poorly due to its overestimations.
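    The double-estimator idea can be summarized in a short tabular sketch: two tables are kept, one selects the greedy action at the next state and the other evaluates it, and each transition updates one of the two tables chosen at random. The environment interface is an assumption for illustration.

```python
# Tabular Double Q-learning update sketch (double estimator):
# one table picks argmax_a Q(s', a), the other provides its value.
import random
from collections import defaultdict

def double_q_update(QA, QB, s, a, r, s_next, actions,
                    alpha=0.1, gamma=0.99, done=False):
    """Update one of the two estimators, chosen uniformly at random."""
    primary, other = (QA, QB) if random.random() < 0.5 else (QB, QA)
    if done:
        target = r
    else:
        a_star = max(actions, key=lambda a2: primary[(s_next, a2)])  # selected by primary
        target = r + gamma * other[(s_next, a_star)]                 # evaluated by the other
    primary[(s, a)] += alpha * (target - primary[(s, a)])

# Example tables; a behaviour policy can act (e.g. epsilon-greedily) on QA + QB.
QA, QB = defaultdict(float), defaultdict(float)
```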

    Exploration via Epistemic Value Estimation

    How to explore efficiently in reinforcement learning is an open problem. Many exploration algorithms employ the epistemic uncertainty of their own value predictions, for instance to compute an exploration bonus or an upper confidence bound. Unfortunately, the required uncertainty is difficult to estimate in general with function approximation. We propose epistemic value estimation (EVE): a recipe that is compatible with sequential decision making and with neural-network function approximators. It equips agents with a tractable posterior over all their parameters, from which epistemic value uncertainty can be computed efficiently. We use the recipe to derive an epistemic Q-Learning agent and observe competitive performance on a series of benchmarks. Experiments confirm that the EVE recipe facilitates efficient exploration in hard exploration tasks.
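    As a rough illustration of the general idea of turning a parameter posterior into value uncertainty (not the exact EVE recipe), consider a linear Q-function with a Gaussian posterior over its weights: the predictive variance of Q(s, a) is a quadratic form in the features, and it can drive a UCB-style exploration bonus. The feature map and bonus scale below are assumptions.

```python
# Illustrative sketch only: epistemic value uncertainty from a Gaussian
# posterior over the weights of a linear Q-function, used as a UCB bonus.
import numpy as np

def ucb_action(phi_sa, mean_w, cov_w, beta=1.0):
    """Pick the action maximising mean Q plus an epistemic bonus.

    phi_sa: (num_actions, num_features), one feature row per action.
    mean_w, cov_w: posterior mean and covariance of the Q-weights.
    """
    q_mean = phi_sa @ mean_w
    q_var = np.einsum('af,fg,ag->a', phi_sa, cov_w, phi_sa)  # phi^T Sigma phi
    return int(np.argmax(q_mean + beta * np.sqrt(q_var)))
```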

    Factorized Q-Learning for Large-Scale Multi-Agent Systems

    Deep Q-learning has achieved significant success in single-agent decision-making tasks. However, it is challenging to extend Q-learning to large-scale multi-agent scenarios, due to the explosion of the action space resulting from the complex dynamics between the environment and the agents. In this paper, we propose to make the computation of multi-agent Q-learning tractable by treating the Q-function (over the state and joint action) as a high-order, high-dimensional tensor and approximating it with factorized pairwise interactions. Furthermore, we use a composite deep neural network architecture to compute the factorized Q-function, share the model parameters among all agents within the same group, and estimate the agents' optimal joint actions with a coordinate-descent-type algorithm. These simplifications greatly reduce the model complexity and accelerate the learning process. Extensive experiments on two different multi-agent problems demonstrate the performance gain of the proposed approach over strong baselines, particularly when there are a large number of agents.
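    The following hedged sketch shows the pairwise-factorization idea in its simplest form (not the paper's network): the joint Q-value is approximated by a sum of dot-product interactions between per-agent action embeddings, and a joint action is chosen by coordinate descent, one agent at a time. The embeddings E[i] (one array of shape (num_actions, d) per agent) are assumed to come from some shared model.

```python
# Sketch: pairwise-factorized joint Q-value and coordinate-descent action selection.
import numpy as np

def factorized_q(E, joint):
    """Sum of pairwise dot-product interactions <e_i(a_i), e_j(a_j)>."""
    n = len(joint)
    return sum(E[i][joint[i]] @ E[j][joint[j]]
               for i in range(n) for j in range(i + 1, n))

def coordinate_descent(E, num_actions, sweeps=5):
    """Greedily improve one agent's action at a time, holding the others fixed."""
    n = len(E)
    joint = [0] * n
    for _ in range(sweeps):
        for i in range(n):
            joint[i] = max(range(num_actions),
                           key=lambda a: factorized_q(E, joint[:i] + [a] + joint[i + 1:]))
    return joint
```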

    Continuous-action reinforcement learning for memory allocation in virtualized servers

    In a virtualized computing server (node) with multiple Virtual Machines (VMs), it is necessary to dynamically allocate memory among the VMs. In many cases, this is done considering only the memory demand of each VM, without a node-wide view. There are many solutions to the dynamic memory allocation problem, some of which use machine learning in some form. This paper introduces CAVMem (Continuous-Action Algorithm for Virtualized Memory Management), a proof-of-concept mechanism for decentralized dynamic memory allocation in virtualized nodes that applies a continuous-action reinforcement learning (RL) algorithm called Deep Deterministic Policy Gradient (DDPG). CAVMem with DDPG is compared with other RL algorithms, such as Q-Learning (QL) and Deep Q-Learning (DQL), in an environment that models a virtualized node. In order to scale linearly and to dynamically add and remove VMs, CAVMem has one agent per VM, connected via a lightweight coordination mechanism. The agents learn how much memory to bid for or return in a given state, so that each VM obtains a fair level of performance subject to the available memory resources. Our results show that CAVMem with DDPG performs better than QL and a static allocation, and is competitive with DQL. However, CAVMem incurs significantly less training overhead than DQL, making the continuous-action approach a more cost-effective solution.

    This research is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 754337 (EuroEXA) and the European Union’s 7th Framework Programme under grant agreement number 610456 (Euroserver). It also received funding from the Spanish Ministry of Science and Technology (project TIN2015-65316-P), Generalitat de Catalunya (contract 2014-SGR-1272), and the Severo Ochoa Programme (SEV-2015-0493) of the Spanish Government.
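    For context on the continuous-action learner used here, the sketch below shows a generic DDPG update step (not the CAVMem code): a deterministic actor outputs a continuous action such as a memory bid, a critic scores (state, action), and both are trained from a replay batch, with Polyak-averaged target networks. Network definitions, the replay buffer, and the `done` encoding (float 0/1 tensor) are assumptions.

```python
# Sketch of one DDPG update step (actor + critic + target networks), in PyTorch.
import torch
import torch.nn as nn

def ddpg_update(actor, critic, target_actor, target_critic,
                batch, actor_opt, critic_opt, gamma=0.99, tau=0.005):
    s, a, r, s_next, done = batch  # tensors sampled from a replay buffer

    # Critic: regress Q(s, a) onto the bootstrapped target.
    with torch.no_grad():
        q_next = target_critic(torch.cat([s_next, target_actor(s_next)], dim=-1))
        target = r + gamma * (1.0 - done) * q_next
    critic_loss = nn.functional.mse_loss(critic(torch.cat([s, a], dim=-1)), target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: deterministic policy gradient, i.e. ascend Q(s, actor(s)).
    actor_loss = -critic(torch.cat([s, actor(s)], dim=-1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Polyak-average the target networks.
    for net, tgt in ((actor, target_actor), (critic, target_critic)):
        for p, p_t in zip(net.parameters(), tgt.parameters()):
            p_t.data.mul_(1 - tau).add_(tau * p.data)
```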

    Unraveling the Effects of Acute Inflammation on Pharmacokinetics: A Model-Based Analysis Focusing on Renal Glomerular Filtration Rate and Cytochrome P450 3A4-Mediated Metabolism

    Background and Objectives: Acute inflammation caused by infections or sepsis can impact pharmacokinetics. We used a model-based analysis to evaluate the effect of acute inflammation, as represented by interleukin-6 (IL-6) levels, on drug clearance, focusing on renal glomerular filtration rate (GFR) and cytochrome P450 3A4 (CYP3A4)-mediated metabolism. Methods: A physiologically based model incorporating renal and hepatic drug clearance was implemented. Functions correlating IL-6 levels with GFR and with in vitro CYP3A4 activity were derived and incorporated into the modeling framework. We then simulated treatment scenarios for hypothetical drugs by varying the IL-6 levels, the contribution of renal and hepatic drug clearance, and protein binding. The relative change in the observed area under the concentration-time curve (AUC) was computed for these scenarios. Results: Inflammation had opposite effects on drug exposure for drugs eliminated via the liver and via the kidney, with the effect of inflammation being inversely proportional to the extraction ratio (ER). For renally cleared drugs, the relative decrease in AUC was close to 30% during severe inflammation. For CYP3A4 substrates, the relative increase in AUC could exceed 50% for low-ER drugs. Finally, the impact of inflammation-induced changes in drug clearance was smaller for drugs with a larger unbound fraction. Conclusion: This analysis demonstrates differences in the impact of inflammation on drug clearance for different drug types. The effect of inflammation status on pharmacokinetics may explain the inter-individual variability in pharmacokinetics in critically ill patients. The proposed model-based analysis may be used to further evaluate the effect of inflammation, e.g., by incorporating the effect of inflammation on other drug-metabolizing enzymes or physiological processes.
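    A back-of-the-envelope illustration of why the effect scales inversely with the extraction ratio (this is not the paper's PBPK model): under the well-stirred liver model, CL_H = Q_H · fu · CL_int / (Q_H + fu · CL_int), and AUC is inversely proportional to clearance, so halving CYP3A4 activity (lower CL_int) roughly doubles the AUC of a low-ER drug but barely changes that of a high-ER drug. The parameter values below are assumptions chosen only to show the trend.

```python
# Well-stirred liver model sketch: AUC sensitivity to reduced CYP3A4 activity
# for a purely hepatically cleared drug. Parameter values are illustrative.

def hepatic_cl(q_h, fu, clint):
    """Well-stirred model hepatic clearance (L/h)."""
    return q_h * fu * clint / (q_h + fu * clint)

def auc_ratio_after_inflammation(q_h, fu, clint, cyp_activity_fraction):
    """AUC(inflamed) / AUC(baseline); AUC = Dose / CL, so the ratio is CL_base / CL_infl."""
    cl_base = hepatic_cl(q_h, fu, clint)
    cl_infl = hepatic_cl(q_h, fu, clint * cyp_activity_fraction)
    return cl_base / cl_infl

Q_H = 90.0  # typical adult hepatic blood flow, L/h
for label, clint in (("low-ER drug", 10.0), ("high-ER drug", 20000.0)):
    ratio = auc_ratio_after_inflammation(Q_H, fu=0.1, clint=clint,
                                         cyp_activity_fraction=0.5)
    print(label, round(ratio, 2))  # ~2.0 for low ER, ~1.0 for high ER
```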

    Exercise Stress Testing in Children with Metabolic or Neuromuscular Disorders

    The role of exercise as a diagnostic or therapeutic tool in patients with a metabolic disease (MD) or neuromuscular disorder (NMD) is relatively under-researched. In this paper we describe the metabolic profiles during exercise of 13 children (9 boys, 4 girls, aged 5–15 years) with a diagnosed MD or NMD. Graded cardiopulmonary exercise tests (CPET) and/or a 90-min prolonged submaximal exercise test (PXT) were performed. During exercise, respiratory gas exchange and heart rate were monitored; blood and urine samples were collected for biochemical analysis at set time points. We observed several characteristics in our patient group that reflect the differences in the pathophysiology of the various disorders. Metabolic profiles during exercise (CPET and PXT) seem helpful in the evaluation of patients with an MD or NMD.

    In-Brace versus Out-of-Brace Protocol for Radiographic Follow-Up of Patients with Idiopathic Scoliosis: A Retrospective Study

    The purpose of this retrospective study was to compare two standardized protocols for radiological follow-up (in-brace versus out-of-brace radiographs) with respect to the rate of curve progression over time in surgically treated idiopathic scoliosis (IS) patients after failed brace treatment. In-brace radiographs have the advantage that proper fit of the brace and in-brace correction can be evaluated; however, detection of progression might theoretically be more difficult. Fifty-one IS patients who underwent surgical treatment after failed brace treatment were included. For 25 patients, follow-up radiographs were taken in-brace; for the other 26 patients, brace treatment was temporarily stopped before out-of-brace follow-up radiographs were taken. Both groups showed significant curve progression compared to baseline after a mean follow-up period of 3.4 years. The protocol with in-brace radiographs was noninferior regarding the curve progression rate over time. The estimated monthly Cobb angle progression based on the mixed-effects model was 0.5 degrees in both groups. No interaction effect was found between time and patients' baseline Cobb angle (p = 0.98), nor between time and patients' initial in-brace correction (p = 0.32). The results of this study indicate that with both in-brace and out-of-brace protocols for radiographic follow-up, a similar rate of curve progression can be expected over time in IS patients with failed brace treatment.