
    Reinforcement learning in continuous state and action spaces

    Many traditional reinforcement-learning algorithms have been designed for problems with small finite state and action spaces. Learning in such discrete problems can be difficult, due to noise and delayed reinforcements. However, many real-world problems have continuous state or action spaces, which can make learning a good decision policy even more involved. In this chapter we discuss how to automatically find good decision policies in continuous domains. Because analytically computing a good policy from a continuous model can be infeasible, we mainly focus on methods that explicitly update a representation of a value function, a policy, or both. We discuss considerations in choosing an appropriate representation for these functions, as well as gradient-based and gradient-free ways to update their parameters. We show how to apply these methods to reinforcement-learning problems and discuss many specific algorithms, including gradient-based temporal-difference learning, evolutionary strategies, policy-gradient algorithms, and actor-critic methods. We discuss the advantages of the different approaches and empirically compare the performance of a state-of-the-art actor-critic method and a state-of-the-art evolutionary strategy.
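    As a concrete illustration of the gradient-based value-function updates mentioned in this abstract, the sketch below implements semi-gradient TD(0) with a linear approximation over radial-basis features of a continuous state. The feature map, step sizes, and environment interface are illustrative assumptions, not the chapter's exact setup.

```python
# Minimal sketch: semi-gradient TD(0) with linear function approximation
# over a continuous (scalar) state. Hyperparameters and env_step are assumptions.
import numpy as np

def rbf_features(state, centers, width=0.5):
    """Radial-basis features for a scalar continuous state."""
    return np.exp(-((state - centers) ** 2) / (2.0 * width ** 2))

def td0_linear(env_step, s0, centers, alpha=0.05, gamma=0.99, steps=10000):
    """Semi-gradient TD(0): w <- w + alpha * delta * grad_w V(s).

    env_step(s) is assumed to return (reward, next_state, done).
    """
    w = np.zeros_like(centers, dtype=float)
    s = s0
    for _ in range(steps):
        r, s_next, done = env_step(s)
        phi = rbf_features(s, centers)
        phi_next = rbf_features(s_next, centers)
        target = r if done else r + gamma * (w @ phi_next)
        delta = target - w @ phi      # TD error
        w += alpha * delta * phi      # grad_w V(s) = phi for a linear V
        s = s0 if done else s_next
    return w
```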

    Double Q-learning

    In some stochastic environments the well-known reinforcement learning algorithm Q-learning performs very poorly. This poor performance is caused by large overestimations of action values, which result from a positive bias introduced because Q-learning uses the maximum action value as an approximation for the maximum expected action value. We introduce an alternative way to approximate the maximum expected value of any set of random variables. The resulting double estimator method is shown to sometimes underestimate rather than overestimate the maximum expected value. We apply the double estimator to Q-learning to construct Double Q-learning, a new off-policy reinforcement learning algorithm. We show that the new algorithm converges to the optimal policy and that it performs well in some settings in which Q-learning performs poorly due to its overestimations.
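    The double-estimator idea can be summarized in a short tabular sketch: two tables are kept, one selects the greedy action at the next state and the other evaluates it, and each transition updates one of the two tables chosen at random. The environment interface is an assumption for illustration.

```python
# Tabular Double Q-learning update sketch (double estimator):
# one table picks argmax_a Q(s', a), the other provides its value.
import random
from collections import defaultdict

def double_q_update(QA, QB, s, a, r, s_next, actions,
                    alpha=0.1, gamma=0.99, done=False):
    """Update one of the two estimators, chosen uniformly at random."""
    primary, other = (QA, QB) if random.random() < 0.5 else (QB, QA)
    if done:
        target = r
    else:
        a_star = max(actions, key=lambda a2: primary[(s_next, a2)])  # selected by primary
        target = r + gamma * other[(s_next, a_star)]                 # evaluated by the other
    primary[(s, a)] += alpha * (target - primary[(s, a)])

# Example tables; a behaviour policy can act (e.g. epsilon-greedily) on QA + QB.
QA, QB = defaultdict(float), defaultdict(float)
```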

    Exploration via Epistemic Value Estimation

    How to explore efficiently in reinforcement learning is an open problem. Many exploration algorithms employ the epistemic uncertainty of their own value predictions, for instance to compute an exploration bonus or an upper confidence bound. Unfortunately, the required uncertainty is difficult to estimate in general with function approximation. We propose epistemic value estimation (EVE): a recipe that is compatible with sequential decision making and with neural-network function approximators. It equips agents with a tractable posterior over all their parameters, from which epistemic value uncertainty can be computed efficiently. We use the recipe to derive an epistemic Q-Learning agent and observe competitive performance on a series of benchmarks. Experiments confirm that the EVE recipe facilitates efficient exploration in hard exploration tasks.
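    As a rough illustration of the general idea of turning a parameter posterior into value uncertainty (not the exact EVE recipe), consider a linear Q-function with a Gaussian posterior over its weights: the predictive variance of Q(s, a) is a quadratic form in the features, and it can drive a UCB-style exploration bonus. The feature map and bonus scale below are assumptions.

```python
# Illustrative sketch only: epistemic value uncertainty from a Gaussian
# posterior over the weights of a linear Q-function, used as a UCB bonus.
import numpy as np

def ucb_action(phi_sa, mean_w, cov_w, beta=1.0):
    """Pick the action maximising mean Q plus an epistemic bonus.

    phi_sa: (num_actions, num_features), one feature row per action.
    mean_w, cov_w: posterior mean and covariance of the Q-weights.
    """
    q_mean = phi_sa @ mean_w
    q_var = np.einsum('af,fg,ag->a', phi_sa, cov_w, phi_sa)  # phi^T Sigma phi
    return int(np.argmax(q_mean + beta * np.sqrt(q_var)))
```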

    Factorized Q-Learning for Large-Scale Multi-Agent Systems

    Deep Q-learning has achieved significant success in single-agent decision-making tasks. However, it is challenging to extend Q-learning to large-scale multi-agent scenarios, due to the explosion of the action space resulting from the complex dynamics between the environment and the agents. In this paper, we propose to make the computation of multi-agent Q-learning tractable by treating the Q-function (over the state and joint action) as a high-order, high-dimensional tensor and approximating it with factorized pairwise interactions. Furthermore, we use a composite deep neural network architecture to compute the factorized Q-function, share the model parameters among all agents within the same group, and estimate the agents' optimal joint actions with a coordinate-descent-type algorithm. These simplifications greatly reduce the model complexity and accelerate the learning process. Extensive experiments on two different multi-agent problems demonstrate the performance gain of the proposed approach over strong baselines, particularly when there are a large number of agents.
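    The following hedged sketch shows the pairwise-factorization idea in its simplest form (not the paper's network): the joint Q-value is approximated by a sum of dot-product interactions between per-agent action embeddings, and a joint action is chosen by coordinate descent, one agent at a time. The embeddings E[i] (one array of shape (num_actions, d) per agent) are assumed to come from some shared model.

```python
# Sketch: pairwise-factorized joint Q-value and coordinate-descent action selection.
import numpy as np

def factorized_q(E, joint):
    """Sum of pairwise dot-product interactions <e_i(a_i), e_j(a_j)>."""
    n = len(joint)
    return sum(E[i][joint[i]] @ E[j][joint[j]]
               for i in range(n) for j in range(i + 1, n))

def coordinate_descent(E, num_actions, sweeps=5):
    """Greedily improve one agent's action at a time, holding the others fixed."""
    n = len(E)
    joint = [0] * n
    for _ in range(sweeps):
        for i in range(n):
            joint[i] = max(range(num_actions),
                           key=lambda a: factorized_q(E, joint[:i] + [a] + joint[i + 1:]))
    return joint
```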

    Continuous-action reinforcement learning for memory allocation in virtualized servers

    In a virtualized computing server (node) with multiple Virtual Machines (VMs), it is necessary to dynamically allocate memory among the VMs. In many cases, this is done considering only the memory demand of each VM, without a node-wide view. There are many solutions to the dynamic memory allocation problem, some of which use machine learning in some form. This paper introduces CAVMem (Continuous-Action Algorithm for Virtualized Memory Management), a proof-of-concept mechanism for decentralized dynamic memory allocation in virtualized nodes that applies a continuous-action reinforcement learning (RL) algorithm called Deep Deterministic Policy Gradient (DDPG). CAVMem with DDPG is compared with other RL algorithms, such as Q-Learning (QL) and Deep Q-Learning (DQL), in an environment that models a virtualized node. In order to scale linearly and to dynamically add and remove VMs, CAVMem has one agent per VM, connected via a lightweight coordination mechanism. The agents learn how much memory to bid for or return in a given state, so that each VM obtains a fair level of performance subject to the available memory resources. Our results show that CAVMem with DDPG performs better than QL and a static allocation, and is competitive with DQL. However, CAVMem incurs significantly less training overhead than DQL, making the continuous-action approach a more cost-effective solution.

    This research is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 754337 (EuroEXA) and the European Union’s 7th Framework Programme under grant agreement number 610456 (Euroserver). It also received funding from the Spanish Ministry of Science and Technology (project TIN2015-65316-P), Generalitat de Catalunya (contract 2014-SGR-1272), and the Severo Ochoa Programme (SEV-2015-0493) of the Spanish Government.
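    For context on the continuous-action learner used here, the sketch below shows a generic DDPG update step (not the CAVMem code): a deterministic actor outputs a continuous action such as a memory bid, a critic scores (state, action), and both are trained from a replay batch, with Polyak-averaged target networks. Network definitions, the replay buffer, and the `done` encoding (float 0/1 tensor) are assumptions.

```python
# Sketch of one DDPG update step (actor + critic + target networks), in PyTorch.
import torch
import torch.nn as nn

def ddpg_update(actor, critic, target_actor, target_critic,
                batch, actor_opt, critic_opt, gamma=0.99, tau=0.005):
    s, a, r, s_next, done = batch  # tensors sampled from a replay buffer

    # Critic: regress Q(s, a) onto the bootstrapped target.
    with torch.no_grad():
        q_next = target_critic(torch.cat([s_next, target_actor(s_next)], dim=-1))
        target = r + gamma * (1.0 - done) * q_next
    critic_loss = nn.functional.mse_loss(critic(torch.cat([s, a], dim=-1)), target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: deterministic policy gradient, i.e. ascend Q(s, actor(s)).
    actor_loss = -critic(torch.cat([s, actor(s)], dim=-1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Polyak-average the target networks.
    for net, tgt in ((actor, target_actor), (critic, target_critic)):
        for p, p_t in zip(net.parameters(), tgt.parameters()):
            p_t.data.mul_(1 - tau).add_(tau * p.data)
```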

    Unraveling the Effects of Acute Inflammation on Pharmacokinetics: A Model-Based Analysis Focusing on Renal Glomerular Filtration Rate and Cytochrome P450 3A4-Mediated Metabolism

    Background and Objectives: Acute inflammation caused by infections or sepsis can impact pharmacokinetics. We used a model-based analysis to evaluate the effect of acute inflammation, as represented by interleukin-6 (IL-6) levels, on drug clearance, focusing on renal glomerular filtration rate (GFR) and cytochrome P450 3A4 (CYP3A4)-mediated metabolism. Methods: A physiologically based model incorporating renal and hepatic drug clearance was implemented. Functions correlating IL-6 levels with GFR and with in vitro CYP3A4 activity were derived and incorporated into the modeling framework. We then simulated treatment scenarios for hypothetical drugs by varying the IL-6 levels, the contribution of renal and hepatic drug clearance, and protein binding. The relative change in the observed area under the concentration-time curve (AUC) was computed for these scenarios. Results: Inflammation had opposite effects on drug exposure for drugs eliminated via the liver and via the kidney, with the effect of inflammation being inversely proportional to the extraction ratio (ER). For renally cleared drugs, the relative decrease in AUC was close to 30% during severe inflammation. For CYP3A4 substrates, the relative increase in AUC could exceed 50% for low-ER drugs. Finally, the impact of inflammation-induced changes in drug clearance was smaller for drugs with a larger unbound fraction. Conclusion: This analysis demonstrates differences in the impact of inflammation on drug clearance for different drug types. The effect of inflammation status on pharmacokinetics may explain the inter-individual variability in pharmacokinetics in critically ill patients. The proposed model-based analysis may be used to further evaluate the effect of inflammation, e.g., by incorporating the effect of inflammation on other drug-metabolizing enzymes or physiological processes.
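    A back-of-the-envelope illustration of why the effect scales inversely with the extraction ratio (this is not the paper's PBPK model): under the well-stirred liver model, CL_H = Q_H · fu · CL_int / (Q_H + fu · CL_int), and AUC is inversely proportional to clearance, so halving CYP3A4 activity (lower CL_int) roughly doubles the AUC of a low-ER drug but barely changes that of a high-ER drug. The parameter values below are assumptions chosen only to show the trend.

```python
# Well-stirred liver model sketch: AUC sensitivity to reduced CYP3A4 activity
# for a purely hepatically cleared drug. Parameter values are illustrative.

def hepatic_cl(q_h, fu, clint):
    """Well-stirred model hepatic clearance (L/h)."""
    return q_h * fu * clint / (q_h + fu * clint)

def auc_ratio_after_inflammation(q_h, fu, clint, cyp_activity_fraction):
    """AUC(inflamed) / AUC(baseline); AUC = Dose / CL, so the ratio is CL_base / CL_infl."""
    cl_base = hepatic_cl(q_h, fu, clint)
    cl_infl = hepatic_cl(q_h, fu, clint * cyp_activity_fraction)
    return cl_base / cl_infl

Q_H = 90.0  # typical adult hepatic blood flow, L/h
for label, clint in (("low-ER drug", 10.0), ("high-ER drug", 20000.0)):
    ratio = auc_ratio_after_inflammation(Q_H, fu=0.1, clint=clint,
                                         cyp_activity_fraction=0.5)
    print(label, round(ratio, 2))  # ~2.0 for low ER, ~1.0 for high ER
```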

    Exercise Stress Testing in Children with Metabolic or Neuromuscular Disorders

    The role of exercise as a diagnostic or therapeutic tool in patients with a metabolic disease (MD) or neuromuscular disorder (NMD) is relatively under-researched. In this paper we describe the metabolic profiles during exercise of 13 children (9 boys, 4 girls, aged 5–15 years) with a diagnosed MD or NMD. Graded cardiopulmonary exercise tests (CPET) and/or a 90-min prolonged submaximal exercise test (PXT) were performed. During exercise, respiratory gas exchange and heart rate were monitored; blood and urine samples were collected for biochemical analysis at set time points. We observed several characteristics in our patient group that reflect the differences in the pathophysiology of the various disorders. Metabolic profiles during exercise (CPET and PXT) seem helpful in the evaluation of patients with an MD or NMD.

    In-Brace versus Out-of-Brace Protocol for Radiographic Follow-Up of Patients with Idiopathic Scoliosis: A Retrospective Study

    The purpose of this retrospective study was to compare two standardized protocols for radiological follow-up (in-brace versus out-of-brace radiographs) with respect to the rate of curve progression over time in surgically treated idiopathic scoliosis (IS) patients after failed brace treatment. In-brace radiographs have the advantage that proper fit of the brace and in-brace correction can be evaluated; however, detection of progression might theoretically be more difficult. Fifty-one IS patients who underwent surgical treatment after failed brace treatment were included. For 25 patients, follow-up radiographs were taken in-brace; for the other 26 patients, brace treatment was temporarily stopped before out-of-brace follow-up radiographs were taken. Both groups showed significant curve progression compared to baseline after a mean follow-up period of 3.4 years. The protocol with in-brace radiographs was noninferior regarding the curve progression rate over time. The estimated monthly Cobb angle progression based on the mixed-effects model was 0.5 degrees in both groups. No interaction effect was found between time and patients' baseline Cobb angle (p = 0.98), nor between time and patients' initial in-brace correction (p = 0.32). The results of this study indicate that with both in-brace and out-of-brace protocols for radiographic follow-up, a similar rate of curve progression can be expected over time in IS patients with failed brace treatment.