
    Approximate dynamic programming for anemia management.

    The focus of this dissertation is the formulation and improvement of the trial-and-error process involved in anemia management. A two-stage method is adopted toward this objective. Given a medical treatment process, a discrete Markov representation is first derived as a formal translation of the treatment process into a control problem under uncertainty. A simulative numerical solution of the control problem is then obtained on the fly in the form of a control law that maximizes the long-term benefit at each decision stage. Approximate dynamic programming methods are employed in the proposed solution. The motivation for this choice is that, in reality, some patient characteristics that are critical to treatment cannot be determined through diagnosis and remain unknown until the early stages of treatment, when the patient reveals them in response to actions by the decision maker. A review of these simulative control tools, which are studied extensively in reinforcement learning theory, is presented. Two approximate dynamic programming tools, namely SARSA and Q-learning, are introduced. Their performance in discovering the optimal individualized drug dosing policy is illustrated on hypothetical patients constructed as fuzzy models for simulation. In addition to these generic reinforcement learning methods, a state abstraction scheme for the considered application domain is also proposed. The control methods of this study, capturing the essentials of a drug delivery problem, constitute a novel computational framework for model-free medical treatment. Experimental evaluation of the dosing strategies produced by the proposed methods against the standard policy, which is the one actually followed by human experts in the Kidney Diseases Program at the University of Louisville, shows the advantages of using reinforcement learning in the drug dosing problem in particular and in medical decision making in general.
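    The two update rules named in this abstract can be written compactly. Below is a minimal, hypothetical sketch of tabular SARSA and Q-learning updates for a discretized dosing problem; the state and action encodings, learning parameters, and reward signal are illustrative assumptions, not the dissertation's actual patient model.

```python
import numpy as np

# Minimal tabular sketch of SARSA and Q-learning. The state/action discretization,
# learning parameters, and reward are hypothetical placeholders, not the
# dissertation's actual anemia-management model.

n_states, n_actions = 10, 4          # e.g. binned hemoglobin levels, discrete dose levels
alpha, gamma, epsilon = 0.1, 0.95, 0.1
Q = np.zeros((n_states, n_actions))

def epsilon_greedy(s):
    # Pick a random dose with probability epsilon, otherwise the greedy dose.
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[s]))

def sarsa_update(s, a, r, s_next, a_next):
    # On-policy: the target uses the action actually taken in the next state.
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])

def q_learning_update(s, a, r, s_next):
    # Off-policy: the target uses the greedy action in the next state.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```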

    Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition

    This paper presents the MAXQ approach to hierarchical reinforcement learning, based on decomposing the target Markov decision process (MDP) into a hierarchy of smaller MDPs and decomposing the value function of the target MDP into an additive combination of the value functions of the smaller MDPs. The paper defines the MAXQ hierarchy, proves formal results on its representational power, and establishes five conditions for the safe use of state abstractions. The paper presents an online model-free learning algorithm, MAXQ-Q, and proves that it converges with probability 1 to a kind of locally optimal policy known as a recursively optimal policy, even in the presence of the five kinds of state abstraction. The paper evaluates the MAXQ representation and MAXQ-Q through a series of experiments in three domains and shows experimentally that MAXQ-Q (with state abstractions) converges to a recursively optimal policy much faster than flat Q-learning. The fact that MAXQ learns a representation of the value function has an important benefit: it makes it possible to compute and execute an improved, non-hierarchical policy via a procedure similar to the policy improvement step of policy iteration. The paper demonstrates the effectiveness of this non-hierarchical execution experimentally. Finally, the paper concludes with a comparison to related work and a discussion of the design tradeoffs in hierarchical reinforcement learning.
    Comment: 63 pages, 15 figures
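    The additive decomposition described in this abstract can be sketched recursively: the value of invoking a child subtask within a parent task at a given state is the child's own value plus a learned completion value for finishing the parent afterwards. The sketch below is an illustrative rendering of that recursion, not the paper's implementation; the containers and their keys are assumptions for the example.

```python
# Illustrative rendering of the MAXQ decomposition Q(i, s, a) = V(a, s) + C(i, s, a).
# The dictionaries below are assumptions for the sketch, not the paper's data structures.

children = {}           # subtask -> list of child subtasks / primitive actions
completion = {}         # (parent, state, child) -> learned completion value C
primitive_reward = {}   # (primitive_action, state) -> learned expected reward

def V(task, state):
    # A primitive action's value is its expected one-step reward; a composite
    # subtask's value is the best Q-value over its children.
    if task not in children:
        return primitive_reward.get((task, state), 0.0)
    return max(Q(task, state, child) for child in children[task])

def Q(parent, state, child):
    # Value of executing `child` in `state` and then completing `parent`.
    return V(child, state) + completion.get((parent, state, child), 0.0)
```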

    Learning Representations in Model-Free Hierarchical Reinforcement Learning

    Common approaches to Reinforcement Learning (RL) are seriously challenged by large-scale applications involving huge state spaces and sparse, delayed reward feedback. Hierarchical Reinforcement Learning (HRL) methods attempt to address this scalability issue by learning action selection policies at multiple levels of temporal abstraction. Abstraction can be achieved by identifying a relatively small set of states that are likely to be useful as subgoals, in concert with learning the corresponding skill policies to achieve those subgoals. Many approaches to subgoal discovery in HRL depend on the analysis of a model of the environment, but the need to learn such a model introduces its own problems of scale. Once subgoals are identified, skills may be learned through intrinsic motivation, introducing an internal reward signal that marks subgoal attainment. In this paper, we present a novel model-free method for subgoal discovery using incremental unsupervised learning over a small memory of the most recent experiences (trajectories) of the agent. When combined with an intrinsic motivation learning mechanism, this method learns both subgoals and skills, based on experiences in the environment. Thus, we offer an original approach to HRL that does not require the acquisition of a model of the environment and is suitable for large-scale applications. We demonstrate the efficiency of our method on two RL problems with sparse, delayed feedback: a variant of the rooms environment and the first screen of the ATARI 2600 game Montezuma's Revenge.
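    As an illustration of the kind of incremental unsupervised learning this abstract describes, the sketch below maintains a small memory of recent states, updates candidate subgoal centroids online (here with a simple incremental k-means step, an assumption rather than the paper's exact algorithm), and emits an intrinsic reward when the agent reaches the neighborhood of a discovered subgoal. The feature dimension, number of centroids, memory size, and radius are hypothetical.

```python
import numpy as np

# Illustrative stand-in for unsupervised subgoal discovery plus intrinsic motivation.
# All sizes and thresholds here are assumptions for the example.

k, dim, lr, memory_size = 4, 2, 0.05, 200
centroids = np.random.randn(k, dim)    # candidate subgoal locations
recent_states = []                     # small memory of recent experiences

def observe(state):
    # Store the state and nudge the nearest centroid toward it (online k-means step).
    recent_states.append(state)
    if len(recent_states) > memory_size:
        recent_states.pop(0)
    nearest = int(np.argmin(np.linalg.norm(centroids - state, axis=1)))
    centroids[nearest] += lr * (state - centroids[nearest])

def intrinsic_reward(state, radius=0.5):
    # Internal reward signal marking subgoal attainment.
    near = np.min(np.linalg.norm(centroids - state, axis=1)) < radius
    return 1.0 if near else 0.0
```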