    Comparing policy gradient and value function based reinforcement learning methods in simulated electrical power trade

    In electrical power engineering, reinforcement learning algorithms can be used to model the strategies of electricity market participants. However, traditional value-function-based reinforcement learning algorithms suffer from convergence issues when used with value function approximators. Function approximation is required in this domain to capture the characteristics of the complex and continuous multivariate problem space. The contribution of this paper is the comparison of policy gradient reinforcement learning methods, using artificial neural networks for policy function approximation, with traditional value-function-based methods in simulations of electricity trade. The methods are compared using an AC optimal power flow based power exchange auction market model and a reference electric power system model.
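
    As a concrete illustration of the policy-gradient family the paper compares against value-based methods, the following is a minimal REINFORCE sketch with a neural-network policy. The toy two-feature state, three discrete actions, and the env interface are illustrative assumptions, not the paper's market model.

        # Hypothetical sketch: REINFORCE with a neural-network policy, the
        # policy-gradient family the abstract contrasts with value-based methods.
        # The 2-D state and 3 discrete actions stand in for a market
        # participant's observations and bid choices; env is an assumed stub.
        import torch
        import torch.nn as nn

        policy = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 3))
        opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

        def run_episode(env):
            # Assumed interface: env.reset() -> state, env.step(a) -> (state, r, done).
            log_probs, rewards = [], []
            state, done = env.reset(), False
            while not done:
                logits = policy(torch.as_tensor(state, dtype=torch.float32))
                dist = torch.distributions.Categorical(logits=logits)
                action = dist.sample()
                log_probs.append(dist.log_prob(action))
                state, reward, done = env.step(action.item())
                rewards.append(reward)
            return log_probs, rewards

        def reinforce_update(log_probs, rewards, gamma=0.99):
            # Discounted returns G_t, computed backwards over the episode.
            returns, g = [], 0.0
            for r in reversed(rewards):
                g = r + gamma * g
                returns.append(g)
            returns = torch.tensor(list(reversed(returns)))
            returns = (returns - returns.mean()) / (returns.std() + 1e-8)
            # Gradient ascent on E[G_t * log pi(a_t | s_t)].
            loss = -(torch.stack(log_probs) * returns).sum()
            opt.zero_grad()
            loss.backward()
            opt.step()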

    Neuroevolutionary reinforcement learning for generalized control of simulated helicopters

    This article presents an extended case study in the application of neuroevolution to generalized simulated helicopter hovering, an important challenge problem for reinforcement learning. While neuroevolution is well suited to coping with the domain’s complex transition dynamics and high-dimensional state and action spaces, the need to explore efficiently and learn on-line poses unusual challenges. We propose and evaluate several methods for three increasingly challenging variations of the task, including the method that won first place in the 2008 Reinforcement Learning Competition. The results demonstrate that (1) neuroevolution can be effective for complex on-line reinforcement learning tasks such as generalized helicopter hovering, (2) neuroevolution excels at finding effective helicopter hovering policies but not at learning helicopter models, (3) due to the difficulty of learning reliable models, model-based approaches to helicopter hovering are feasible only when domain expertise is available to aid the design of a suitable model representation, and (4) recent advances in efficient resampling can enable neuroevolution to tackle more aggressively generalized reinforcement learning tasks.
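
    For readers unfamiliar with the recipe, below is a minimal neuroevolution sketch: network weights are evolved directly against episodic fitness via truncation selection and Gaussian mutation, with fitness averaged over several episodes in the spirit of resampling under noisy evaluation. The network sizes and the stand-in fitness function are assumptions; the competition-winning method is not reproduced here.

        # Hypothetical sketch of neuroevolution: evolve policy-network weights
        # with Gaussian mutation and truncation selection on episodic fitness.
        # The tiny network and the fitness() stub stand in for rollouts in a
        # helicopter simulator.
        import numpy as np

        rng = np.random.default_rng(0)
        N_IN, N_HID, N_OUT = 8, 16, 4           # assumed state/action sizes
        N_W = N_IN * N_HID + N_HID * N_OUT      # total weight count

        def policy(weights, state):
            # Decode the flat weight vector into a tiny two-layer network.
            w1 = weights[:N_IN * N_HID].reshape(N_IN, N_HID)
            w2 = weights[N_IN * N_HID:].reshape(N_HID, N_OUT)
            return np.tanh(state @ w1) @ w2

        def fitness(weights, episodes=3):
            # Stand-in for average return over `episodes` rollouts in the
            # simulator; averaging over episodes tempers noisy evaluations,
            # the role resampling plays in the article.
            return -float(np.sum(weights ** 2)) / episodes

        pop = [rng.normal(0.0, 0.1, N_W) for _ in range(20)]
        for generation in range(50):
            elite = sorted(pop, key=fitness, reverse=True)[:5]  # truncation selection
            pop = list(elite)
            while len(pop) < 20:                                # Gaussian mutation
                parent = elite[rng.integers(len(elite))]
                pop.append(parent + rng.normal(0.0, 0.02, N_W))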

    Solving planning problems with deep reinforcement learning and tree search

    Deep reinforcement learning methods are capable of learning complex heuristics starting with no prior knowledge, but struggle in environments where the learning signal is sparse. In contrast, planning methods can discover the optimal path to a goal in the absence of external rewards, but often require a hand-crafted heuristic function to be effective. In this thesis, we describe a model-based reinforcement learning method that occupies the middle ground between these two approaches. When evaluated on the complex domain of Sokoban, the model-based method was found to be more performant, stable, and sample-efficient than a model-free baseline.
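
    The middle ground the thesis describes can be sketched as a best-first tree search whose expansion order comes from a learned value estimate rather than a hand-crafted heuristic. The sketch below assumes hashable states and stub successors/goal_test/value_fn callables; it is not the thesis's Sokoban implementation.

        # Hypothetical sketch: best-first search guided by a learned value
        # function, the planning/deep-RL middle ground the abstract describes.
        # `value_fn` would be a trained network; its interface is assumed.
        import heapq
        import itertools

        def plan(start, goal_test, successors, value_fn, budget=10_000):
            """Expand states in order of learned value; return an action path.

            States must be hashable; successors(s) yields (action, next_state).
            """
            tie = itertools.count()  # tiebreaker so states are never compared
            frontier = [(-value_fn(start), next(tie), start, [])]
            seen = {start}
            for _ in range(budget):
                if not frontier:
                    break
                _, _, state, path = heapq.heappop(frontier)
                if goal_test(state):
                    return path       # goal reached without external rewards
                for action, nxt in successors(state):
                    if nxt not in seen:
                        seen.add(nxt)
                        heapq.heappush(
                            frontier,
                            (-value_fn(nxt), next(tie), nxt, path + [action]))
            return None               # search budget exhausted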

    Plan-Guided Reinforcement Learning for Whole-Body Manipulation

    Synthesizing complex whole-body manipulation behaviors poses fundamental challenges due to the rapidly growing combinatorics inherent in contact interaction planning. While model-based methods have shown promising results in solving long-horizon manipulation tasks, they often work under strict assumptions, such as known model parameters, oracular observation of the environment state, and simplified dynamics, resulting in plans that cannot easily transfer to hardware. Learning-based approaches, such as imitation learning (IL) and reinforcement learning (RL), have been shown to be robust when operating over in-distribution states; however, they need heavy human supervision. Specifically, model-free RL requires a tedious reward-shaping process, while IL methods rely on human demonstrations that involve advanced teleoperation methods. In this work, we propose a plan-guided reinforcement learning (PGRL) framework that combines the advantages of model-based planning and reinforcement learning. Our method requires minimal human supervision because it relies on plans generated by model-based planners to guide the exploration in RL. In exchange, RL derives a more robust policy thanks to domain randomization. We test this approach on a whole-body manipulation task on Punyo, an upper-body humanoid robot with compliant, air-filled arm coverings, to pivot and lift a large box. Our preliminary results indicate that the proposed methodology is promising in addressing challenges that remain difficult for either model- or learning-based strategies alone.
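
    One simple way plan guidance can enter RL training is reward shaping: augmenting the task reward with a bonus for tracking a model-based reference plan while training under randomized dynamics. The sketch below is a hypothetical illustration of that idea, not the PGRL paper's exact formulation; the interfaces and the weight beta are assumptions.

        # Hypothetical sketch of plan-guided reward shaping: the environment's
        # task reward is augmented with a bonus for tracking a model-based
        # reference plan. Interfaces and the weight `beta` are assumptions.
        import numpy as np

        def plan_guided_reward(task_reward, state, t, plan, beta=0.5):
            """Blend the task reward with a plan-tracking bonus."""
            ref = plan[min(t, len(plan) - 1)]      # reference state at step t
            tracking_bonus = -np.linalg.norm(state - ref)
            return task_reward + beta * tracking_bonus

        def rollout(env, policy, plan):
            # Assumed interface: env.reset() -> state, env.step(a) -> (state, r, done);
            # env would apply domain randomization on each reset.
            state, done, t, total = env.reset(), False, 0, 0.0
            while not done:
                action = policy(state)
                state, r_task, done = env.step(action)
                total += plan_guided_reward(r_task, state, t, plan)
                t += 1
            return total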

    Design and evaluation of a hybrid multi-task learning model for optimizing deep reinforcement learning agents

    Driven by recent technological advancements within the artificial intelligence domain, deep learning has emerged as a promising representation learning technique. This in turn has given rise to deep reinforcement learning, which combines deep learning with reinforcement learning methods. Until recently, the performance gains achieved by reinforcement learning agents designed with model-free approaches were largely limited to systems in which the algorithm learns a single task. Such models were found to be quite data-inefficient whenever agents needed to interact with more complex, data-rich environments. This thesis introduces a hybrid multi-task learning-oriented approach for optimizing deep reinforcement learning agents operating within different but semantically similar environments with related tasks. Empirical results obtained with the OpenAI Gym library-based Atari 2600 video gaming environment demonstrate that the proposed hybrid multi-task learning model successfully addresses key challenges associated with the performance optimization of deep reinforcement learning agents.
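
    A common architectural starting point for multi-task deep RL (not necessarily the thesis's exact hybrid model) is a shared trunk with one head per task, so semantically similar environments share representation learning. A minimal PyTorch sketch with illustrative layer sizes:

        # Hypothetical sketch: a shared trunk with per-task Q-heads, a common
        # multi-task pattern for Atari-style agents. Layer sizes and the number
        # of tasks are illustrative, not the thesis's exact architecture.
        import torch
        import torch.nn as nn

        class MultiTaskQNet(nn.Module):
            def __init__(self, n_tasks=3, n_actions=6):
                super().__init__()
                self.trunk = nn.Sequential(          # shared across tasks
                    nn.Conv2d(4, 32, 8, stride=4), nn.ReLU(),
                    nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
                    nn.Flatten(),
                    nn.Linear(64 * 9 * 9, 256), nn.ReLU())
                self.heads = nn.ModuleList(          # one Q-head per task
                    [nn.Linear(256, n_actions) for _ in range(n_tasks)])

            def forward(self, frames, task_id):
                return self.heads[task_id](self.trunk(frames))

        net = MultiTaskQNet()
        # Q-values for task 0 on a stack of four 84x84 frames.
        q = net(torch.zeros(1, 4, 84, 84), task_id=0)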

    Scalable and data efficient deep reinforcement learning methods for healthcare applications

    Artificial intelligence driven medical devices have created the potential for significant breakthroughs in healthcare technology. Healthcare applications using reinforcement learning are still very sparse, as the medical domain is highly complex and decision making requires domain expertise. The high volume of data generated by medical devices, a key input for delivering on the promise of AI, suffers from both noise and a lack of ground truth, and the cost of data increases as it is cleaned and annotated. Unlike in other domains, medical data annotation, which is critical for accurate ground truth, requires medical domain expertise to ensure high-quality patient outcomes. While accurate recommendation of decisions is vital in this context, making them in near real time on devices with computational resource constraints requires that we build efficient, compact representations of models such as deep neural networks. While deeper and wider neural networks are designed for complex healthcare applications, model compression can be an effective way to deploy networks on medical devices that often have hardware and speed constraints. Most state-of-the-art model compression techniques require a resource-centric manual process that explores a large model architecture space to find a trade-off between model size and accuracy. Recently, reinforcement learning (RL) approaches have been proposed to automate such a hand-crafted process. However, most RL-based model compression algorithms are model-free, which requires longer training time since no assumptions are made about the model. Model-based (MB) approaches, by contrast, are data-driven and converge faster, but are sensitive to bias in the model. In this work, we report on the use of reinforcement learning to mimic the decision-making process of annotators for medical events, automating annotation and labelling. The reinforcement agent learns to annotate alarm data based on annotations made by an expert. Our method shows promising results on medical alarm data sets. We trained deep Q-network and advantage actor-critic agents using data from monitoring devices annotated by an expert. Initial results from these RL agents learning the expert-annotated behavior are encouraging. The advantage actor-critic agent performs better at learning the sparse events in a given state, thereby choosing correct actions more often than the deep Q-network agent. To the best of our knowledge, this is the first reinforcement learning application for the automation of medical event annotation, which has far-reaching practical use. In addition, we develop a data-driven model-based algorithm that integrates seamlessly with model-free RL approaches to automate deep neural network model compression. We evaluate our algorithm on a variety of imaging data, from dermoscopy to X-ray, on popular public model architectures. Compared to model-free RL approaches, our approach achieves faster convergence, exhibits better generalization across different data sets, and preserves comparable model performance. The application of these new RL methods to the healthcare domain, for both false alarm detection and model compression, is generic and extends to any domain where sequential decision making is partially random and partially controlled by the decision maker.
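
    To make the annotation-as-RL framing concrete, here is a deliberately simplified sketch in which an agent labels alarm events and is rewarded for agreeing with expert annotations. Each alarm is treated as an independent one-step decision (a contextual-bandit simplification with a tabular update), whereas the work itself trained deep Q-network and advantage actor-critic agents; the feature encoding and reward values are assumptions.

        # Hypothetical sketch of the annotation-as-RL framing: the agent labels
        # alarm events and is rewarded for agreeing with expert annotations.
        # Tabular one-step Q-learning is an illustrative simplification of the
        # deep Q-network / advantage actor-critic agents used in the work.
        import numpy as np
        from collections import defaultdict

        rng = np.random.default_rng(0)
        ACTIONS = [0, 1]                # 0 = false alarm, 1 = true alarm
        q_table = defaultdict(lambda: np.zeros(len(ACTIONS)))

        def train(events, expert_labels, alpha=0.1, eps=0.1):
            """events: hashable feature tuples; expert_labels: 0/1 per event."""
            for feats, label in zip(events, expert_labels):
                if rng.random() < eps:          # epsilon-greedy exploration
                    action = int(rng.integers(len(ACTIONS)))
                else:
                    action = int(np.argmax(q_table[feats]))
                reward = 1.0 if action == label else -1.0  # expert agreement
                # One-step update: no successor state, since each alarm is
                # annotated independently in this simplified framing.
                q_table[feats][action] += alpha * (reward - q_table[feats][action])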