68,549 research outputs found

    Accelerating Online Reinforcement Learning with Offline Datasets

    Reinforcement learning (RL) provides an appealing formalism for learning control policies from experience. However, the classic active formulation of RL necessitates a lengthy active exploration process for each behavior, making it difficult to apply in real-world settings such as robotic control. If we can instead allow RL algorithms to effectively use previously collected data to aid the online learning process, such applications could be made substantially more practical: the prior data would provide a starting point that mitigates challenges due to exploration and sample complexity, while the online training enables the agent to perfect the desired skill. Such prior data could either constitute expert demonstrations or, more generally, sub-optimal prior data that illustrates potentially useful transitions. But it remains difficult to train a policy with potentially sub-optimal offline data and improve it further with online RL. In this paper we systematically analyze why this problem is so challenging, and propose an algorithm that combines sample-efficient dynamic programming with maximum likelihood policy updates, providing a simple and effective framework that is able to leverage large amounts of offline data and then quickly perform online fine-tuning of RL policies. We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
    Comment: 17 pages. Website: https://awacrl.github.io
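
    A minimal sketch of the advantage-weighted policy update the abstract describes, assuming a discrete-action PyTorch actor and critic; `actor`, `critic`, and the temperature `lam` are illustrative placeholders rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def awac_actor_loss(actor, critic, states, actions, lam=1.0):
    """Advantage-weighted maximum-likelihood actor update (sketch).

    The actor is trained by weighted behavioral cloning on offline and online
    transitions: the log-probability of each taken action is weighted by
    exp(advantage / lam), so high-advantage actions are imitated more strongly.
    The critic is assumed to be trained separately with temporal-difference updates.
    """
    log_probs = F.log_softmax(actor(states), dim=-1)         # (batch, num_actions)
    q_values = critic(states)                                 # (batch, num_actions)
    v = (log_probs.exp() * q_values).sum(dim=-1)              # value under current policy
    q_taken = q_values.gather(1, actions.unsqueeze(1)).squeeze(1)
    weights = torch.exp((q_taken - v).detach() / lam)         # advantage weights, no actor grad
    log_prob_taken = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    return -(weights * log_prob_taken).mean()
```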

    Variational Adaptive-Newton Method for Explorative Learning

    We present the Variational Adaptive Newton (VAN) method, a black-box optimization method especially suitable for explorative-learning tasks such as active learning and reinforcement learning. Similar to Bayesian methods, VAN estimates a distribution that can be used for exploration, but requires computations that are similar to continuous optimization methods. Our theoretical contribution reveals that VAN is a second-order method that unifies existing methods in distinct fields of continuous optimization, variational inference, and evolution strategies. Our experimental results show that VAN performs well on a wide variety of learning tasks. This work presents a general-purpose explorative-learning method that has the potential to improve learning in areas such as active learning and reinforcement learning.
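
    The abstract frames VAN as a second-order method that maintains a search distribution. Below is a rough NumPy sketch of that idea under simplifying assumptions: a Gaussian whose precision tracks curvature estimates and whose mean takes a preconditioned step. `grad_fn`, `hess_fn`, and the step sizes are illustrative, not the paper's exact update.

```python
import numpy as np

def van_step(mu, precision, grad_fn, hess_fn, alpha=0.1, beta=0.1):
    """One illustrative update of a Gaussian search distribution N(mu, precision^{-1}).

    The precision matrix is moved toward a curvature estimate and the mean takes
    a Newton-like step preconditioned by it, so the distribution both optimizes
    the objective and keeps enough spread for exploration.
    """
    g = grad_fn(mu)                                      # gradient estimate at the mean
    h = hess_fn(mu)                                      # Hessian (or an approximation)
    precision = (1.0 - beta) * precision + beta * h      # adapt curvature online
    mu = mu - alpha * np.linalg.solve(precision, g)      # preconditioned mean step
    return mu, precision

def explore(mu, precision, rng, n_samples=10):
    """Draw exploratory candidates from the current search distribution."""
    return rng.multivariate_normal(mu, np.linalg.inv(precision), size=n_samples)
```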

    Planning to Explore via Self-Supervised World Models

    Reinforcement learning allows solving complex tasks; however, the learning tends to be task-specific and sample efficiency remains a challenge. We present Plan2Explore, a self-supervised reinforcement learning agent that tackles both these challenges through a new approach to self-supervised exploration and fast adaptation to new tasks, which need not be known during exploration. During exploration, unlike prior methods which retrospectively compute the novelty of observations after the agent has already reached them, our agent acts efficiently by leveraging planning to seek out expected future novelty. After exploration, the agent quickly adapts to multiple downstream tasks in a zero-shot or few-shot manner. We evaluate on challenging control tasks from high-dimensional image inputs. Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods and, in fact, almost matches the performance of an oracle that has access to rewards. Videos and code at https://ramanans1.github.io/plan2explore/
    Comment: Accepted at ICML 2020.
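
    The key mechanism in the abstract is seeking expected future novelty rather than scoring novelty retrospectively. One common way to operationalize this is disagreement across an ensemble of learned dynamics models; the sketch below assumes such an ensemble (`ensemble` and its `predict` method are placeholders) and is not a faithful reproduction of the Plan2Explore implementation.

```python
import numpy as np

def disagreement_reward(ensemble, state, action):
    """Intrinsic reward as expected future novelty (sketch).

    Each ensemble member predicts the next (latent) state; the variance of those
    predictions is the reward a planner maximizes, steering the agent toward
    states whose outcomes the world model is still uncertain about.
    """
    preds = np.stack([m.predict(state, action) for m in ensemble])  # (K, state_dim)
    return preds.var(axis=0).mean()
```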

    Active Deep Q-learning with Demonstration

    Recent research has shown that although Reinforcement Learning (RL) can benefit from expert demonstration, it usually takes considerable effort to obtain enough demonstrations. This effort prevents training decent RL agents with expert demonstrations in practice. In this work, we propose Active Reinforcement Learning with Demonstration (ARLD), a new framework to streamline RL in terms of demonstration effort by allowing the RL agent to query for demonstrations actively during training. Under this framework, we propose Active Deep Q-Network, a novel query strategy which adapts to the dynamically changing distributions during the RL training process by estimating the uncertainty of recent states. The expert demonstration data within Active DQN are then utilized by optimizing a supervised max-margin loss in addition to the temporal-difference loss of usual DQN training. We propose two methods of estimating the uncertainty based on two state-of-the-art DQN models, namely the divergence of bootstrapped DQN and the variance of noisy DQN. The empirical results validate that both methods not only learn faster than other passive expert-demonstration methods with the same amount of demonstration but also reach a super-expert level of performance across four different tasks.
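
    Two ingredients from the abstract can be sketched concretely: a combined training objective (standard temporal-difference loss plus a supervised max-margin loss on queried demonstrations) and an uncertainty-based query rule. The PyTorch sketch below is a hedged illustration; tensor shapes, the margin value, and the disagreement measure are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn.functional as F

def demo_augmented_loss(q_net, target_net, batch, expert_batch,
                        margin=0.8, lam=1.0, gamma=0.99):
    """DQN loss plus a supervised max-margin term on expert demonstrations (sketch)."""
    # Standard one-step temporal-difference loss on the agent's own transitions.
    s, a, r, s2, done = batch
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * target_net(s2).max(dim=1).values
    td_loss = F.smooth_l1_loss(q_sa, target)

    # Max-margin loss: the demonstrated action should beat every other action by `margin`.
    es, ea = expert_batch
    q_e = q_net(es)
    margins = torch.full_like(q_e, margin)
    margins.scatter_(1, ea.unsqueeze(1), 0.0)            # no margin on the expert action
    supervised = ((q_e + margins).max(dim=1).values
                  - q_e.gather(1, ea.unsqueeze(1)).squeeze(1)).mean()
    return td_loss + lam * supervised

def should_query(bootstrap_heads, state, threshold=0.5):
    """Query the expert when bootstrapped heads disagree on the greedy action
    (one plausible uncertainty measure; the paper also considers noisy-net variance)."""
    votes = torch.stack([head(state).argmax(dim=-1) for head in bootstrap_heads])
    agreement = (votes == votes.mode(dim=0).values).float().mean()
    return (1.0 - agreement) > threshold
```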

    A Short Survey on Probabilistic Reinforcement Learning

    A reinforcement learning agent tries to maximize its cumulative payoff by interacting in an unknown environment. It is important for the agent to explore suboptimal actions as well as to pick actions with the highest known rewards. Yet, in sensitive domains, collecting more data with exploration is not always possible, but it is important to find a policy with a certain performance guarantee. In this paper, we present a brief survey of methods available in the literature for balancing the exploration-exploitation trade-off and computing robust solutions from fixed samples in reinforcement learning.
    Comment: 7 pages, originally written as a literature survey for a PhD candidacy exam.

    Reinforcement Learning for Robotics and Control with Active Uncertainty Reduction

    Model-free reinforcement learning methods such as Proximal Policy Optimization or Q-learning typically require thousands of interactions with the environment to approximate the optimal controller, which may not always be feasible in robotics due to safety and time consumption. Model-based methods such as PILCO or BlackDrops, while data-efficient, provide solutions with limited robustness and complexity. To address this tradeoff, we introduce active uncertainty-reduction-based virtual environments, which are formed through limited trials conducted in the original environment. We provide an efficient method for uncertainty management, which is used as a metric for self-improvement by identifying the points with maximum expected improvement through adaptive sampling. Capturing the uncertainty also allows for better mimicking of the reward responses of the original system. Our approach enables the use of complex policy structures and reward functions through a unique combination of model-based and model-free methods, while still retaining data efficiency. We demonstrate the validity of our method on several classic reinforcement learning problems in OpenAI Gym. We prove that our approach offers better modeling capacity for complex system dynamics compared to established methods.
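
    The abstract describes adaptive sampling of the points with maximum expected improvement to refine the learned virtual environment. Assuming a probabilistic surrogate that returns a predictive mean and standard deviation, a standard expected-improvement acquisition looks like the sketch below; the surrogate and candidate set are hypothetical placeholders, not the authors' specific model.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mean, std, best_so_far, xi=0.01):
    """Expected-improvement acquisition for choosing where additional
    real-environment trials are most valuable (sketch; maximization convention)."""
    std = np.maximum(std, 1e-9)
    z = (mean - best_so_far - xi) / std
    return (mean - best_so_far - xi) * norm.cdf(z) + std * norm.pdf(z)

# Illustrative usage with a hypothetical surrogate exposing predict() -> (mean, std):
# mean, std = surrogate.predict(candidates)
# next_point = candidates[np.argmax(expected_improvement(mean, std, best_so_far))]
```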

    Mobile Edge Computation Offloading Using Game Theory and Reinforcement Learning

    Due to the ever-increasing popularity of resource-hungry and delay-constrained mobile applications, the computation and storage capabilities of the remote cloud have partially migrated towards the mobile edge, giving rise to the concept known as Mobile Edge Computing (MEC). While MEC servers enjoy close proximity to the end-users to provide services at reduced latency and lower energy costs, they suffer from limitations in computational and radio resources, which calls for fair and efficient resource management in the MEC servers. The problem is, however, challenging due to the ultra-high density, distributed nature, and intrinsic randomness of next-generation wireless networks. In this article, we focus on the application of game theory and reinforcement learning for efficient distributed resource management in MEC, in particular for computation offloading. We briefly review the cutting-edge research and discuss future challenges. Furthermore, we develop a game-theoretical model for energy-efficient distributed edge-server activation and study several learning techniques. Numerical results are provided to illustrate the performance of these distributed learning techniques. Also, open research issues in the context of resource management in MEC servers are discussed.

    Exploration in Interactive Personalized Music Recommendation: A Reinforcement Learning Approach

    Current music recommender systems typically act in a greedy fashion by recommending songs with the highest user ratings. Greedy recommendation, however, is suboptimal over the long term: it does not actively gather information on user preferences and fails to recommend novel songs that are potentially interesting. A successful recommender system must balance the need to explore user preferences with the need to exploit this information for recommendation. This paper presents a new approach to music recommendation by formulating this exploration-exploitation trade-off as a reinforcement learning task called the multi-armed bandit. To learn user preferences, it uses a Bayesian model, which accounts for both audio content and the novelty of recommendations. A piecewise-linear approximation to the model and a variational inference algorithm are employed to speed up Bayesian inference. One additional benefit of our approach is a single unified model for both music recommendation and playlist generation. Both simulation results and a user study indicate strong potential for the new approach.
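
    To make the multi-armed-bandit framing concrete, here is a generic Thompson-sampling style recommender with a conjugate Gaussian posterior per song; this illustrates the exploration-exploitation mechanism only and is not the paper's piecewise-linear Bayesian audio-content model.

```python
import numpy as np

def recommend(post_means, post_vars, rng):
    """Thompson-sampling step: draw a plausible rating for each song from its
    posterior and recommend the argmax, so uncertain songs are still explored."""
    sampled = rng.normal(post_means, np.sqrt(post_vars))
    return int(np.argmax(sampled))

def update_posterior(mean, var, observed_rating, noise_var=1.0):
    """Conjugate Gaussian update of one song's rating posterior after user feedback."""
    precision = 1.0 / var + 1.0 / noise_var
    new_var = 1.0 / precision
    new_mean = new_var * (mean / var + observed_rating / noise_var)
    return new_mean, new_var
```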

    Deep Reinforcement Learning based Optimal Control of Hot Water Systems

    Energy consumption for hot water production is a major draw in high-efficiency buildings. Optimizing this has typically been approached from a thermodynamics perspective, decoupled from occupant influence. Furthermore, optimization usually presupposes the existence of a detailed dynamics model for the hot water system. These assumptions lead to suboptimal energy efficiency in the real world. In this paper, we present a novel reinforcement learning based methodology which optimizes hot water production. The proposed methodology is completely generalizable and does not require an offline step or human domain knowledge to build a model for the hot water vessel or the heating element. Occupant preferences, too, are learnt on the fly. The proposed system is applied to a set of 32 houses in the Netherlands, where it reduces energy consumption for hot water production by roughly 20% with no loss of occupant comfort. Extrapolating, this translates to absolute savings of roughly 200 kWh for a single household on an annual basis. This performance can be replicated for any domestic hot water system and optimization objective, given that the fairly minimal requirements on sensor data are met. With millions of hot water systems operational worldwide, the proposed framework has the potential to reduce energy consumption in existing and new systems on a multi-gigawatt-hour scale in the years to come.

    A Brief Survey of Deep Reinforcement Learning

    Deep reinforcement learning is poised to revolutionise the field of AI and represents a step towards building autonomous systems with a higher-level understanding of the visual world. Currently, deep learning is enabling reinforcement learning to scale to problems that were previously intractable, such as learning to play video games directly from pixels. Deep reinforcement learning algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of reinforcement learning, then progress to the main streams of value-based and policy-based methods. Our survey will cover central algorithms in deep reinforcement learning, including the deep Q-network, trust region policy optimisation, and asynchronous advantage actor-critic. In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via reinforcement learning. To conclude, we describe several current areas of research within the field.
    Comment: IEEE Signal Processing Magazine, Special Issue on Deep Learning for Image Understanding (arXiv extended version).