
    Proximal Policy Optimization with Relative Pearson Divergence

    Full text link
    The recent remarkable progress of deep reinforcement learning (DRL) rests on regularization of the policy for stable and efficient learning. A popular method, proximal policy optimization (PPO), was introduced for this purpose. PPO clips the density ratio between the latest and baseline policies at a threshold, but its minimization target is unclear. Another problem with PPO is that the symmetric threshold is given numerically while the density ratio itself lies in an asymmetric domain, causing unbalanced regularization of the policy. This paper therefore proposes a new PPO variant, PPO-RPE, derived from a regularization problem of the relative Pearson (RPE) divergence. This regularization yields a clear minimization target that constrains the latest policy toward the baseline one. Its analysis leads to an intuitive threshold-based design consistent with the asymmetry of the threshold and the domain of the density ratio. On four benchmark tasks, PPO-RPE performed as well as or better than conventional methods in terms of the task performance of the learned policy. Comment: 6 pages, 5 figures (accepted for ICRA 2021)
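For context on the clipping mechanism the abstract critiques, here is a minimal sketch of the standard PPO clipped surrogate (not the paper's PPO-RPE variant); the function name and `eps` default are illustrative:

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate (illustrative, not PPO-RPE).

    ratio: pi_new(a|s) / pi_old(a|s), the density ratio the abstract refers to.
    eps: symmetric clipping threshold. Note the abstract's point: the ratio
    lives in the asymmetric domain (0, inf), yet [1-eps, 1+eps] is symmetric.
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Pessimistic (elementwise minimum) bound, maximized during training.
    return np.minimum(unclipped, clipped)
```

With `ratio = 2.0` and a positive advantage, the clipped branch caps the objective at `1 + eps` times the advantage, which is what limits the policy update step.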

    A Review on Robot Manipulation Methods in Human-Robot Interactions

    Full text link
    Robot manipulation is an important part of human-robot interaction technology. However, traditional pre-programmed methods can accomplish only simple, repetitive tasks. To enable effective communication between robots and humans, and to predict and adapt to uncertain environments, this paper reviews recent autonomous and adaptive learning algorithms for robotic manipulation. It covers typical applications and challenges of human-robot interaction, fundamental tasks of robot manipulation, and one of the most widely used formulations of robot manipulation, the Markov Decision Process. Recent research on robot manipulation is mainly based on Reinforcement Learning and Imitation Learning. This review shows the importance of Deep Reinforcement Learning, which plays a central role in enabling robots to complete complex tasks in disturbed and unfamiliar environments. With the introduction of Imitation Learning, robot manipulation can dispense with reward-function design and achieve a simple, stable, supervised learning process. This paper reviews and compares the main features and popular algorithms of both Reinforcement Learning and Imitation Learning.
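Since the abstract singles out the Markov Decision Process as the standard formulation, a toy MDP with value iteration may clarify what that formulation entails; the states, transition probabilities, and rewards below are invented for illustration and do not come from the reviewed papers:

```python
import numpy as np

# Tiny illustrative MDP: two states, two actions.
# P[s, a, s'] = transition probability, R[s, a] = immediate reward.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.1, 0.9]],
])
R = np.array([[0.0, 1.0],
              [2.0, 0.0]])
gamma = 0.9  # discount factor

# Value iteration: V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s,a,s') V(s') ]
V = np.zeros(2)
for _ in range(500):
    V = np.max(R + gamma * (P @ V), axis=1)
```

After convergence, `V` satisfies the Bellman optimality equation; RL algorithms for manipulation solve the same kind of problem without knowing `P` and `R` in advance.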

    Advances in Reinforcement Learning

    Get PDF
    Reinforcement Learning (RL) is a very dynamic area in terms of both theory and application. This book brings together many different aspects of current research on several fields associated with RL, which has been growing rapidly, producing a wide variety of learning algorithms for different applications. Across 24 chapters, it covers a broad variety of topics in RL and their application in autonomous systems. One set of chapters provides a general overview of RL, while the others focus mostly on applications of RL paradigms: Game Theory, Multi-Agent Theory, Robotics, Networking Technologies, Vehicular Navigation, Medicine, and Industrial Logistics.

    Student Behavior Simulation in English Online Education Based on Reinforcement Learning

    Get PDF
    In class, no two students behave alike. Now that most courses are taken online, tracking and identifying students' behavior is a significant challenge, especially in language (English) classes. In this study, a Student Behavior Simulation Based on a Reinforcement Learning Framework (SBS-BRLF) is proposed to track and identify students' behavior in online classes. The simulation model is trained on sets of behaviors categorized as positive or negative using Reinforcement Learning (RL), the field of machine learning concerned with how intelligent agents act in an environment to maximize cumulative reward. Students are tracked in the simulation model via web camera and microphone, and the collected data is processed with RL's aid. If a student's action is assessed as positive, the student is praised; otherwise the student receives a warning, up to three times, and on a further repetition is suspended for a day. The student can thus be monitored without complications. Comparative analysis of the proposed and current frameworks shows that SBS-BRLF works efficiently and accurately, with a behavioral rate of 93.2%, a performance rate of 96%, a supervision rate of 92%, a reliability rate of 89.7% for students, and a higher action-and-reward acceptance rate of 89.9%.
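The praise/warn/suspend rule described in the abstract can be sketched as a simple policy; the function name, return labels, and three-warning threshold encoding below are a hypothetical reading of the abstract, not the authors' implementation:

```python
def react(is_positive, warnings):
    """Hypothetical sketch of the abstract's feedback rule.

    is_positive: whether the observed behavior was assessed as positive.
    warnings: how many warnings the student has already received.
    Returns (action, updated_warning_count).
    """
    if is_positive:
        return "praise", warnings          # positive behavior is praised
    if warnings < 3:
        return "warn", warnings + 1        # up to three warnings
    return "suspend_one_day", 0            # repeated after 3 warnings
```

In an RL framing, such feedback would be delivered to the learning agent as a scalar reward rather than a labeled action, but the escalation logic is the same.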