
    Proximal Policy Optimization with Relative Pearson Divergence

    Full text link
    The recent remarkable progress of deep reinforcement learning (DRL) rests on regularization of the policy for stable and efficient learning. A popular method, proximal policy optimization (PPO), was introduced for this purpose. PPO clips the density ratio between the latest and baseline policies at a threshold, but its minimization target is unclear. Another problem with PPO is that the symmetric threshold is given numerically while the density ratio itself lies in an asymmetric domain, causing unbalanced regularization of the policy. This paper therefore proposes a new PPO variant, PPO-RPE, derived from a regularization problem of the relative Pearson (RPE) divergence. This regularization yields a clear minimization target that constrains the latest policy toward the baseline one. Its analysis leads to an intuitive threshold-based design consistent with the asymmetry of the threshold and the domain of the density ratio. On four benchmark tasks, PPO-RPE performed as well as or better than conventional methods in terms of the task performance of the learned policy. Comment: 6 pages, 5 figures (accepted for ICRA 2021)
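For context on the clipping mechanism the abstract critiques, here is a minimal sketch of the standard PPO clipped surrogate (not the paper's PPO-RPE variant); the function name and `eps` default are illustrative:

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate (illustrative, not PPO-RPE).

    ratio: pi_new(a|s) / pi_old(a|s), the density ratio the abstract refers to.
    eps: symmetric clipping threshold. Note the abstract's point: the ratio
    lives in the asymmetric domain (0, inf), yet [1-eps, 1+eps] is symmetric.
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Pessimistic (elementwise minimum) bound, maximized during training.
    return np.minimum(unclipped, clipped)
```

With `ratio = 2.0` and a positive advantage, the clipped branch caps the objective at `1 + eps` times the advantage, which is what limits the policy update step.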

    A Review on Robot Manipulation Methods in Human-Robot Interactions

    Full text link
    Robot manipulation is an important part of human-robot interaction technology. However, traditional pre-programmed methods can accomplish only simple, repetitive tasks. To enable effective communication between robots and humans, and to predict and adapt to uncertain environments, this paper reviews recent autonomous and adaptive learning algorithms for robotic manipulation. It covers typical applications and challenges of human-robot interaction, fundamental tasks of robot manipulation, and one of the most widely used formulations of robot manipulation, the Markov Decision Process. Recent research on robot manipulation is mainly based on Reinforcement Learning and Imitation Learning. This review shows the importance of Deep Reinforcement Learning, which plays a central role in enabling robots to complete complex tasks in disturbed and unfamiliar environments. With the introduction of Imitation Learning, robot manipulation can dispense with reward-function design and achieve a simple, stable, supervised learning process. This paper reviews and compares the main features and popular algorithms of both Reinforcement Learning and Imitation Learning.
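Since the abstract singles out the Markov Decision Process as the standard formulation, a toy MDP with value iteration may clarify what that formulation entails; the states, transition probabilities, and rewards below are invented for illustration and do not come from the reviewed papers:

```python
import numpy as np

# Tiny illustrative MDP: two states, two actions.
# P[s, a, s'] = transition probability, R[s, a] = immediate reward.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.1, 0.9]],
])
R = np.array([[0.0, 1.0],
              [2.0, 0.0]])
gamma = 0.9  # discount factor

# Value iteration: V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s,a,s') V(s') ]
V = np.zeros(2)
for _ in range(500):
    V = np.max(R + gamma * (P @ V), axis=1)
```

After convergence, `V` satisfies the Bellman optimality equation; RL algorithms for manipulation solve the same kind of problem without knowing `P` and `R` in advance.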

    Advances in Reinforcement Learning

    Get PDF
    Reinforcement Learning (RL) is a very dynamic area in terms of both theory and application. This book brings together many different aspects of current research on several fields associated with RL, which has been growing rapidly, producing a wide variety of learning algorithms for different applications. Across 24 chapters, it covers a broad variety of topics in RL and their application in autonomous systems. One set of chapters provides a general overview of RL, while the others focus mostly on applications of RL paradigms: Game Theory, Multi-Agent Theory, Robotics, Networking Technologies, Vehicular Navigation, Medicine, and Industrial Logistics.

    Student Behavior Simulation in English Online Education Based on Reinforcement Learning

    Get PDF
    In class, no two students behave alike. Now that most courses are taken online, tracking and identifying students' behavior is a significant challenge, especially in language (English) classes. In this study, a Student Behavior Simulation Based on a Reinforcement Learning Framework (SBS-BRLF) is proposed to track and identify students' behavior in online classes. The simulation model is trained on sets of behaviors categorized as positive or negative using Reinforcement Learning (RL), the field of machine learning concerned with how intelligent agents act in an environment to maximize cumulative reward. Students are tracked in the simulation model via web camera and microphone, and the collected data is processed with RL's aid. If a student's action is assessed as positive, the student is praised; otherwise the student receives a warning, up to three times, and on a further repetition is suspended for a day. The student can thus be monitored without complications. Comparative analysis of the proposed and current frameworks shows that SBS-BRLF works efficiently and accurately, with a behavioral rate of 93.2%, a performance rate of 96%, a supervision rate of 92%, a reliability rate of 89.7% for students, and a higher action-and-reward acceptance rate of 89.9%.
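The praise/warn/suspend rule described in the abstract can be sketched as a simple policy; the function name, return labels, and three-warning threshold encoding below are a hypothetical reading of the abstract, not the authors' implementation:

```python
def react(is_positive, warnings):
    """Hypothetical sketch of the abstract's feedback rule.

    is_positive: whether the observed behavior was assessed as positive.
    warnings: how many warnings the student has already received.
    Returns (action, updated_warning_count).
    """
    if is_positive:
        return "praise", warnings          # positive behavior is praised
    if warnings < 3:
        return "warn", warnings + 1        # up to three warnings
    return "suspend_one_day", 0            # repeated after 3 warnings
```

In an RL framing, such feedback would be delivered to the learning agent as a scalar reward rather than a labeled action, but the escalation logic is the same.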