Cycle-of-Learning for Autonomous Systems from Human Interaction
We discuss different types of human-robot interaction paradigms in the
context of training end-to-end reinforcement learning algorithms. We provide a
taxonomy to categorize the types of human interaction and present our
Cycle-of-Learning framework for autonomous systems that combines different
human-interaction modalities with reinforcement learning. Two key concepts
provided by our Cycle-of-Learning framework are how it handles the integration
of the different human-interaction modalities (demonstration, intervention, and
evaluation) and how to define the switching criteria between them. (Comment: Presented at AI-HRI AAAI-FSS, 2018; arXiv:1809.06606.)
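
To make the switching idea concrete, here is a minimal sketch of a modality selector driven by an agent-confidence heuristic. The thresholds and the confidence criterion are illustrative assumptions, not the paper's actual switching rules:

```python
def select_modality(policy_confidence: float) -> str:
    """Pick an interaction modality from the agent's self-confidence."""
    if policy_confidence < 0.3:
        return "demonstration"  # human shows full trajectories
    if policy_confidence < 0.7:
        return "intervention"   # human corrects the agent mid-episode
    return "evaluation"         # human only scores completed behavior

for conf in (0.1, 0.5, 0.9):
    print(conf, "->", select_modality(conf))
```

As the agent improves, the human's role shifts from costly demonstrations toward cheap evaluative feedback.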
Multi-Preference Actor Critic
Policy gradient algorithms typically combine discounted future rewards with
an estimated value function to compute the direction and magnitude of
parameter updates. However, for most Reinforcement Learning tasks, humans can
provide additional insight to constrain the policy learning. We introduce a
general method to incorporate multiple different feedback channels into a
single policy gradient loss. In our formulation, the Multi-Preference Actor
Critic (M-PAC), these different types of feedback are implemented as
constraints on the policy. We use a Lagrangian relaxation to satisfy these
constraints using gradient descent while learning a policy that maximizes
rewards. Experiments in Atari and Pendulum verify that constraints are being
respected and can accelerate the learning process. (Comment: NeurIPS Workshop on Deep RL, 2018.)
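
A minimal sketch of the Lagrangian-relaxation idea follows. The loss adds a multiplier-weighted violation term per feedback channel, and dual ascent raises a multiplier whenever its constraint is violated; the specific violation values and update rule here are placeholders, not M-PAC's exact formulation:

```python
import numpy as np

def mpac_style_loss(pg_loss: float, violations: np.ndarray,
                    lambdas: np.ndarray) -> float:
    """Penalized objective: policy-gradient loss plus weighted violations."""
    return pg_loss + float(np.dot(lambdas, violations))

lambdas = np.zeros(2)                  # one multiplier per feedback channel
lr_lambda = 0.1
for step in range(3):
    pg_loss = 1.0                      # stand-in for the usual PG loss
    violations = np.array([0.5, 0.0])  # how far each preference is violated
    loss = mpac_style_loss(pg_loss, violations, lambdas)
    # Dual ascent: raise a multiplier whenever its constraint is violated.
    lambdas = np.maximum(0.0, lambdas + lr_lambda * violations)
    print(step, round(loss, 3), lambdas)
```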
Directed Policy Gradient for Safe Reinforcement Learning with Human Advice
Many currently deployed Reinforcement Learning agents work in an environment
shared with humans, be they co-workers, users, or clients. It is desirable that
these agents adjust to people's preferences, learn faster thanks to their help,
and act safely around them. We argue that most current approaches that learn
from human feedback are unsafe: rewarding or punishing the agent a posteriori
cannot immediately prevent it from wrongdoing. In this paper, we extend Policy
Gradient to make it robust to external directives that would otherwise break
the fundamentally on-policy nature of Policy Gradient. Our technique, Directed
Policy Gradient (DPG), allows a teacher or backup policy to override the agent
before it acts undesirably, while allowing the agent to leverage human advice
or directives to learn faster. Our experiments demonstrate that DPG makes the
agent learn much faster than reward-based approaches, while requiring an order
of magnitude less advice. (Comment: Accepted at the European Workshop on Reinforcement Learning 2018, EWRL14.)
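
The override mechanism can be sketched as sampling from the policy renormalized over the actions a backup policy permits. How DPG then keeps the gradient update consistent with the actually executed distribution is elided here; the action names are illustrative:

```python
import random

def act(policy_probs: dict, unsafe: set) -> str:
    """Sample from the policy, renormalized over the allowed actions only."""
    allowed = {a: p for a, p in policy_probs.items() if a not in unsafe}
    actions, probs = zip(*allowed.items())
    return random.choices(actions, weights=probs)[0]

# The backup policy has flagged "jump" as unsafe in this state.
print(act({"left": 0.4, "right": 0.4, "jump": 0.2}, unsafe={"jump"}))
```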
DQN-TAMER: Human-in-the-Loop Reinforcement Learning with Intractable Feedback
Exploration has been one of the greatest challenges in reinforcement learning
(RL), which is a large obstacle in the application of RL to robotics. Even with
state-of-the-art RL algorithms, building a well-trained agent often requires
too many trials, mainly due to the difficulty of matching its actions with
rewards in the distant future. A remedy for this is to train an agent with
real-time feedback from a human observer who immediately gives rewards for some
actions. This study tackles a series of challenges for introducing such a
human-in-the-loop RL scheme. The first contribution of this work is a set of
experiments with a precisely modeled human observer, whose feedback is binary,
delayed, stochastic, unsustainable, and given as a natural reaction. We also propose an RL
method called DQN-TAMER, which efficiently uses both human feedback and distant
rewards. We find that DQN-TAMER agents outperform their baselines in Maze and
Taxi simulated environments. Furthermore, we demonstrate a real-world
human-in-the-loop RL application where a camera automatically recognizes a
user's facial expressions as feedback to the agent while the agent explores a
maze.
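
One way to picture the combination of distant rewards and human feedback is an action rule that blends a Q-function with a learned feedback model. The additive weighting below is a simplification, not the paper's exact formulation:

```python
import numpy as np

def select_action(q_values: np.ndarray, h_values: np.ndarray,
                  alpha: float = 0.5) -> int:
    """Blend long-horizon value (Q) with predicted human approval (H)."""
    return int(np.argmax(q_values + alpha * h_values))

q = np.array([0.1, 0.9, 0.3])   # environment-return estimates per action
h = np.array([1.0, -1.0, 0.0])  # feedback-model predictions per action
print(select_action(q, h))      # human approval steers early exploration
```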
Extending Policy from One-Shot Learning through Coaching
Humans generally teach their fellow collaborators to perform tasks through a
small number of demonstrations. The learnt task is corrected or extended to
meet specific task goals by means of coaching. Adopting a similar framework for
teaching robots through demonstrations and coaching makes teaching tasks highly
intuitive. Unlike traditional Learning from Demonstration (LfD) approaches
which require multiple demonstrations, we present a one-shot learning from
demonstration approach to learn tasks. The learnt task is corrected and
generalized using two layers of evaluation/modification. First, the robot
self-evaluates its performance and corrects the performance to be closer to the
demonstrated task. Then, coaching is used to extend the learnt policy so that
it adapts to varying task goals. Both the self-evaluation and coaching are
implemented using reinforcement learning (RL) methods. Coaching is achieved
through human feedback on the desired goal and on action modification, which
generalizes the policy to specified task goals. The proposed approach is
evaluated on a scooping task by presenting a single demonstration. The self-evaluation
framework aims to reduce the resistance to scooping in the media. To reduce the
search space for RL, we bootstrap the search with the least-resistance path
obtained from resistive force theory. Coaching is used to generalize the
learnt task policy to transfer the desired quantity of material. Thus, the
proposed method provides a framework for learning a task from one demonstration
and generalizing it using human feedback through coaching.
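
A minimal sketch of bootstrapping the RL search from a single demonstration: candidate policies are sampled as small perturbations of the demonstrated trajectory, keeping the search local. The Gaussian perturbation and variable names are assumptions for illustration:

```python
import numpy as np

def sample_candidates(demo_traj, n=5, sigma=0.05):
    """Perturb the demonstrated trajectory to seed a local policy search."""
    demo = np.asarray(demo_traj, dtype=float)
    return [demo + np.random.normal(0.0, sigma, demo.shape) for _ in range(n)]

demo = [0.0, 0.2, 0.5, 0.9]     # e.g. scoop depth along the motion
for candidate in sample_candidates(demo, n=2):
    print(np.round(candidate, 3))
```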
Risk-Aware Active Inverse Reinforcement Learning
Active learning from demonstration allows a robot to query a human for
specific types of input to achieve efficient learning. Existing work has
explored a variety of active query strategies; however, to our knowledge, none
of these strategies directly minimize the performance risk of the policy the
robot is learning. Utilizing recent advances in performance bounds for inverse
reinforcement learning, we propose a risk-aware active inverse reinforcement
learning algorithm that focuses active queries on areas of the state space with
the potential for large generalization error. We show that risk-aware active
learning outperforms standard active IRL approaches on gridworld, simulated
driving, and table setting tasks, while also providing a performance-based
stopping criterion that allows a robot to know when it has received enough
demonstrations to safely perform a task. (Comment: In proceedings of the 2nd Conference on Robot Learning (CoRL), 2018.)
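
The query-and-stop logic can be sketched as follows: ask the human about the state whose performance bound is loosest, and stop once every bound falls under a tolerance. The bound values are placeholders, not the paper's actual estimator:

```python
import numpy as np

def next_query(risk_bounds: np.ndarray, epsilon: float = 0.1):
    """Return the riskiest state index, or None once it is safe to stop."""
    worst = int(np.argmax(risk_bounds))
    return None if risk_bounds[worst] < epsilon else worst

bounds = np.array([0.02, 0.30, 0.05])  # per-state error-bound estimates
print(next_query(bounds))              # -> 1: query the human about state 1
```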
Robot Learning via Human Adversarial Games
Much work in robotics has focused on "human-in-the-loop" learning techniques
that improve the efficiency of the learning process. However, these algorithms
have made the strong assumption of a cooperating human supervisor that assists
the robot. In reality, human observers tend to also act in an adversarial
manner towards deployed robotic systems. We show that such adversarial behavior
can in fact improve the robustness of the learned models: we propose a physical
framework that leverages perturbations applied by a human adversary, guiding
the robot towards more robust models. In a manipulation task, we show that grasping success
improves significantly when the robot trains with a human adversary as compared
to training in a self-supervised manner.
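
A toy version of the data-collection loop this implies: a grasp only yields a positive training label if it also survives the human's perturbation. The `adversary` callable below is a stand-in for the real human adversary:

```python
import random

def grasp_label(grasp_holds: bool, adversary) -> int:
    """A grasp counts as a success only if it survives the perturbation."""
    return 1 if grasp_holds and adversary() else 0

# Stand-in adversary: the perturbation dislodges the object half the time.
print(grasp_label(True, adversary=lambda: random.random() > 0.5))
```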
Improving Interactive Reinforcement Agent Planning with Human Demonstration
TAMER has proven to be a powerful interactive reinforcement learning method
for allowing ordinary people to teach and personalize autonomous agents'
behavior by providing evaluative feedback. However, a TAMER agent planning with
UCT (a Monte Carlo tree search strategy) can only update states along its path,
which can incur a high learning cost, especially for a physical robot. In this
paper, we propose to drive the agent's exploration along the optimal path and
reduce the learning cost by initializing the agent's reward function via
inverse reinforcement learning from demonstration. We test our proposed method
in the Grid World RL benchmark domain with different discounts on human
reward. Our results show that learning from demonstration can allow a TAMER
agent to learn a roughly optimal policy up to the deepest search and encourage
the agent to explore along the optimal path. In addition, we find that learning
from demonstration can improve learning efficiency by reducing the total
feedback and the number of incorrect actions, and by increasing the ratio of
correct actions needed to obtain an optimal policy, allowing a TAMER agent to
converge faster.
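
Here is a minimal sketch of the initialization step, with the IRL component reduced to an average of feature counts along the demonstration. A real IRL step would fit the weights rather than average features; all names are illustrative:

```python
import numpy as np

# Features observed along the demonstrated (roughly optimal) trajectory.
demo_features = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
reward_weights = demo_features.mean(axis=0)  # crude IRL stand-in

def initial_reward(state_features: np.ndarray) -> float:
    """Reward estimate that seeds TAMER before any human feedback arrives."""
    return float(np.dot(reward_weights, state_features))

print(initial_reward(np.array([1.0, 1.0])))
```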
Actor-Critic Reinforcement Learning with Simultaneous Human Control and Feedback
This paper contributes a first study into how different human users deliver
simultaneous control and feedback signals during human-robot interaction. As
part of this work, we formalize and present a general interactive learning
framework for online cooperation between humans and reinforcement learning
agents. In many human-machine interaction settings, there is a growing gap
between the degrees-of-freedom of complex semi-autonomous systems and the
number of human control channels. Simple human control and feedback mechanisms
are required to close this gap and allow for better collaboration between
humans and machines on complex tasks. To better inform the design of concurrent
control and feedback interfaces, we present experimental results from a
human-robot collaborative domain wherein the human must simultaneously deliver
both control and feedback signals to interactively train an actor-critic
reinforcement learning robot. We compare three experimental conditions: 1)
human-delivered control signals, 2) reward-shaping feedback signals, and 3)
simultaneous control and feedback. Our results suggest that subjects provide
less feedback when simultaneously delivering feedback and control signals, and
that control-signal quality is not significantly diminished. Our data suggest
that subjects may also modify when and how they provide feedback. Through
algorithmic development and tuning informed by this study, we expect
semi-autonomous actions of robotic agents can be better shaped by human
feedback, allowing for seamless collaboration and improved performance in
difficult interactive domains. (Comment: 10 pages, 2 pages of references, 8
figures. Under review for the 34th International Conference on Machine
Learning, Sydney, Australia, 2017. Copyright 2017 by the author(s).)
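
One plausible wiring for such an interface is sketched below: human feedback enters the critic's TD error as an additive shaping term, while a human control signal, when present, overrides the policy's proposed action. The additive term and its weight are assumptions, not the paper's formulation:

```python
def td_error(reward, human_feedback, v_s, v_next, gamma=0.99, beta=0.5):
    """One-step TD error with an additive human-feedback shaping term."""
    return reward + beta * human_feedback + gamma * v_next - v_s

def executed_action(policy_action, human_action=None):
    """Human control, when present, overrides the policy's proposal."""
    return human_action if human_action is not None else policy_action

print(td_error(reward=0.0, human_feedback=1.0, v_s=0.2, v_next=0.3))
print(executed_action("forward", human_action="stop"))
```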
Learning Shaping Strategies in Human-in-the-loop Interactive Reinforcement Learning
Providing reinforcement learning agents with informationally rich human
knowledge can dramatically improve various aspects of learning. Prior work has
developed different kinds of shaping methods that enable agents to learn
efficiently in complex environments. Each of these methods, however, tailors
human guidance to agents through a specialized shaping procedure, and thus has
its own characteristics and advantages in different domains. In this paper, we
investigate the interplay between different shaping methods for more robust
learning performance. We propose an adaptive shaping algorithm which is capable
of learning the most suitable shaping method in an online manner. Results from
both simulated and real human studies in two classic domains verify its
effectiveness, shedding light on the role and impact of human factors in
human-robot collaborative learning.
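
Selecting a shaping method online reads like a bandit problem over shaping strategies. The epsilon-greedy sketch below, with a running value per method, is one plausible instantiation rather than the paper's exact algorithm; the method names are illustrative:

```python
import random

values = {"reward_shaping": 0.0, "policy_shaping": 0.0, "q_augmentation": 0.0}
counts = dict.fromkeys(values, 0)

def pick(eps: float = 0.2) -> str:
    """Epsilon-greedy choice among the candidate shaping methods."""
    if random.random() < eps:
        return random.choice(list(values))
    return max(values, key=values.get)

def update(method: str, gain: float) -> None:
    """Incremental running average of the observed learning gain."""
    counts[method] += 1
    values[method] += (gain - values[method]) / counts[method]

method = pick()
update(method, gain=1.0)  # gain: learning improvement seen this episode
print(method, values[method])
```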