54,597 research outputs found

    Deep reinforcement learning from human preferences

    For sophisticated reinforcement learning (RL) systems to interact usefully with real-world environments, we need to communicate complex goals to these systems. In this work, we explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. We show that this approach can effectively solve complex RL tasks without access to the reward function, including Atari games and simulated robot locomotion, while providing feedback on less than one percent of our agent's interactions with the environment. This reduces the cost of human oversight far enough that it can be practically applied to state-of-the-art RL systems. To demonstrate the flexibility of our approach, we show that we can successfully train complex novel behaviors with about an hour of human time. These behaviors and environments are considerably more complex than any that have been previously learned from human feedback.
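    The core of this approach is a preference predictor: each trajectory segment is assigned a summed predicted reward, and the probability that the human prefers one segment over the other follows a Bradley-Terry model. Below is a minimal sketch of that objective; the network architecture, `obs_dim`, and optimizer settings are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

obs_dim = 16  # illustrative observation size
reward_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(reward_net.parameters(), lr=1e-4)

def preference_loss(segment_a, segment_b, human_pref):
    """segment_*: (T, obs_dim) tensors of stacked observations;
    human_pref: 1.0 if the human preferred segment_a, 0.0 if segment_b."""
    # Sum predicted per-step rewards over each trajectory segment.
    r_a = reward_net(segment_a).sum()
    r_b = reward_net(segment_b).sum()
    # P(a preferred over b) under the Bradley-Terry model.
    p_a = torch.sigmoid(r_a - r_b)
    # Cross-entropy between predicted and observed human preference.
    return -(human_pref * torch.log(p_a) + (1 - human_pref) * torch.log(1 - p_a))
```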

    Can Differentiable Decision Trees Learn Interpretable Reward Functions?

    There is increasing interest in learning reward functions that model human intent and human preferences. However, many frameworks use black-box learning methods that, while expressive, are difficult to interpret. We propose and evaluate a novel approach for learning expressive and interpretable reward functions from preferences using Differentiable Decision Trees (DDTs) for both low- and high-dimensional state inputs. We explore and discuss the viability of learning interpretable reward functions with DDTs by evaluating our algorithm on Cartpole, Visual Gridworld environments, and Atari games. We provide evidence that the tree structure of our learned reward function is useful in determining the extent to which a reward function is aligned with human preferences. We visualize the learned reward DDTs and find that they are capable of learning interpretable reward functions, but that the discrete nature of the trees hurts the performance of reinforcement learning at test time. However, we also show evidence that using soft outputs (averaged over all leaf nodes) results in competitive performance compared with larger-capacity deep neural network reward functions.
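    To make the "soft outputs" idea concrete, here is a minimal sketch of a soft differentiable decision tree used as a reward model: each internal node routes inputs with a sigmoid gate, and the output is a weighted average of learned leaf rewards. The depth, layer sizes, and class names are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SoftRewardDDT(nn.Module):
    """Sketch of a soft DDT reward model: sigmoid routing at internal
    nodes, output averaged over leaf values weighted by path probability."""
    def __init__(self, obs_dim, depth=3):
        super().__init__()
        self.depth = depth
        n_internal = 2 ** depth - 1
        n_leaves = 2 ** depth
        self.gates = nn.Linear(obs_dim, n_internal)  # one routing gate per internal node
        self.leaf_rewards = nn.Parameter(torch.zeros(n_leaves))

    def forward(self, obs):
        probs = torch.sigmoid(self.gates(obs))       # (batch, n_internal)
        path_prob = torch.ones(obs.shape[0], 1)
        idx = 0
        # Descend level by level, splitting each path into left/right children.
        for level in range(self.depth):
            n = 2 ** level
            gate = probs[:, idx:idx + n]             # gates at this level
            path_prob = torch.cat([path_prob * gate,
                                   path_prob * (1 - gate)], dim=1)
            idx += n
        # Soft output: leaf rewards weighted by each path's probability.
        return (path_prob * self.leaf_rewards).sum(dim=1)
```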

    RL-Duet: Online Music Accompaniment Generation Using Deep Reinforcement Learning

    This paper presents a deep reinforcement learning algorithm for online accompaniment generation, with potential for real-time interactive human-machine duet improvisation. Unlike offline music generation and harmonization, online music accompaniment requires the algorithm to respond to human input and generate the machine counterpart sequentially. We cast this as a reinforcement learning problem, where the generation agent learns a policy to generate a musical note (action) based on previously generated context (state). The key to this algorithm is a well-functioning reward model. Instead of defining it with music composition rules, we learn this model from monophonic and polyphonic training data. This model considers the compatibility of the machine-generated note with both the machine-generated context and the human-generated context. Experiments show that this algorithm is able to respond to the human part and generate a melodic, harmonic, and diverse machine part. Subjective evaluations of preferences show that the proposed algorithm generates music pieces of higher quality than the baseline method.
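    The state-action-reward framing the abstract describes might look roughly like the sketch below: the state combines the human and machine contexts, the action is the next note, and a learned reward model scores compatibility. The bag-of-pitches encoding, vocabulary size, and function names are illustrative assumptions; the paper's actual models are learned sequential networks.

```python
import torch

N_PITCHES = 129  # illustrative: 128 MIDI pitches plus a rest token

def encode(notes):
    # Illustrative bag-of-pitches context encoding (placeholder for a
    # learned sequential encoder).
    vec = torch.zeros(N_PITCHES)
    for n in notes:
        vec[n] += 1.0
    return vec

def generate_step(policy_net, reward_model, human_notes, machine_notes):
    # State: the human part heard so far plus the machine part so far.
    state = torch.cat([encode(human_notes), encode(machine_notes)])
    # Action: sample the next machine note from the policy.
    logits = policy_net(state)
    action = torch.distributions.Categorical(logits=logits).sample()
    # Reward: learned compatibility of the new note with both contexts,
    # standing in for hand-coded composition rules.
    reward = reward_model(state, action)
    return int(action), reward
```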

    Reinforcement Learning and Advanced Reinforcement Learning to Improve Autonomous Vehicle Planning

    Planning for autonomous vehicles is a challenging process that involves navigating dynamic, unpredictable surroundings while making judgments in real time. Traditional planning methods often rely on predetermined rules or hand-crafted heuristics, which may not generalize well across driving conditions. In this article, we present a framework that enhances autonomous vehicle planning by fusing conventional RL methods with advanced reinforcement learning techniques. To handle the many elements of the planning problem, our system integrates algorithms including deep reinforcement learning, hierarchical reinforcement learning, and meta-learning, leveraging their complementary strengths to help autonomous vehicles make more reliable and effective decisions. With the RLTT technique, an autonomous vehicle can learn the intentions and preferences of human drivers by inferring the underlying reward function from observed expert behaviour. By learning the fundamental goals and constraints of driving from expert demonstrations, the autonomous car can make safer and more human-like decisions. Large-scale simulations and practical experiments can be carried out to gauge the effectiveness of the proposed approach, assessing the planning system's performance on criteria such as safety, effectiveness, and human likeness. The outcomes of these assessments can inform future developments and offer insight into the strengths and weaknesses of the strategy.
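    Inferring a reward function from observed expert behaviour is the inverse reinforcement learning setting. As a minimal sketch of one classic variant, feature-matching IRL adjusts a linear reward until the current policy's feature expectations match the expert's; `phi`, `solve_mdp`, and `rollout` below are hypothetical placeholders, and the abstract does not specify which IRL algorithm RLTT uses.

```python
import numpy as np

def infer_reward(expert_trajs, phi, solve_mdp, rollout, lr=0.1, iters=50):
    """expert_trajs: list of state trajectories; phi: state -> feature vector;
    solve_mdp: reward fn -> policy; rollout: policy -> trajectories."""
    w = np.zeros(phi(expert_trajs[0][0]).shape[0])
    # Empirical feature expectations of the expert demonstrations.
    mu_expert = np.mean([phi(s) for traj in expert_trajs for s in traj], axis=0)
    for _ in range(iters):
        policy = solve_mdp(lambda s: w @ phi(s))  # RL under the current reward
        mu_policy = np.mean([phi(s) for traj in rollout(policy) for s in traj],
                            axis=0)
        w += lr * (mu_expert - mu_policy)         # move toward expert behaviour
    return w
```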

    Explore, Exploit or Listen: Combining Human Feedback and Policy Model to Speed up Deep Reinforcement Learning in 3D Worlds

    We describe a method to use discrete human feedback to enhance the performance of deep learning agents in virtual three-dimensional environments by extending deep reinforcement learning to model the confidence and consistency of human feedback. This enables deep reinforcement learning algorithms to determine the most appropriate time to listen to the human feedback, exploit the current policy model, or explore the agent's environment. Managing the trade-off between these three strategies allows DRL agents to be robust to inconsistent or intermittent human feedback. Through experimentation using a synthetic oracle, we show that our technique improves the training speed and overall performance of deep reinforcement learning in navigating three-dimensional environments using Minecraft. We further show that our technique is robust to highly inaccurate human feedback and can also operate when no human feedback is given.
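    A minimal sketch of the explore/exploit/listen arbitration the abstract describes is shown below, where a running trust estimate of the human feedback decides whether to follow the human, act greedily, or explore. The weighting scheme and names are illustrative assumptions, not the paper's exact rule.

```python
import random

def select_action(q_values, human_action, feedback_trust, epsilon=0.1):
    """q_values: list of action values; feedback_trust: running estimate in
    [0, 1] of how confident and consistent the human feedback has been;
    human_action may be None when feedback is intermittent or absent."""
    if human_action is not None and random.random() < feedback_trust:
        return human_action                      # listen to the human
    if random.random() < epsilon:
        return random.randrange(len(q_values))   # explore the environment
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit policy
```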