Deep Reinforcement Learning from Hierarchical Weak Preference Feedback

Bukharin, Alexander; Chen, Weizhu; He, Pengcheng; Li, Yixiao; Zhao, Tuo

Authors: Alexander Bukharin, Weizhu Chen, Pengcheng He, Yixiao Li, Tuo Zhao
Publication date: 5 September 2023

Abstract

Reward design is a fundamental yet challenging aspect of practical reinforcement learning (RL). For simple tasks, researchers typically handcraft the reward function, e.g., using a linear combination of several reward factors. However, such reward engineering is subject to approximation bias, incurs a large tuning cost, and often cannot provide the granularity required for complex tasks. To avoid these difficulties, researchers have turned to reinforcement learning from human feedback (RLHF), which learns a reward function from human preferences between pairs of trajectory sequences. By leveraging preference-based reward modeling, RLHF learns complex rewards that are well aligned with human preferences, allowing RL to tackle increasingly difficult problems. Unfortunately, the applicability of RLHF is limited by the high cost and difficulty of obtaining human preference data. In light of this cost, we investigate learning reward functions for complex tasks with less human effort, simply by ranking the importance of the reward factors. More specifically, we propose a new RL framework, HERON, which compares trajectories using a hierarchical decision tree induced by the given ranking. These comparisons are used to train a preference-based reward model, which is then used for policy learning. We find that our framework can not only train high-performing agents on a variety of difficult tasks, but also provide additional benefits such as improved sample efficiency and robustness. Our code is available at https://github.com/abukharin3/HERON.

Comment: 28 pages, 15 figures
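
The ranking-based comparison described in the abstract can be illustrated with a small Python sketch. It is not taken from the HERON repository. The function name hierarchical_preference, the per-factor margins argument, and the rule "fall through to the next factor when the difference is within the margin" are all assumptions made for illustration, since the abstract does not specify how ties at each level of the decision tree are handled.

```python
from typing import Optional, Sequence


def hierarchical_preference(
    factors_a: Sequence[float],
    factors_b: Sequence[float],
    margins: Sequence[float],
) -> Optional[int]:
    """Compare two trajectories factor by factor, most important first.

    factors_a / factors_b: each trajectory's reward factors, already ordered
    by the user-supplied importance ranking (index 0 = most important).
    margins: hypothetical per-factor tolerances (an assumption, not from the
    abstract); a difference within the margin counts as a tie at that level.

    Returns 0 if trajectory A is preferred, 1 if B is preferred, and None
    if no factor separates the two trajectories.
    """
    if not (len(factors_a) == len(factors_b) == len(margins)):
        raise ValueError("factor and margin sequences must have equal length")

    for a, b, margin in zip(factors_a, factors_b, margins):
        if abs(a - b) > margin:
            # This level of the tree is decisive: higher factor value wins.
            return 0 if a > b else 1
        # Otherwise treat this level as a tie and descend to the next factor.
    return None


# Usage: the top-ranked factor is nearly equal, so the second factor decides.
label = hierarchical_preference(
    [0.90, 0.20, 0.10],
    [0.88, 0.70, 0.10],
    margins=[0.05, 0.05, 0.05],
)
```

Labels produced this way could then serve as the pairwise preferences for training a reward model, which is the role the abstract assigns to the comparisons. Unresolved pairs (None) would simply be skipped under this sketch.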

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2309.02632

Last time updated on 12/09/2023