Reinforcement Learning from Human Feedback (RLHF) is the prevailing approach
for aligning Large Language Models (LLMs) with human values. However,
existing RLHF methods incur high computational costs, largely because they
assign both the generation and alignment tasks to the LLM
simultaneously. In this paper, we introduce Proxy-RLHF, which decouples the
generation and alignment processes of LLMs, achieving alignment with human
values at a much lower computational cost. We start with a novel Markov
Decision Process (MDP) designed for the alignment process and employ
Reinforcement Learning (RL) to train a streamlined proxy model that oversees
the token generation of the LLM, without altering the LLM itself. Experiments
show that our method achieves a level of alignment comparable to that of other
methods while using only 1\% of their trainable parameters.