While Reinforcement Learning from Human Feedback (RLHF) aligns Large Language
Models (LLMs) with general, aggregate human preferences, it is suboptimal for
learning diverse, individual perspectives. In this work, we study the Reinforcement
Learning from Personalized Human Feedback (RLPHF) problem, wherein LLMs are
aligned to multiple (sometimes conflicting) preferences by modeling alignment
as a Multi-Objective Reinforcement Learning (MORL) problem. Compared to strong
single-objective baselines, we show that we can achieve personalized alignment
by decomposing preferences into multiple dimensions. These dimensions are
defined based on personalizations that the user declares as desirable. We show
that models for these dimensions can be trained independently and efficiently in a
distributed manner and combined effectively post-hoc through parameter merging.
The code is available at https://github.com/joeljang/RLPHF.
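As a rough illustration of the post-hoc parameter merging step, the sketch below averages the weights of several independently trained policy models using user-specified preference weights. The helper name `merge_state_dicts`, the uniform-weight default, and the use of plain weighted averaging over full state dicts are illustrative assumptions, not the paper's exact recipe.

```python
# Minimal sketch (assumption): post-hoc parameter merging of policies that were
# each trained on a single preference dimension. Weighted averaging of full
# state dicts is one simple merging scheme; the paper's exact procedure may differ.
from typing import Dict, List, Optional

import torch
import torch.nn as nn


def merge_state_dicts(
    state_dicts: List[Dict[str, torch.Tensor]],
    weights: Optional[List[float]] = None,
) -> Dict[str, torch.Tensor]:
    """Return an element-wise weighted average of the given state dicts."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)  # uniform by default
    total = sum(weights)
    weights = [w / total for w in weights]  # normalize so weights sum to 1

    merged: Dict[str, torch.Tensor] = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return merged


if __name__ == "__main__":
    # Toy example: three "policies" (tiny linear layers), one per preference dimension.
    policies = [nn.Linear(8, 8) for _ in range(3)]
    # A user who cares mostly about the first dimension might choose these weights.
    merged_weights = merge_state_dicts(
        [p.state_dict() for p in policies], weights=[0.6, 0.2, 0.2]
    )
    merged_policy = nn.Linear(8, 8)
    merged_policy.load_state_dict(merged_weights)
```

In practice the merged model would be a full LLM policy rather than a toy linear layer, and the per-dimension weights can be chosen at inference time to reflect an individual user's stated preferences.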