We propose controlled decoding (CD), a novel off-policy reinforcement
learning method to control the autoregressive generation of language models
towards high-reward outcomes. CD solves an off-policy reinforcement learning
problem through a value function for the reward, which we call a prefix scorer.
The prefix scorer is used at inference time to steer the generation towards
higher-reward outcomes. We show that the prefix scorer may be trained on
(possibly) off-policy data to predict the expected reward when decoding is
continued from a partially decoded response. We empirically demonstrate that CD
is effective as a control mechanism on a corpus of Reddit conversations.
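As a rough illustration, here is a minimal sketch of how a prefix scorer might steer token-level decoding, assuming the scorer produces a per-token expected-reward estimate; `cd_next_token_logits`, the random tensors, and the `strength` knob are illustrative stand-ins, not the paper's implementation.

```python
# Minimal sketch: shift the base model's next-token logits by the prefix
# scorer's value estimates, steering sampling toward higher-reward tokens.
# All names and tensors here are hypothetical placeholders.
import torch

def cd_next_token_logits(base_logits, prefix_values, strength=1.0):
    """base_logits:   (vocab_size,) logits from the base language model
    prefix_values: (vocab_size,) prefix scorer's expected-reward estimate
                   for appending each candidate token to the current prefix
    strength:      trade-off between reward and staying close to the base model
    """
    return base_logits + strength * prefix_values

# Toy usage with random tensors standing in for real model outputs.
vocab_size = 8
base_logits = torch.randn(vocab_size)
prefix_values = torch.randn(vocab_size)
adjusted = cd_next_token_logits(base_logits, prefix_values, strength=0.5)
next_token = torch.distributions.Categorical(logits=adjusted).sample()
```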
We also show that the modularity of CD's design makes it possible to control
for multiple rewards, effectively solving a multi-objective reinforcement
learning problem with no additional complexity.
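A sketch of the multi-objective case under the same assumptions: each reward gets its own prefix scorer, and their value estimates are combined at inference time (here linearly, with illustrative weights) into a single decoding signal; nothing here is the paper's exact recipe.

```python
# Hypothetical sketch: combine several prefix scorers' value estimates so one
# adjusted decoding rule trades off multiple rewards without retraining.
import torch

def combined_prefix_values(value_estimates, weights):
    """Linearly weight per-reward value estimates into a single signal."""
    return sum(w * v for w, v in zip(weights, value_estimates))

# e.g. steer jointly for two rewards (illustrative tensors and weights).
vocab_size = 8
reward_a_values = torch.randn(vocab_size)
reward_b_values = torch.randn(vocab_size)
values = combined_prefix_values([reward_a_values, reward_b_values],
                                weights=[0.7, 0.3])
```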
Finally, we show that CD can be applied in a novel blockwise fashion at
inference time, again without the need for any training-time changes,
essentially bridging the gap between the popular best-of-K strategy and
token-level reinforcement learning. This makes CD a promising approach for
the alignment of language models.
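Below is a hedged sketch of the blockwise idea: sample K candidate blocks from the base model, keep the one the prefix scorer ranks highest, and repeat. `sample_block` and `prefix_scorer` are hypothetical placeholders for the real model calls, not the paper's code.

```python
# Illustrative blockwise controlled decoding: best-of-K selection per block,
# scored by the prefix scorer, repeated until the response is complete.
import random

def sample_block(prefix):
    # Placeholder for drawing a fixed-length continuation from the base LM.
    return prefix + [random.randint(0, 7) for _ in range(4)]

def prefix_scorer(prefix):
    # Placeholder for the trained value function over partial responses.
    return random.random()

def blockwise_cd(prefix, num_blocks=3, k=4):
    for _ in range(num_blocks):
        candidates = [sample_block(prefix) for _ in range(k)]
        prefix = max(candidates, key=prefix_scorer)  # best-of-K per block
    return prefix

response = blockwise_cd(prefix=[1, 2, 3])
```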