Tuning Models of Code with Compiler-Generated Reinforcement Learning Feedback

Adiole, Chima; Chaudhuri, Swarat; Jain, Abhinav; Jermaine, Chris; Reps, Thomas

Publication date: 25 May 2023

Abstract

Large Language Models (LLMs) pre-trained on code have recently emerged as the dominant approach to program synthesis. However, the code that these models produce can violate basic language-level invariants, leading to lower performance in downstream tasks. We address this issue through an approach, called RLCF, that further trains a pre-trained LLM using feedback from a code compiler. RLCF views the LLM as an RL agent that generates code step by step and receives: (i) compiler-derived feedback on whether the code it generates passes a set of correctness checks; and (ii) feedback from a different LLM on whether the generated code is similar to a set of reference programs in the training corpus. Together, these feedback mechanisms help the generated code remain within the target distribution while passing all static correctness checks. RLCF is model- and language-agnostic. We empirically evaluate it on the MBJP and MathQA tasks for Java. Our experiments show that RLCF significantly raises the odds that an LLM-generated program compiles, is executable, and produces the right output on tests, often allowing LLMs to match the performance of LLMs 2x to 8x larger.

Comment: 19 pages, 3 figures
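To make the two feedback signals in the abstract concrete, the sketch below combines them into a single scalar reward of the kind an RL fine-tuning loop could consume. It is an illustration under stated assumptions, not the paper's reward function: the names `compiler_feedback` and `combined_reward`, the +1/-1 compiler term, the rescaling of the discriminator probability to [-1, 1], and the weights are all hypothetical. Because the paper targets Java, Python's built-in `compile()` is swapped in only as a runnable stand-in for a real compiler's static checks, and the second model is represented by an arbitrary callable returning a probability.

```python
from typing import Callable


def compiler_feedback(source: str) -> bool:
    """Stand-in static check: does the snippet compile as Python?

    Assumption: the paper uses a Java compiler; Python's built-in compile()
    is used here only so the sketch runs without external tools.
    """
    try:
        compile(source, "<generated>", "exec")
        return True
    except (SyntaxError, ValueError):  # ValueError covers e.g. null bytes on older Pythons
        return False


def combined_reward(
    source: str,
    discriminator: Callable[[str], float],
    w_compiler: float = 1.0,
    w_discriminator: float = 1.0,
) -> float:
    """Hypothetical scalar reward mixing both feedback signals.

    - compiler term: +1 if the static check passes, -1 otherwise
    - discriminator term: probability in [0, 1] that the code resembles the
      reference programs, clamped and rescaled to [-1, 1]
    """
    compiler_term = 1.0 if compiler_feedback(source) else -1.0
    p = min(max(discriminator(source), 0.0), 1.0)
    return w_compiler * compiler_term + w_discriminator * (2.0 * p - 1.0)


if __name__ == "__main__":
    # Hypothetical discriminator that always rates code as 75% "reference-like".
    toy_discriminator = lambda src: 0.75
    print(combined_reward("x = 1\n", toy_discriminator))    # compiles: 1.0 + 0.5
    print(combined_reward("def f(:\n", toy_discriminator))  # fails:   -1.0 + 0.5
```

With the toy discriminator, a snippet that compiles scores 1.5 and one that fails scores -0.5, so the compiler check dominates while the similarity term still separates candidates that both pass or both fail. In an actual training loop this reward would typically be assigned once the full program has been generated, since the compiler can only judge complete code.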

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2305.18341

Last updated on 02/06/2023