Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization
We focus on prediction problems with structured outputs that are subject to
output validity constraints, e.g. pseudocode-to-code translation where the code
must compile. While labeled input-output pairs are expensive to obtain,
"unlabeled" outputs, i.e. outputs without corresponding inputs, are freely
available (e.g. code on GitHub) and provide information about output validity.
Pre-training captures this structure by training a denoiser to denoise
corrupted versions of unlabeled outputs. We first show that standard
fine-tuning after pre-training destroys some of this structure. We then propose
composed fine-tuning, which trains a predictor composed with the pre-trained
denoiser. Importantly, the denoiser is kept fixed to preserve output structure.
As in standard fine-tuning, the predictor is initialized with the pre-trained
denoiser. We prove for two-layer ReLU networks that composed fine-tuning
significantly reduces the complexity of the predictor, thus improving
generalization. Empirically, we show that composed fine-tuning improves over
standard fine-tuning on two pseudocode-to-code translation datasets (3% and 6%
relative). The improvement is magnified on out-of-distribution (OOD) examples
(4% and 25% relative), suggesting that reducing predictor complexity improves
OOD extrapolation.

Comment: ICML 2021
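As a rough illustration of the composed-fine-tuning recipe described above (not code from the paper), the sketch below uses a continuous toy model in the spirit of the two-layer ReLU analysis, so the composition is differentiable end to end. All module and variable names here are hypothetical, and the `make_mlp` stand-in is assumed to play the role of a pre-trained denoising autoencoder.

```python
import copy
import torch
import torch.nn as nn

# Hypothetical stand-in for a pre-trained denoising autoencoder over outputs.
def make_mlp(dim=64):
    return nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

denoiser = make_mlp()  # assume weights already trained to denoise corrupted outputs

# As in standard fine-tuning, the predictor is initialized from the
# pre-trained denoiser.
predictor = copy.deepcopy(denoiser)

# Frozen copy of the denoiser: it receives no gradient updates, so the
# output structure learned during pre-training is preserved.
frozen = copy.deepcopy(denoiser)
for p in frozen.parameters():
    p.requires_grad = False

def composed(x):
    # Composed model: the frozen denoiser cleans up the predictor's draft.
    return frozen(predictor(x))

# Fine-tune only the predictor on labeled input-output pairs (toy data here).
opt = torch.optim.Adam(predictor.parameters(), lr=1e-3)
x, y = torch.randn(32, 64), torch.randn(32, 64)
loss = nn.functional.mse_loss(composed(x), y)
loss.backward()  # gradients flow through the frozen denoiser into the predictor only
opt.step()
```

Because only the predictor is updated, the frozen denoiser keeps projecting drafts toward valid outputs, which is what allows the predictor itself to remain simpler than a model trained to produce valid outputs directly.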