Bridging the Gap: Analyzing Methods to Improve Temporal Generalizability of Large Language Models

Abstract

With the advent of publicly available, interactive Large Language Models such as ChatGPT, the ability of massive pretrained networks to mimic, assist, or even replace human communication has become widely recognized. However, these models encode information only once, at training time, which has been noted as a key obstacle to improving them. Because of this single training pass, they often struggle to represent text written well after the model was created, or to track factual information that changes over time. With these issues in mind, we consider dynamic evaluation and k-nearest-neighbor language modeling as potential methods to improve temporal generalization, testing them with the same model on identical tasks to facilitate comparison. We then propose novel model structures that apply these modifications in conjunction to further improve performance. All of our modifications improve language-modeling perplexity on scientific and news datasets. The continuous dynamic evaluation model improves perplexity by more than 10% on a test period four years after training time, reducing or even reversing the rate of temporal decay. We observe similar performance benefits in a k-nearest-neighbor language model, as well as in our novel model formulations.
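As background for the two base techniques named in the abstract, the following is a minimal sketch, not the paper's implementation: dynamic evaluation adapts model parameters with a gradient step on each test chunk after scoring it, and kNN-LM interpolates the base model's next-token distribution with one induced by retrieved neighbors. It assumes a Hugging Face-style causal LM whose forward pass accepts labels and returns a loss; the function names, the chunking scheme, and the interpolation weight 0.25 are illustrative choices, not values from the paper.

    import torch

    def dynamic_eval_perplexity(model, chunks, lr=1e-5):
        """Score each chunk with the current weights, then take one gradient
        step on that same chunk so later text is scored by an adapted model."""
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)
        total_nll, total_tokens = 0.0, 0
        for chunk in chunks:                        # chunk: LongTensor of shape [1, T]
            out = model(input_ids=chunk, labels=chunk)   # evaluation step
            n = chunk.numel() - 1                   # tokens actually scored (labels are shifted)
            total_nll += out.loss.item() * n
            total_tokens += n
            out.loss.backward()                     # adaptation step on the observed text
            optimizer.step()
            optimizer.zero_grad()
        return float(torch.exp(torch.tensor(total_nll / total_tokens)))

    def knn_lm_interpolate(p_lm, p_knn, lam=0.25):
        """kNN-LM next-token distribution: lam * p_kNN + (1 - lam) * p_LM,
        where p_knn comes from nearest-neighbor lookups in a datastore of
        (context embedding, next token) pairs built from a reference corpus."""
        return lam * p_knn + (1 - lam) * p_lm

The key design point is that dynamic evaluation updates parameters online, so information from recent test text carries forward, while kNN-LM leaves parameters fixed and instead consults an external datastore that can be rebuilt as new text arrives; the paper's novel structures apply both kinds of modification together.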
