In continual learning, catastrophic forgetting is affected by multiple
aspects of the tasks. Previous works have separately analyzed how forgetting is
affected by either task similarity or overparameterization. In contrast, our
paper examines how task similarity and overparameterization jointly affect
forgetting in an analyzable model. Specifically, we focus on two-task continual
linear regression, where the second task is a random orthogonal transformation
of an arbitrary first task (an abstraction of random permutation tasks). We
derive an exact analytical expression for the expected forgetting, and uncover
a nuanced pattern. In highly overparameterized models, intermediate task
similarity causes the most forgetting. However, near the interpolation
threshold, forgetting decreases monotonically with the expected task
similarity. We validate our findings with linear regression on synthetic data,
and with neural networks on established permutation task benchmarks.

Comment: Accepted to the Twelfth International Conference on Learning Representations (ICLR 2024).
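As a rough illustration of the setup described in the abstract (not the paper's exact construction), the following NumPy sketch simulates two-task continual linear regression where the second task applies a random orthogonal transformation to the first task's inputs. The choice to keep the labels unchanged, the minimum-norm sequential fitting rule, and the specific dimensions are assumptions made here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions: d parameters, n samples per task.
# d > n gives an overparameterized (interpolating) regime;
# d close to n sits near the interpolation threshold.
d, n = 50, 20

# Task 1: an arbitrary linear regression task with teacher vector w_star.
w_star = rng.standard_normal(d)
X1 = rng.standard_normal((n, d))
y1 = X1 @ w_star

# Task 2: the same inputs under a random orthogonal transformation Q, with
# labels kept unchanged (a stand-in for random-permutation benchmarks;
# the paper's exact construction may differ).
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
X2 = X1 @ Q
y2 = y1

# Sequential fitting: each task is fit to interpolation by the minimum-norm
# correction from the previous weights (what (S)GD converges to from w_prev
# in the underdetermined regime). Assumed training rule for this sketch.
def fit_task(w_prev, X, y):
    return w_prev + np.linalg.pinv(X) @ (y - X @ w_prev)

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

w1 = fit_task(np.zeros(d), X1, y1)   # train on task 1
w2 = fit_task(w1, X2, y2)            # then on task 2

# Forgetting of task 1: its loss after training on task 2,
# minus its loss right after training on task 1.
forgetting = mse(w2, X1, y1) - mse(w1, X1, y1)
print(f"forgetting on task 1: {forgetting:.4f}")
```

Sweeping d relative to n in such a simulation is one way to probe the regimes the abstract contrasts (highly overparameterized versus near the interpolation threshold), though the paper's analytical expression and experimental protocol should be consulted for the actual results.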