Despite the great success of pre-trained language models, it is still a
challenge to use these models for continual learning, especially for the
class-incremental learning (CIL) setting due to catastrophic forgetting (CF).
This paper reports our finding that if we formulate CIL as a continual label
generation problem, CF is drastically reduced and the generalizable
representations of pre-trained models can be better retained. We thus propose a
new CIL method (VAG) that also leverages the sparsity of vocabulary to focus
the generation and creates pseudo-replay samples by using label semantics.
Experimental results show that VAG outperforms baselines by a large margin.Comment: 12 pages, ACL 2023 Main Conferenc