Class-Incremental Semantic Segmentation (CISS) extends the traditional
segmentation task by incrementally learning newly added classes. Previous work
has introduced generative replay, which involves replaying old class samples
generated from a pre-trained GAN, to address the issues of catastrophic
forgetting and privacy concerns. However, the generated images lack semantic
precision and exhibit out-of-distribution characteristics, resulting in
inaccurate masks that further degrade the segmentation performance. To tackle
these challenges, we propose DiffusePast, a novel framework featuring a
diffusion-based generative replay module that generates semantically accurate
images with more reliable masks guided by different instructions (e.g., text
prompts or edge maps). Specifically, DiffusePast introduces a dual-generator
paradigm, which focuses on generating old class images that align with the
distribution of downstream datasets while preserving the structure and layout
of the original images, enabling more precise masks. To adapt to the novel
visual concepts of newly added classes continuously, we incorporate class-wise
token embedding when updating the dual-generator. Moreover, we assign suitable
pseudo-labels of old classes to the background pixels in the new step images,
further mitigating the forgetting of previously learned knowledge.
Comprehensive experiments show that our method achieves competitive performance
across mainstream benchmarks, striking a better balance between the performance
of old and novel classes.

Comment: 13 pages, 7 figures
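
The pseudo-labeling of background pixels mentioned above can be illustrated with a minimal sketch. The abstract does not say how pseudo-labels are selected, so the confidence-threshold rule used here is an assumption (a common choice in incremental segmentation), and the names `pseudo_label_background`, `background_id`, and `threshold` are hypothetical, not taken from DiffusePast.

```python
import numpy as np

def pseudo_label_background(new_labels, old_probs, background_id=0, threshold=0.7):
    """Relabel background pixels with confident old-class predictions.

    new_labels: (H, W) integer labels for the current step, in which
                old classes are annotated as background.
    old_probs:  (C_old, H, W) per-class probabilities from the
                previous-step model (channel 0 is background).
    The thresholding rule is an assumption made for illustration.
    """
    old_pred = old_probs.argmax(axis=0)   # most likely old class per pixel
    old_conf = old_probs.max(axis=0)      # its probability
    replace = (
        (new_labels == background_id)     # only touch background pixels
        & (old_pred != background_id)     # old model sees an old class there
        & (old_conf >= threshold)         # and is confident enough
    )
    refined_labels = new_labels.copy()
    refined_labels[replace] = old_pred[replace]
    return refined_labels

# Toy 2x2 example: classes 0 (background), 1, 2 are old; class 3 is new.
new_labels = np.array([[0, 3],
                       [0, 0]])
old_probs = np.array([
    [[0.1, 0.2], [0.50, 0.3]],   # background
    [[0.8, 0.7], [0.45, 0.2]],   # old class 1
    [[0.1, 0.1], [0.05, 0.5]],   # old class 2
])
refined = pseudo_label_background(new_labels, old_probs)
```

In this toy case only the top-left pixel is relabeled (to old class 1): the new-class pixel keeps its ground-truth label, and the two remaining background pixels stay background because the old model either predicts background or is not confident enough.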