Class Incremental Semantic Segmentation (CISS) has attracted growing attention
owing to its significance in real-world applications. Although existing
CISS methods demonstrate remarkable performance, they either leverage only
high-level knowledge (features) while neglecting the rich and diverse
knowledge in low-level features, leading to poor preservation of old knowledge
and weak exploration of new knowledge, or use multi-level features for
knowledge distillation by retraining a heavy backbone, which is computationally
intensive. In this paper, we propose, for the first time, to efficiently reuse
multi-grained knowledge for CISS by fusing multi-level features with a frozen
backbone, and we show that a simple aggregation of features at varying levels,
i.e., a naive feature pyramid, can boost performance significantly. We further
introduce a novel densely-interactive feature pyramid (DEFY) module that
enhances the fusion of high- and low-level features by enabling their dense
interaction. Specifically, DEFY establishes a per-pixel relationship between
pairs of feature maps, allowing for multi-pair outputs to be aggregated. This
results in improved semantic segmentation by leveraging the complementary
information from multi-level features. We show that DEFY can be effortlessly
integrated into three representative methods to enhance their performance.
Combined with the current state-of-the-art (SOTA) method, our approach sets a
new state of the art, with notable average mIoU gains on two widely used
benchmarks: 2.5% on PASCAL VOC 2012 and 2.3% on ADE20K.

Comment: Technical Report. This work has been submitted to the IEEE for
possible publication. Copyright may be transferred without notice, after
which this version may no longer be accessible.
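
As a rough illustration of the "naive feature pyramid" idea named in the abstract (a simple aggregation of features from different backbone levels), the following minimal Python sketch resizes a coarse, high-level map to the resolution of a fine, low-level map and averages them. It is not the authors' implementation and does not model DEFY's dense interaction. The function names (`upsample_nearest`, `naive_feature_pyramid`), the single-channel maps, nearest-neighbour resizing, and plain averaging are all assumptions chosen for brevity; the inputs stand in for features from a frozen backbone and are treated as fixed.

```python
# Hypothetical sketch: every name below is illustrative, not from the paper.

def upsample_nearest(fmap, out_h, out_w):
    """Nearest-neighbour resize of a 2-D map (list of rows) to out_h x out_w."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [
        [fmap[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]


def naive_feature_pyramid(levels):
    """Average maps from several backbone levels at the finest resolution.

    `levels` is a list of single-channel maps ordered from low-level
    (high resolution) to high-level (low resolution). Assumption: a real
    model would use multi-channel tensors and learned projections.
    """
    out_h, out_w = len(levels[0]), len(levels[0][0])
    resized = [upsample_nearest(f, out_h, out_w) for f in levels]
    return [
        [sum(r[i][j] for r in resized) / len(resized) for j in range(out_w)]
        for i in range(out_h)
    ]


# Toy inputs standing in for frozen-backbone features.
low = [[1.0, 2.0, 3.0, 4.0],
       [5.0, 6.0, 7.0, 8.0],
       [1.0, 2.0, 3.0, 4.0],
       [5.0, 6.0, 7.0, 8.0]]   # 4x4 low-level map
high = [[10.0, 20.0],
        [30.0, 40.0]]          # 2x2 high-level map

fused = naive_feature_pyramid([low, high])
```

In this toy setting each fused pixel mixes fine spatial detail from the low-level map with the coarser value from the high-level map; a learned fusion would replace the plain average.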