Efficient Semantic Segmentation via Self-Attention and Self-Distillation

Abstract

Lightweight models are pivotal to efficient semantic segmentation, but they often suffer from insufficient context information due to their limited convolutional capacity and small receptive fields. To address this problem, we propose a tailored approach to efficient semantic segmentation that leverages two complementary distillation schemes to supply context information to small networks: 1) a self-attention distillation scheme, which adaptively transfers long-range context knowledge from large teacher networks to small student networks; and 2) a layer-wise context distillation scheme, which transfers structured context from deep layers to shallow layers within student networks, promoting semantic consistency in the shallow layers. Extensive experiments on the ADE20K, Cityscapes, and CamVid datasets demonstrate the effectiveness of our approach.
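The core idea of both schemes can be illustrated with a small sketch. The code below is a hypothetical, simplified rendering of attention-map distillation, not the paper's exact formulation: a self-attention map is computed from a set of features, and a distillation loss penalizes the mismatch between the student's and teacher's maps. The same loss form can be reused for the layer-wise scheme by treating deep student-layer features as the "teacher" and shallow student-layer features as the "student". All function names and the choice of an MSE loss here are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_map(features):
    """Self-attention map over N spatial tokens.

    features: (N, C) array of flattened spatial features.
    Returns an (N, N) row-stochastic matrix of pairwise affinities,
    softmax(F F^T / sqrt(C)) -- the long-range context each token attends to.
    """
    n, c = features.shape
    scores = features @ features.T / np.sqrt(c)
    return softmax(scores, axis=-1)

def attention_distillation_loss(student_feats, teacher_feats):
    """MSE between student and teacher self-attention maps (illustrative loss).

    For teacher->student distillation, pass teacher-network features;
    for layer-wise distillation, pass deep-layer student features instead.
    """
    a_s = self_attention_map(student_feats)
    a_t = self_attention_map(teacher_feats)
    return float(np.mean((a_s - a_t) ** 2))

# Tiny usage example with random features (4 tokens, 8 channels).
rng = np.random.default_rng(0)
student = rng.standard_normal((4, 8))
teacher = rng.standard_normal((4, 8))
loss = attention_distillation_loss(student, teacher)
```

The loss is zero when the two attention maps coincide and grows as the student's pairwise-affinity structure diverges from the teacher's, which is what lets a small network absorb long-range context it cannot compute on its own.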
