1 research outputs found
MEDOE: A Multi-Expert Decoder and Output Ensemble Framework for Long-tailed Semantic Segmentation
Long-tailed distribution of semantic categories, which has been often ignored
in conventional methods, causes unsatisfactory performance in semantic
segmentation on tail categories. In this paper, we focus on the problem of
long-tailed semantic segmentation. Although some long-tailed recognition
methods (e.g., re-sampling/re-weighting) have been proposed in other problems,
they can probably compromise crucial contextual information and are thus hardly
adaptable to the problem of long-tailed semantic segmentation. To address this
issue, we propose MEDOE, a novel framework for long-tailed semantic
segmentation via contextual information ensemble-and-grouping. The proposed
two-sage framework comprises a multi-expert decoder (MED) and a multi-expert
output ensemble (MOE). Specifically, the MED includes several "experts". Based
on the pixel frequency distribution, each expert takes the dataset masked
according to the specific categories as input and generates contextual
information self-adaptively for classification; The MOE adopts learnable
decision weights for the ensemble of the experts' outputs. As a model-agnostic
framework, our MEDOE can be flexibly and efficiently coupled with various
popular deep neural networks (e.g., DeepLabv3+, OCRNet, and PSPNet) to improve
their performance in long-tailed semantic segmentation. Experimental results
show that the proposed framework outperforms the current methods on both
Cityscapes and ADE20K datasets by up to 1.78% in mIoU and 5.89% in mAcc.Comment: 18 pages, 9 figure