In application scenarios that rely on semantic segmentation, such as
autonomous driving, the primary concern is real-time performance rather than
extremely high segmentation accuracy. To achieve a good trade-off between
speed and accuracy, the two-branch architecture has been proposed in recent
years. It processes spatial information and semantic information separately,
which allows the model to be composed of two lightweight networks. However,
fusing features at two different scales becomes a performance bottleneck for
many current two-branch models. In this work, we design a new fusion
mechanism for the two-branch architecture that is guided by attention
computation. Specifically, our proposed Dual-Guided Attention (DGA) module
replaces some multi-scale transformations with attention computation, so that
a few attention layers of near-linear complexity achieve performance
comparable to the commonly used multi-layer fusion. To make the module
effective, we build one of the two branches of our networks with Residual
U-blocks (RSU), which yields richer multi-scale features. Extensive
experiments on the Cityscapes and CamVid datasets demonstrate the
effectiveness of our method.
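
To make the attention-guided fusion idea concrete, below is a minimal PyTorch
sketch of cross-branch attention with near-linear complexity in the number of
pixels. The module name, the channel sizes, and the specific linear-attention
factorization (computing softmax(K)^T V before multiplying by the queries) are
illustrative assumptions, not the exact design of the DGA module.

```python
# Hypothetical sketch of attention-guided two-branch fusion; names, channel
# sizes, and the linear-attention formulation are assumptions for
# illustration, not the paper's exact DGA design.
import torch
import torch.nn as nn

class CrossBranchLinearAttention(nn.Module):
    """Fuses a high-resolution spatial feature with a low-resolution
    semantic feature. Queries come from the spatial branch, keys/values
    from the semantic branch, and softmax(K)^T V is computed first, so the
    cost grows linearly with the number of positions, not quadratically."""

    def __init__(self, spatial_ch: int, semantic_ch: int, dim: int = 64):
        super().__init__()
        self.q = nn.Conv2d(spatial_ch, dim, 1)           # queries: spatial branch
        self.k = nn.Conv2d(semantic_ch, dim, 1)          # keys: semantic branch
        self.v = nn.Conv2d(semantic_ch, spatial_ch, 1)   # values carry semantics
        self.proj = nn.Conv2d(spatial_ch, spatial_ch, 1)

    def forward(self, x_spatial, x_semantic):
        b, _, h, w = x_spatial.shape
        q = self.q(x_spatial).flatten(2).softmax(dim=1)   # (B, D, H*W)
        k = self.k(x_semantic).flatten(2).softmax(dim=2)  # (B, D, h*w)
        v = self.v(x_semantic).flatten(2)                 # (B, C, h*w)
        context = torch.einsum("bdn,bcn->bdc", k, v)      # (B, D, C) global summary
        out = torch.einsum("bdm,bdc->bcm", q, context)    # (B, C, H*W)
        out = out.view(b, -1, h, w)
        return x_spatial + self.proj(out)                 # residual fusion

# Usage: fuse a 1/8-resolution spatial map with a 1/32-resolution semantic map.
fuse = CrossBranchLinearAttention(spatial_ch=64, semantic_ch=128)
y = fuse(torch.randn(2, 64, 96, 96), torch.randn(2, 128, 24, 24))
print(y.shape)  # torch.Size([2, 64, 96, 96])
```

Note that the two spatial resolutions never need to be aligned by explicit
up/down-sampling here: the key/value summary is resolution-independent, which
is what lets attention stand in for some multi-scale transformations.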
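For the multi-scale branch, the following is a compact sketch of a Residual
U-block in the spirit of U^2-Net (Qin et al., 2020): a small U-shaped
encoder-decoder wrapped in a residual connection. The two-scale depth and the
channel counts are assumptions for illustration; the RSU variants used in our
networks may differ.

```python
# Minimal two-scale Residual U-block (RSU) sketch; depth and channel counts
# are illustrative assumptions. Input height/width are assumed even so the
# pool/upsample round trip restores the original resolution.
import torch
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch):
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1),
                         nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

class RSU(nn.Module):
    def __init__(self, in_ch, mid_ch, out_ch):
        super().__init__()
        self.conv_in = conv_bn_relu(in_ch, out_ch)
        self.enc1 = conv_bn_relu(out_ch, mid_ch)
        self.pool = nn.MaxPool2d(2, 2)
        self.enc2 = conv_bn_relu(mid_ch, mid_ch)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear",
                              align_corners=False)
        self.dec1 = conv_bn_relu(mid_ch * 2, out_ch)

    def forward(self, x):
        xin = self.conv_in(x)            # residual source
        e1 = self.enc1(xin)              # full-resolution features
        e2 = self.enc2(self.pool(e1))    # half-resolution features
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))  # fuse scales
        return xin + d1                  # residual connection

blk = RSU(in_ch=64, mid_ch=16, out_ch=64)
print(blk(torch.randn(1, 64, 96, 96)).shape)  # torch.Size([1, 64, 96, 96])
```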