Object detection in very-high-resolution (VHR) remote sensing images plays a
vital role in applications such as urban planning, land resource management,
and rescue missions. Large variation in target scale is one of the main
challenges in VHR remote sensing object detection. Existing methods
improve detection accuracy on high-resolution remote sensing images by
refining the structure of feature pyramids and adopting various attention
modules. However, small targets are still frequently missed because key detail
features are lost, and there is still room for improvement in how multiscale
features are fused and balanced. To address these issues, this
paper proposes two novel modules, Guided Attention and Tucker Bilinear
Attention, which are applied at the early-fusion and late-fusion stages,
respectively. The former effectively retains clean key detail features, and
the latter better balances features by mining semantic-level correlations.
Based on these two modules, we build a new multi-scale remote sensing object
detection framework. Without bells and whistles, the proposed method
substantially improves the average precision on small objects and achieves the
highest mean average precision in comparison with nine state-of-the-art
methods on DOTA, DIOR, and NWPU VHR-10. Code and models are available at
https://github.com/Shinichict/GTNet.

Comment: arXiv admin note: text overlap with arXiv:1705.06676,
arXiv:2209.13351 by other authors