Query-based object detectors have made significant advancements since the
publication of DETR. However, most existing methods still rely on multi-stage
encoders, multi-stage decoders, or both. Despite achieving high accuracy, the
multi-stage paradigm (typically consisting of 6 stages) suffers from drawbacks
such as a heavy computational burden, prompting us to reconsider whether so
many stages are necessary. In this paper, we explore multiple techniques to enhance query-based
detectors and, based on these findings, propose a novel model called GOLO
(Global Once and Local Once), which follows a two-stage decoding paradigm.
Compared with other mainstream query-based models that use multi-stage
decoders, our model employs fewer decoder stages while still achieving strong
performance. Experimental results on the COCO dataset demonstrate the
effectiveness of our approach.