3D lane detection is an integral part of autonomous driving systems. Previous
CNN and Transformer-based methods usually first generate a bird's-eye-view
(BEV) feature map from the front-view image and then use a sub-network that
takes the BEV feature map as input to predict 3D lanes. Such approaches require an
explicit view transformation between BEV and front view, which itself is still
a challenging problem. In this paper, we propose CurveFormer, a single-stage
Transformer-based method that directly calculates 3D lane parameters and can
circumvent the difficult view transformation step. Specifically, we formulate
3D lane detection as a curve propagation problem by using curve queries. A 3D
lane query is represented by a dynamic and ordered anchor point set. In this
way, the curve queries in the Transformer decoder are iteratively refined to
produce the 3D lane detection results. Moreover, a curve cross-attention module
is introduced to compute the similarities between curve queries and image
features. Additionally, we provide a context sampling module that captures
image features more relevant to a curve query, further boosting the 3D
lane detection performance. We evaluate our method for 3D lane detection on
both synthetic and real-world datasets, and the experimental results show that
our method achieves promising performance compared with the state-of-the-art
approaches. The effectiveness of each component is also validated via ablation
studies.
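
To make the curve-query formulation more concrete, the following is a minimal
PyTorch-style sketch of the core idea: each lane query carries an ordered set
of anchor points, and successive decoder layers refine those points by
predicting per-point offsets. This is an illustration under simplifying
assumptions, not the authors' implementation: all names (CurveRefinementLayer,
CurveQueryDecoder, offset_head, etc.) are hypothetical, and the paper's curve
cross-attention and context sampling modules, which sample image features
around the projected anchor points, are replaced here by plain multi-head
attention for brevity.

```python
# Minimal, illustrative sketch of curve queries with iterative refinement.
# Not the authors' implementation; all names and shapes are hypothetical.
import torch
import torch.nn as nn

class CurveRefinementLayer(nn.Module):
    """One decoder layer: attends to image features and predicts
    per-anchor-point (x, z) offsets to refine the curve."""
    def __init__(self, embed_dim=256, num_points=10):
        super().__init__()
        self.num_points = num_points
        # Stand-in for the paper's curve cross-attention: ordinary
        # multi-head attention over flattened image features.
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads=8,
                                                batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)
        # Regression head producing (dx, dz) for each anchor point.
        self.offset_head = nn.Linear(embed_dim, num_points * 2)

    def forward(self, query, img_feats, anchor_points):
        # query:         (B, Q, C)    one embedding per curve query
        # img_feats:     (B, HW, C)   flattened image features
        # anchor_points: (B, Q, N, 2) ordered (x, z) at fixed y positions
        attn_out, _ = self.cross_attn(query, img_feats, img_feats)
        query = self.norm(query + attn_out)
        offsets = self.offset_head(query).view(anchor_points.shape)
        # Iterative refinement: each layer updates the anchor point set.
        return query, anchor_points + offsets

class CurveQueryDecoder(nn.Module):
    def __init__(self, num_layers=6, num_queries=12, embed_dim=256,
                 num_points=10):
        super().__init__()
        self.query_embed = nn.Embedding(num_queries, embed_dim)
        # Learnable initial anchor point set per query ("dynamic anchors").
        self.init_anchors = nn.Parameter(
            torch.zeros(num_queries, num_points, 2))
        self.layers = nn.ModuleList(
            CurveRefinementLayer(embed_dim, num_points)
            for _ in range(num_layers))

    def forward(self, img_feats):
        B = img_feats.size(0)
        query = self.query_embed.weight.unsqueeze(0).expand(B, -1, -1)
        anchors = self.init_anchors.unsqueeze(0).expand(B, -1, -1, -1)
        for layer in self.layers:
            query, anchors = layer(query, img_feats, anchors)
        return anchors  # refined (x, z) per fixed y, i.e. the 3D lanes

# Usage: 12 curve queries refined over 6 layers against dummy features.
decoder = CurveQueryDecoder()
lanes = decoder(torch.randn(2, 40 * 100, 256))  # shape (2, 12, 10, 2)
```

Because the anchor points are ordered along fixed y positions, each refined
point set directly parameterizes a 3D lane, so no explicit view transformation
between the front view and BEV is needed in this sketch.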