GACE: Geometry Aware Confidence Enhancement for Black-Box 3D Object Detectors on LiDAR-Data
Widely used LiDAR-based 3D object detectors often neglect fundamental geometric information that is readily available from their object proposals when estimating confidence. This is mostly due to architectural design choices, which were often adopted from the 2D image domain, where geometric context is rarely available. In 3D, however, considering an object's properties and its surroundings in a holistic way is important to distinguish between true and false positive detections, e.g., occluded pedestrians in a group. To address this, we present GACE, an intuitive and highly efficient method to improve the confidence estimation of a given black-box 3D object detector. We aggregate geometric cues of the detections and their spatial relationships, which enables us to properly assess their plausibility and, consequently, improve the confidence estimation. This leads to consistent performance gains over a variety of state-of-the-art detectors. Across all evaluated detectors, GACE proves especially beneficial for the vulnerable road user classes, i.e., pedestrians and cyclists.
Comment: ICCV 2023, code is available at https://github.com/dschinagl/gac
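
To make the idea concrete, below is a minimal Python sketch of geometry-aware confidence re-scoring in the spirit of GACE. The feature set, the network size, and all names (ConfidenceRescorer, the 2 m neighborhood radius) are illustrative assumptions, not the authors' implementation; GACE's actual cues and architecture are described in the paper.

import torch
import torch.nn as nn

class ConfidenceRescorer(nn.Module):
    """Re-scores black-box 3D detections from simple geometric cues.

    Hypothetical sketch: the feature choices and architecture are
    assumptions, not GACE's actual design.
    """

    def __init__(self, num_feats: int = 9):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_feats, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
            nn.Sigmoid(),  # refined confidence in [0, 1]
        )

    def forward(self, boxes: torch.Tensor, scores: torch.Tensor) -> torch.Tensor:
        # boxes: (N, 7) as (x, y, z, l, w, h, yaw); scores: (N,) raw confidences.
        centers, dims, yaw = boxes[:, :3], boxes[:, 3:6], boxes[:, 6:7]
        dist = centers.norm(dim=1, keepdim=True)  # range from the sensor
        instance = torch.cat([dims, yaw, dist, scores.unsqueeze(1)], dim=1)

        # Contextual cues from the spatial relationships between detections.
        pdist = torch.cdist(centers, centers)  # (N, N) pairwise distances
        pdist.fill_diagonal_(float("inf"))
        nn_dist = pdist.min(dim=1, keepdim=True).values.clamp(max=100.0)
        neighbors = (pdist < 2.0).float()  # assumed 2 m neighborhood radius
        n_close = neighbors.sum(dim=1, keepdim=True)  # local detection density
        nb_score = (neighbors * scores.unsqueeze(0)).sum(dim=1, keepdim=True) \
            / n_close.clamp(min=1.0)  # mean confidence of close neighbors

        feats = torch.cat([instance, nn_dist, n_close, nb_score], dim=1)
        return self.mlp(feats).squeeze(1)  # refined per-detection confidence

A network like this can be trained with true/false-positive labels derived from ground truth, and the refined scores simply replace the detector's originals; since no access to the detector's internals is needed, the approach applies to any black-box detector.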
MAELi: Masked Autoencoder for Large-Scale LiDAR Point Clouds
The sensing process of large-scale LiDAR point clouds inevitably causes large blind spots, i.e., regions not visible to the sensor. We demonstrate how these inherent sampling properties can be effectively utilized for self-supervised representation learning by designing a highly effective pre-training framework that considerably reduces the need for tedious 3D annotations to train state-of-the-art object detectors. Our Masked AutoEncoder for LiDAR point clouds (MAELi) intuitively leverages the sparsity of LiDAR point clouds in both the encoder and the decoder during reconstruction. This results in a more expressive and useful initialization, which can be directly applied to downstream perception tasks, such as 3D object detection or semantic segmentation for autonomous driving. In a novel reconstruction approach, MAELi distinguishes between empty and occluded space and employs a new masking strategy that targets the LiDAR's inherent spherical projection. Thereby, without any ground truth whatsoever and trained on single frames only, MAELi obtains an understanding of the underlying 3D scene geometry and semantics. To demonstrate the potential of MAELi, we pre-train backbones in an end-to-end manner and show the effectiveness of our unsupervised pre-trained weights on the tasks of 3D object detection and semantic segmentation.
Comment: 16 pages
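
As a rough illustration of a masking strategy that targets the LiDAR's spherical projection, the following Python sketch groups points into angular patches and drops a random subset of patches. The patch size, mask ratio, and function name are assumptions for illustration; MAELi's actual masking and its distinction between empty and occluded space follow the paper.

import numpy as np

def spherical_patch_mask(points, patch_deg=2.0, mask_ratio=0.75, rng=None):
    """Mask random angular patches on the LiDAR's spherical projection.

    Hypothetical sketch: patch size and mask ratio are assumed values.
    points: (N, 3) Cartesian LiDAR points.
    Returns (visible_points, masked_points).
    """
    rng = rng or np.random.default_rng()
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    azimuth = np.degrees(np.arctan2(y, x))                 # horizontal angle
    elevation = np.degrees(np.arctan2(z, np.hypot(x, y)))  # vertical angle

    # Quantize both angles into patch indices and combine them into one id
    # (the elevation index range stays far below 10_000, so no collisions).
    az_idx = np.floor(azimuth / patch_deg).astype(np.int64)
    el_idx = np.floor(elevation / patch_deg).astype(np.int64)
    patch_id = az_idx * 10_000 + el_idx

    # Randomly choose patches whose points become reconstruction targets.
    unique_ids = rng.permutation(np.unique(patch_id))
    masked_ids = unique_ids[: int(mask_ratio * len(unique_ids))]
    is_masked = np.isin(patch_id, masked_ids)
    return points[~is_masked], points[is_masked]

During pre-training, the encoder would see only the visible points while the decoder reconstructs the masked ones; distinguishing empty from merely occluded space along the sensor rays means the model is not asked to hallucinate geometry in regions the LiDAR could never have observed.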