A Comprehensive Study and Comparison of the Robustness of 3D Object Detectors Against Adversarial Attacks
Recent years have witnessed significant advancements in deep learning-based
3D object detection, leading to its widespread adoption in numerous
applications. As 3D object detectors become increasingly crucial for
security-critical tasks, it is imperative to understand their robustness
against adversarial attacks. This paper presents the first comprehensive
evaluation and analysis of the robustness of LiDAR-based 3D detectors under
adversarial attacks. Specifically, we extend three distinct adversarial attacks
to the 3D object detection task, benchmarking the robustness of
state-of-the-art LiDAR-based 3D object detectors against attacks on the KITTI
and Waymo datasets. We further analyze the relationship between robustness and
detector properties. Additionally, we explore the transferability of
cross-model, cross-task, and cross-data attacks. Thorough experiments on
defensive strategies for 3D detectors are conducted, demonstrating that simple
transformations like flipping provide little help in improving robustness when
the applied transformation strategy is exposed to attackers. Finally, we
propose balanced adversarial focal training, based on conventional adversarial
training, to strike a balance between accuracy and robustness. Our findings
will facilitate investigations into understanding and defending against
adversarial attacks on LiDAR-based 3D object detectors, thus advancing the
field. The source code is publicly available at
https://github.com/Eaphan/Robust3DOD.
Comment: 30 pages, 14 figures
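As a concrete illustration of the attack setting above, a projected-gradient (PGD-style) perturbation of LiDAR point coordinates can be sketched in a few lines. The `loss_grad_fn` callback is a hypothetical stand-in for the gradient of a detector's loss with respect to the input points; the three attacks actually extended in the paper may differ in detail.

```python
import numpy as np

def pgd_attack_points(points, loss_grad_fn, eps=0.05, alpha=0.01, steps=10):
    """Iteratively perturb point-cloud coordinates within an L-inf ball.

    points       : (N, 3) array of xyz coordinates.
    loss_grad_fn : callable returning d(loss)/d(points); a hypothetical
                   stand-in for a detector's loss gradient.
    eps          : maximum per-coordinate perturbation (projection radius).
    alpha        : step size of each signed-gradient ascent step.
    """
    adv = points.copy()
    for _ in range(steps):
        grad = loss_grad_fn(adv)
        adv = adv + alpha * np.sign(grad)               # ascend the loss
        adv = np.clip(adv, points - eps, points + eps)  # project into ball
    return adv
```

The projection step is what keeps the perturbation imperceptible in the sense measured by the L-inf budget `eps`; benchmarking robustness then amounts to evaluating detection accuracy on `adv` instead of `points`.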
Learning Light Field Angular Super-Resolution via a Geometry-Aware Network
The acquisition of light field images with high angular resolution is costly.
Although many methods have been proposed to improve the angular resolution of a
sparsely-sampled light field, they typically focus on light fields with a
small baseline, as captured by consumer light field cameras. By making full
use of the intrinsic geometry information of light fields, in this
paper we propose an end-to-end learning-based approach aimed at angularly
super-resolving a sparsely-sampled light field with a large baseline. Our model
consists of two learnable modules and a physically-based module. Specifically,
it includes a depth estimation module for explicitly modeling the scene
geometry, a physically-based warping module for novel view synthesis, and a light
field blending module specifically designed for light field reconstruction.
Moreover, we introduce a novel loss function to promote the preservation of the
light field parallax structure. Experimental results over various light field
datasets including large baseline light field images demonstrate the
significant superiority of our method when compared with state-of-the-art ones,
i.e., our method improves PSNR over the second-best method by up to 2 dB on
average while reducing execution time by 48%. In addition, our method
preserves the light field parallax structure better.
Comment: This paper was accepted by AAAI 202
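The physically-based warping module described above can be illustrated with a toy backward warp along epipolar lines. The Lambertian light-field relation and nearest-neighbour sampling used here are simplifying assumptions for the sketch, not the paper's exact formulation.

```python
import numpy as np

def warp_view(src, disparity, du, dv):
    """Warp a source sub-aperture image to a novel angular position.

    src       : (H, W) image at the source angular position.
    disparity : (H, W) per-pixel disparity (pixels per unit baseline),
                e.g. produced by a depth-estimation module.
    (du, dv)  : angular offset from the source view to the target view.
    """
    H, W = src.shape
    v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Under the Lambertian assumption, a point seen at (u, v) in the target
    # view appears at (u - du * d, v - dv * d) in the source view.
    us = np.clip(np.round(u - du * disparity).astype(int), 0, W - 1)
    vs = np.clip(np.round(v - dv * disparity).astype(int), 0, H - 1)
    return src[vs, us]  # backward (gather) warping, nearest neighbour
```

In the full model, several views warped this way would be fed to the learned blending module, which resolves occlusions and non-Lambertian effects that this naive warp cannot.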
Transvaginal fast-scanning optical-resolution photoacoustic endoscopy
Photoacoustic endoscopy offers in vivo examination of the visceral tissue using endogenous contrast, but its typical B-scan rate is ∼10 Hz, restricted by the speed of the scanning unit and the laser pulse repetition rate. Here, we present a transvaginal fast-scanning optical-resolution photoacoustic endoscope with a 250-Hz B-scan rate over a 3-mm scanning range. Using this modality, we not only illustrated the morphological differences of vasculatures among the human ectocervix, uterine body, and sublingual mucosa but also showed the longitudinal and cross-sectional differences of cervical vasculatures in pregnant women. This technology is promising for screening the visceral pathological changes associated with angiogenesis.
Bidirectional Propagation for Cross-Modal 3D Object Detection
Recent works have revealed the superiority of feature-level fusion for
cross-modal 3D object detection, where fine-grained feature propagation from 2D
image pixels to 3D LiDAR points has been widely adopted for performance
improvement. Still, the potential of heterogeneous feature propagation between
2D and 3D domains has not been fully explored. In this paper, in contrast to
existing pixel-to-point feature propagation, we investigate an opposite
point-to-pixel direction, allowing point-wise features to flow inversely into
the 2D image branch. Thus, when jointly optimizing the 2D and 3D streams, the
gradients back-propagated from the 2D image branch can boost the representation
ability of the 3D backbone network working on LiDAR point clouds. Then,
combining pixel-to-point and point-to-pixel information flow mechanisms, we
construct a bidirectional feature propagation framework, dubbed BiProDet. In
addition to the architectural design, we also propose normalized local
coordinates map estimation, a new 2D auxiliary task for the training of the 2D
image branch, which facilitates learning local spatial-aware features from the
image modality and implicitly enhances the overall 3D detection performance.
Extensive experiments and ablation studies validate the effectiveness of our
method. Notably, our approach ranks among the top entries on the highly
competitive KITTI benchmark for the cyclist class at the time of submission.
The source code is available at https://github.com/Eaphan/BiProDet.
Comment: Accepted by ICLR 2023. Code is available at https://github.com/Eaphan/BiProDet
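A minimal sketch of the point-to-pixel direction described above, assuming point-wise features and their projected pixel coordinates are already given. Mean pooling over points hitting the same pixel is chosen purely for illustration and is not necessarily BiProDet's actual fusion layer.

```python
import numpy as np

def point_to_pixel(feats, pix_uv, H, W):
    """Scatter point-wise features into a 2D feature map.

    feats  : (N, C) point features from the 3D (LiDAR) branch.
    pix_uv : (N, 2) integer (u, v) pixel coordinates of each projected point.
    Points mapping to the same pixel are averaged; empty pixels stay zero.
    """
    C = feats.shape[1]
    canvas = np.zeros((H, W, C))
    count = np.zeros((H, W, 1))
    for f, (u, v) in zip(feats, pix_uv):
        canvas[v, u] += f          # accumulate point features per pixel
        count[v, u] += 1
    return canvas / np.maximum(count, 1)  # mean-pool, avoid divide-by-zero
```

The resulting map would be concatenated with (or added to) the 2D branch's features, so that gradients from the image-side losses flow back into the 3D backbone during joint optimization, which is the effect the abstract highlights.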
GLENet: Boosting 3D Object Detectors with Generative Label Uncertainty Estimation
The inherent ambiguity in ground-truth annotations of 3D bounding boxes
caused by occlusions, signal missing, or manual annotation errors can confuse
deep 3D object detectors during training, thus deteriorating the detection
accuracy. However, existing methods overlook such issues to some extent and
treat the labels as deterministic. In this paper, we formulate the label
uncertainty problem as the diversity of potentially plausible bounding boxes of
objects, then propose GLENet, a generative framework adapted from conditional
variational autoencoders, to model the one-to-many relationship between a
typical 3D object and its potential ground-truth bounding boxes with latent
variables. The label uncertainty generated by GLENet is a plug-and-play module
and can be conveniently integrated into existing deep 3D detectors to build
probabilistic detectors and supervise the learning of the localization
uncertainty. Besides, we propose an uncertainty-aware quality estimator
architecture in probabilistic detectors to guide the training of IoU-branch
with predicted localization uncertainty. We incorporate the proposed methods
into various popular base 3D detectors and demonstrate significant and
consistent performance gains on both KITTI and Waymo benchmark datasets.
In particular, the proposed GLENet-VR outperforms all published LiDAR-based
approaches by a large margin and achieves a top rank among single-modal
methods on the challenging KITTI test set. We will make the source code and
pre-trained models publicly available.
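One common way such predicted localization uncertainty supervises training is a per-parameter Gaussian negative log-likelihood on the box regression targets: a large predicted variance down-weights boxes whose labels the uncertainty module flags as ambiguous. This generic probabilistic-detector loss is assumed for illustration and is not necessarily GLENet's exact formulation.

```python
import numpy as np

def nll_box_loss(pred, target, log_var):
    """Gaussian negative log-likelihood for box regression.

    pred, target : (N, 7) predicted / labeled box parameters
                   (x, y, z, w, l, h, yaw).
    log_var      : (N, 7) predicted log-variance per parameter; larger
                   values shrink the effective squared-error weight.
    """
    # 0.5 * exp(-s) * err^2 + 0.5 * s, with s = log sigma^2 (constants dropped)
    sq_err = (pred - target) ** 2
    return np.mean(0.5 * np.exp(-log_var) * sq_err + 0.5 * log_var)
```

With `log_var = 0` this reduces to half the mean squared error, so the uncertainty head can only help by selectively attenuating ambiguous labels rather than uniformly shrinking the loss.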