Perceptual Generative Adversarial Networks for Small Object Detection
Detecting small objects is notoriously challenging due to their low
resolution and noisy representation. Existing object detection pipelines
usually detect small objects through learning representations of all the
objects at multiple scales. However, the performance gain of such ad hoc
architectures is usually too limited to justify the extra computational cost. In
this work, we address the small object detection problem with a single
architecture that internally lifts representations of small objects to
"super-resolved" ones that exhibit characteristics similar to those of large
objects and are thus more discriminative for detection. For this purpose, we propose a new
Perceptual Generative Adversarial Network (Perceptual GAN) model that improves
small object detection by narrowing the representation difference between small
objects and large ones. Specifically, its generator learns to transform the
perceived poor representations of small objects into super-resolved ones that
are similar enough to real large objects to fool a competing discriminator.
Meanwhile, its discriminator competes with the generator to identify the
generated representations and imposes an additional perceptual requirement on
the generator: the generated representations of small objects must be
beneficial for detection. Extensive evaluations on the challenging
Tsinghua-Tencent 100K and Caltech benchmarks demonstrate the superiority of
Perceptual GAN in detecting small objects, including traffic signs and
pedestrians, over well-established state-of-the-art detectors.
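A minimal PyTorch sketch of this generator/discriminator interplay is given below. It is not the authors' implementation: the tensor shapes, the residual generator, and the two discriminator heads (adversarial and classification) are simplifying assumptions meant only to illustrate how the adversarial and perceptual terms combine in the generator loss.

```python
# Hedged sketch of the Perceptual GAN idea (hypothetical shapes and modules).
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Residual refinement that lifts small-object RoI features toward large-object ones."""
    def __init__(self, channels=256):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, roi_feat):                     # (N, C, 7, 7) pooled RoI features
        return roi_feat + self.residual(roi_feat)    # "super-resolved" representation

class Discriminator(nn.Module):
    """Adversarial branch (real large vs. generated) plus perceptual (detection) branch."""
    def __init__(self, channels=256, num_classes=2):
        super().__init__()
        self.shared = nn.Sequential(nn.Flatten(),
                                    nn.Linear(channels * 7 * 7, 1024), nn.ReLU(inplace=True))
        self.adv_head = nn.Linear(1024, 1)            # real/fake score
        self.cls_head = nn.Linear(1024, num_classes)  # detection (classification) score

    def forward(self, feat):
        h = self.shared(feat)
        return self.adv_head(h), self.cls_head(h)

# Generator objective: fool the adversarial head AND keep the detection loss low.
G, D = Generator(), Discriminator()
bce, ce = nn.BCEWithLogitsLoss(), nn.CrossEntropyLoss()
small_feat = torch.randn(8, 256, 7, 7)               # placeholder small-object RoI features
labels = torch.randint(0, 2, (8,))
adv, cls = D(G(small_feat))
g_loss = bce(adv, torch.ones_like(adv)) + ce(cls, labels)  # adversarial + perceptual terms
```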
DMSSN: Distilled Mixed Spectral-Spatial Network for Hyperspectral Salient Object Detection
Hyperspectral salient object detection (HSOD) has exhibited remarkable
promise across various applications, particularly in intricate scenarios where
conventional RGB-based approaches fall short. Despite the considerable progress
in HSOD method advancements, two critical challenges require immediate
attention. Firstly, existing hyperspectral data dimension reduction techniques
incur a loss of spectral information, which adversely affects detection
accuracy. Secondly, previous methods insufficiently harness the inherent
distinctive attributes of hyperspectral images (HSIs) during the feature
extraction process. To address these challenges, we propose a novel approach
termed the Distilled Mixed Spectral-Spatial Network (DMSSN), comprising a
Distilled Spectral Encoding process and a Mixed Spectral-Spatial Transformer
(MSST) feature extraction network. The encoding process utilizes knowledge
distillation to construct a lightweight autoencoder for dimension reduction,
striking a balance between robust encoding capabilities and low computational
costs. The MSST extracts spectral-spatial features through multiple attention
head groups, which collaboratively enhance the network's robustness in intricate scenarios.
Moreover, we have created a large-scale HSOD dataset, HSOD-BIT, to tackle the
issue of data scarcity in this field and meet the fundamental data requirements
of deep network training. Extensive experiments demonstrate that our proposed
DMSSN achieves state-of-the-art performance on multiple datasets. We will soon
make the code and dataset publicly available at
https://github.com/anonymous0519/HSOD-BIT
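The distilled spectral encoding step can be pictured with the following hypothetical PyTorch sketch: a larger teacher autoencoder compresses each pixel's spectrum, and a lightweight student is trained to mimic its latent codes and reconstructions. The band count, layer sizes, and loss weighting are placeholders, not the released DMSSN configuration.

```python
# Hedged sketch of knowledge-distilled spectral dimension reduction (assumed setup).
import torch
import torch.nn as nn
import torch.nn.functional as F

def autoencoder(in_bands, latent, hidden):
    enc = nn.Sequential(nn.Linear(in_bands, hidden), nn.ReLU(), nn.Linear(hidden, latent))
    dec = nn.Sequential(nn.Linear(latent, hidden), nn.ReLU(), nn.Linear(hidden, in_bands))
    return enc, dec

bands, latent = 200, 32                                  # placeholder band / latent sizes
t_enc, t_dec = autoencoder(bands, latent, hidden=256)    # teacher: larger capacity
s_enc, s_dec = autoencoder(bands, latent, hidden=64)     # student: lightweight encoder used at inference

spectra = torch.randn(1024, bands)                       # per-pixel HSI spectra, flattened

with torch.no_grad():                                    # teacher assumed pre-trained
    t_z = t_enc(spectra)
    t_rec = t_dec(t_z)

s_z = s_enc(spectra)
s_rec = s_dec(s_z)

# Distillation: match the teacher's latent codes and reconstructions,
# plus a plain reconstruction term on the input spectra.
loss = F.mse_loss(s_z, t_z) + F.mse_loss(s_rec, t_rec) + F.mse_loss(s_rec, spectra)
```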
FusionRCNN: LiDAR-Camera Fusion for Two-stage 3D Object Detection
3D object detection with multi-sensors is essential for an accurate and
reliable perception system of autonomous driving and robotics. Existing 3D
detectors significantly improve the accuracy by adopting a two-stage paradigm
that relies solely on LiDAR point clouds for 3D proposal refinement. Though
impressive, the sparsity of point clouds, especially for far-away points, makes
it difficult for the LiDAR-only refinement module to accurately recognize and
locate objects. To address this problem, we propose a novel
multi-modality two-stage approach named FusionRCNN, which effectively and
efficiently fuses point clouds and camera images in the Regions of
Interest (RoI). FusionRCNN adaptively integrates both sparse geometry
information from LiDAR and dense texture information from the camera in a unified
attention mechanism. Specifically, it first utilizes RoIPooling to obtain an
image set with a unified size and gets the point set by sampling raw points
within proposals in the RoI extraction step; it then leverages intra-modality
self-attention to enhance the domain-specific features, followed by a
well-designed cross-attention to fuse the information from the two modalities.
FusionRCNN is fundamentally plug-and-play and supports different
one-stage methods with almost no architectural changes. Extensive experiments
on KITTI and Waymo benchmarks demonstrate that our method significantly boosts
the performance of popular detectors. Remarkably, FusionRCNN improves the
strong SECOND baseline by 6.14% mAP on Waymo and outperforms competing
two-stage approaches. Code will be released soon at
https://github.com/xxlbigbrother/Fusion-RCNN.
Comment: 7 pages, 3 figures
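The RoI-level fusion step might be sketched as follows. This is an assumption-laden illustration (token shapes, attention sizes, and module names are invented here, not taken from the released code) of intra-modality self-attention followed by cross-attention from point tokens to image tokens.

```python
# Hedged sketch of RoI-level LiDAR-camera fusion with self- and cross-attention.
import torch
import torch.nn as nn

class RoIFusion(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.point_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.image_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, point_tokens, image_tokens):
        # point_tokens: (R, Np, C) features of points sampled within each proposal
        # image_tokens: (R, Hi*Wi, C) RoI-pooled image features per proposal
        p, _ = self.point_self(point_tokens, point_tokens, point_tokens)  # intra-modality
        i, _ = self.image_self(image_tokens, image_tokens, image_tokens)  # intra-modality
        # Point tokens query the image tokens to pick up dense texture information.
        fused, _ = self.cross(p, i, i)
        return fused                       # (R, Np, C) fused features for proposal refinement

rois, n_pts, n_pix, dim = 16, 128, 49, 256          # placeholder sizes
fusion = RoIFusion(dim)
out = fusion(torch.randn(rois, n_pts, dim), torch.randn(rois, n_pix, dim))
```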
Dynamic Loss For Robust Learning
Label noise and class imbalance commonly coexist in real-world data. Previous
works on robust learning, however, usually address only one of these data
biases and underperform when facing both. To bridge this gap, this work
presents a novel meta-learning based dynamic loss that automatically adjusts
the objective functions with the training process to robustly learn a
classifier from long-tailed noisy data. Concretely, our dynamic loss comprises
a label corrector and a margin generator, which respectively correct noisy
labels and generate additive per-class classification margins by perceiving the
underlying data distribution as well as the learning state of the classifier.
Equipped with a new hierarchical sampling strategy that enriches a small amount
of unbiased metadata with diverse and hard samples, the two components in the
dynamic loss are optimized jointly through meta-learning, guiding the
classifier to adapt well to clean and balanced test data. Extensive experiments
show our method achieves state-of-the-art accuracy on multiple real-world and
synthetic datasets with various types of data biases, including CIFAR-10/100,
Animal-10N, ImageNet-LT, and WebVision. Code will soon be made publicly available.
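A much-simplified sketch of the two loss components is shown below. It is not the paper's implementation: the scalar trust weight, the per-class margin vector, and the plain cross-entropy formulation are placeholders, and the bilevel meta-learning update on the small metadata set is omitted.

```python
# Hedged sketch of a dynamic loss with a label corrector and a margin generator.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicLoss(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.correct_weight = nn.Parameter(torch.tensor(0.5))   # how much to trust model predictions
        self.margins = nn.Parameter(torch.zeros(num_classes))   # additive per-class margins

    def forward(self, logits, noisy_labels):
        one_hot = F.one_hot(noisy_labels, logits.size(1)).float()
        w = torch.sigmoid(self.correct_weight)
        corrected = (1 - w) * one_hot + w * logits.softmax(dim=1)   # label corrector
        adjusted = logits + self.margins                            # margin generator
        # Cross-entropy against the corrected soft labels on margin-adjusted logits.
        return -(corrected * F.log_softmax(adjusted, dim=1)).sum(dim=1).mean()

# In the paper, correct_weight and margins would be updated by meta-learning
# on unbiased metadata; here they are ordinary parameters for illustration.
criterion = DynamicLoss(num_classes=10)
loss = criterion(torch.randn(32, 10), torch.randint(0, 10, (32,)))
```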
Sample-adaptive Augmentation for Point Cloud Recognition Against Real-world Corruptions
Robust 3D perception under corruption has become an essential task for the
realm of 3D vision. Current data augmentation techniques, however, usually
perform random transformations on all point cloud objects in an offline way and
ignore the structure of the samples, resulting in over- or under-enhancement. In
this work, we propose an alternative that makes sample-adaptive transformations
based on the structure of each sample to cope with potential corruption, via an
auto-augmentation framework named AdaptPoint. Specifically, we leverage an
imitator, consisting of a Deformation Controller and a Mask Controller that are
respectively in charge of predicting deformation parameters and producing a
per-point mask based on the intrinsic structural information of the input point
cloud, and then conduct corruption simulation on top. A
discriminator is utilized to prevent the generation of excessive corruption
that deviates from the original data distribution. In addition, a
perception-guidance feedback mechanism is incorporated to guide the generation
of samples with an appropriate difficulty level. Furthermore, to address the
paucity of real-world corrupted point clouds, we also introduce a new dataset,
ScanObjectNN-C, which exhibits greater similarity to actual data in real-world
environments, especially when contrasted with preceding CAD datasets.
Experiments show that our method achieves state-of-the-art results on multiple
corruption benchmarks, including ModelNet-C, our ScanObjectNN-C, and
ShapeNet-C.
Comment: Accepted by ICCV 2023; code: https://github.com/Roywangj/AdaptPoin
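The imitator's role can be illustrated with the hypothetical PyTorch sketch below. The encoder, the bounded per-axis scaling, and the Bernoulli point-drop mask are illustrative choices rather than the released AdaptPoint modules, and the discriminator and perception-guidance feedback are only indicated in the closing comment.

```python
# Hedged sketch: an imitator predicts deformation parameters and a per-point mask
# from the input cloud's own structure, then simulates corruption.
import torch
import torch.nn as nn

class Imitator(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(3, feat_dim), nn.ReLU(),
                                     nn.Linear(feat_dim, feat_dim))
        self.deform_ctrl = nn.Linear(feat_dim, 3)   # global per-axis scale offsets
        self.mask_ctrl = nn.Linear(feat_dim, 1)     # per-point keep logits

    def forward(self, pts):                          # pts: (B, N, 3)
        feat = self.encoder(pts)                     # per-point features
        global_feat = feat.max(dim=1).values         # simple structure summary
        scale = 1.0 + 0.1 * torch.tanh(self.deform_ctrl(global_feat))   # bounded deformation
        keep_prob = torch.sigmoid(self.mask_ctrl(feat)).squeeze(-1)     # (B, N)
        mask = torch.bernoulli(keep_prob)            # sampled per-point drop mask
        deformed = pts * scale.unsqueeze(1)          # apply deformation
        return deformed * mask.unsqueeze(-1), keep_prob

clouds = torch.randn(4, 1024, 3)                     # placeholder point clouds
augmented, keep_prob = Imitator()(clouds)
# A discriminator on (clouds, augmented) plus feedback from the downstream recognizer
# would then keep the simulated corruption realistic and of suitable difficulty.
```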
- …