Finding Your (3D) Center: 3D Object Detection Using a Learned Loss
Massive semantically labeled datasets are readily available for 2D images but are
much harder to obtain for 3D scenes. Objects in 3D repositories such as ShapeNet
are labeled, but regrettably only in isolation, without context. 3D scenes can be
acquired by range scanners at city scale, but far fewer come with semantic labels.
Addressing this disparity, we introduce a new optimization procedure that allows
training for 3D detection on raw 3D scans while using as little as 5% of the
object labels, still achieving
comparable performance. Our optimization uses two networks. A scene network
maps an entire 3D scene to a set of 3D object centers. As we assume the scene
not to be labeled with centers, no classic loss such as the Chamfer distance can be used to
train it. Instead, we use another network to emulate the loss. This loss
network is trained on a small labeled subset and maps a non-centered 3D object,
in the presence of distractions, to its own center. This function is very
similar to - and hence can be used in place of - the gradient the supervised
loss would provide. Our evaluation documents competitive fidelity at a much
lower level of supervision, and higher quality at a comparable level of supervision.
Supplementary material can be found at: https://dgriffiths3.github.io.
Comment: 19 pages, 8 figures, Accepted at ECCV 2020
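The core mechanism described above can be made concrete with a short sketch. The code below is an illustrative reading of the abstract, not the authors' implementation: the network names, the toy PointNet-style regressor, and the way the loss network's prediction is turned into a surrogate gradient are all assumptions.

```python
# Minimal sketch of the learned-loss idea, assuming a toy PointNet-style regressor;
# the names and the two-stage loop are illustrative, not the authors' implementation.
import torch
import torch.nn as nn

class PointRegressor(nn.Module):
    """Points (B, N, 3) -> vector (B, 3) via a per-point MLP and max pooling."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                 nn.Linear(64, 128), nn.ReLU())
        self.head = nn.Linear(128, 3)

    def forward(self, pts):
        return self.head(self.mlp(pts).max(dim=1).values)

loss_net = PointRegressor()    # labeled subset: off-center crop -> offset to the true center
scene_net = PointRegressor()   # unlabeled scans: scene -> (toy, single) object center

# Stage 1: supervise the loss network on the small labeled subset (e.g. ~5% of labels).
def loss_net_step(crops, gt_offsets, opt):
    loss = nn.functional.mse_loss(loss_net(crops), gt_offsets)
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: train the scene network without center labels; the loss network's output
# stands in for the gradient a supervised loss (e.g. Chamfer) would have provided.
def scene_net_step(scene_pts, opt):
    center = scene_net(scene_pts)                       # (B, 3) predicted center
    crop = scene_pts - center.unsqueeze(1)              # toy stand-in for a local crop
    correction = loss_net(crop).detach()                # direction toward the true center
    surrogate = -(center * correction).sum()            # gradient w.r.t. center = -correction,
    opt.zero_grad(); surrogate.backward(); opt.step()   # so the update moves center along it
```

The surrogate scalar is just a device for injecting the emulated gradient through autograd; any mechanism that feeds the negated correction back as the gradient of the predicted center would serve the same purpose.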
Dual Discriminator Adversarial Distillation for Data-free Model Compression
Knowledge distillation has been widely used to produce portable and efficient
neural networks which can be well applied on edge devices for computer vision
tasks. However, almost all top-performing knowledge distillation methods need
to access the original training data, which is usually huge and often
unavailable. To tackle this problem, we propose a novel data-free
approach in this paper, named Dual Discriminator Adversarial Distillation
(DDAD) to distill a neural network without any training data or meta-data. To
be specific, we use a generator to create samples through dual discriminator
adversarial distillation, which mimics the original training data. The
generator not only uses the pre-trained teacher's intrinsic statistics in
existing batch normalization layers but also obtains the maximum discrepancy
from the student model. Then the generated samples are used to train the
compact student network under the supervision of the teacher. The proposed
method obtains an efficient student network which closely approximates its
teacher network, despite using no original training data. Extensive experiments
are conducted to demonstrate the effectiveness of the proposed approach on
CIFAR-10, CIFAR-100 and Caltech101 datasets for classification tasks. Moreover,
we extend our method to semantic segmentation tasks on several public datasets
such as CamVid and NYUv2. All experiments show that our method outperforms all
baselines for data-free knowledge distillation.
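The recipe described above roughly factors into a generator objective (match the teacher's batch-normalization statistics while maximizing teacher-student disagreement) and a student objective (distill from the teacher on the generated samples). The sketch below is a hedged illustration of that split; the loss weights, the L1 discrepancy term, the hook-based BN matching, and all function names are assumptions rather than the paper's exact formulation.

```python
# Rough sketch of data-free adversarial distillation, assuming torchvision-style CNNs
# with BatchNorm; loss weights, the L1 discrepancy, and all names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def bn_statistics_loss(teacher, images):
    """Match batch statistics of generated images to the teacher's stored BN statistics."""
    losses, hooks = [], []

    def make_hook(bn):
        def hook(module, inputs, output):
            x = inputs[0]
            mean = x.mean(dim=(0, 2, 3))
            var = x.var(dim=(0, 2, 3), unbiased=False)
            losses.append(F.mse_loss(mean, bn.running_mean) + F.mse_loss(var, bn.running_var))
        return hook

    for m in teacher.modules():
        if isinstance(m, nn.BatchNorm2d):
            hooks.append(m.register_forward_hook(make_hook(m)))
    teacher(images)                       # teacher is assumed frozen and in eval() mode
    for h in hooks:
        h.remove()
    return torch.stack(losses).sum()

def generator_step(generator, teacher, student, opt_g, z_dim=128, batch=64, w_bn=1.0):
    fake = generator(torch.randn(batch, z_dim))                    # synthetic "training" images
    discrepancy = F.l1_loss(F.softmax(teacher(fake), 1), F.softmax(student(fake), 1))
    loss = w_bn * bn_statistics_loss(teacher, fake) - discrepancy  # maximize disagreement
    opt_g.zero_grad(); loss.backward(); opt_g.step()

def student_step(generator, teacher, student, opt_s, z_dim=128, batch=64, T=4.0):
    with torch.no_grad():
        fake = generator(torch.randn(batch, z_dim))
        t_out = teacher(fake)
    kd = F.kl_div(F.log_softmax(student(fake) / T, 1), F.softmax(t_out / T, 1),
                  reduction="batchmean") * T * T
    opt_s.zero_grad(); kd.backward(); opt_s.step()
```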
3D Object Detection for Autonomous Driving: A Survey
Autonomous driving is regarded as one of the most promising remedies to
shield human beings from severe crashes. To this end, 3D object detection
serves as a core component of such a perception system, especially for path
planning, motion prediction, and collision avoidance. Generally, stereo or
monocular images together with their corresponding 3D point clouds are already a
standard input layout for 3D object detection, among which point clouds are
increasingly prevalent since they provide accurate depth information. Despite
existing efforts, 3D object detection from point clouds is still in its infancy
due to the inherent sparsity and irregularity of point clouds, the misalignment
between the camera view and the LiDAR bird's-eye view that hinders modality
synergy, occlusions, scale variations at long distances, and so on. Recently, profound
progress has been made in 3D object detection, with a large body of literature
being investigated to address this vision task. As such, we present a
comprehensive review of the latest progress in this field covering all the main
topics including sensors, fundamentals, and the recent state-of-the-art
detection methods with their pros and cons. Furthermore, we introduce metrics
and provide quantitative comparisons on popular public datasets. The avenues
for future work are then judiciously identified after an in-depth analysis of
the surveyed works. Finally, we conclude this paper.
Comment: 3D object detection, Autonomous driving, Point cloud
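Quantitative comparisons in this area are typically reported as average precision computed over a 3D intersection-over-union (IoU) threshold between predicted and ground-truth boxes. As a simplified illustration only (real benchmarks such as KITTI use yaw-rotated boxes rather than axis-aligned ones), a minimal 3D IoU could look like this:

```python
# Simplified axis-aligned 3D IoU, the overlap measure underlying 3D detection AP.
# Real benchmarks (e.g. KITTI) use yaw-rotated boxes, so this is only an illustration.
import numpy as np

def iou_3d_axis_aligned(box_a, box_b):
    """Boxes as (x_min, y_min, z_min, x_max, y_max, z_max)."""
    a, b = np.asarray(box_a, dtype=float), np.asarray(box_b, dtype=float)
    lo = np.maximum(a[:3], b[:3])                  # lower corner of the intersection
    hi = np.minimum(a[3:], b[3:])                  # upper corner of the intersection
    inter = np.prod(np.clip(hi - lo, 0.0, None))   # zero volume if the boxes do not overlap
    vol_a = np.prod(a[3:] - a[:3])
    vol_b = np.prod(b[3:] - b[:3])
    return inter / (vol_a + vol_b - inter)

# Two unit cubes offset by 0.5 along x: intersection 0.5, union 1.5, IoU ~ 0.33.
print(iou_3d_axis_aligned((0, 0, 0, 1, 1, 1), (0.5, 0, 0, 1.5, 1, 1)))
```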
Advances in Data-Driven Analysis and Synthesis of 3D Indoor Scenes
This report surveys advances in deep learning-based modeling techniques that
address four different 3D indoor scene analysis tasks, as well as synthesis of
3D indoor scenes. We describe different kinds of representations for indoor
scenes, various indoor scene datasets available for research in the
aforementioned areas, and discuss notable works employing machine learning
models for such scene modeling tasks based on these representations.
Specifically, we focus on the analysis and synthesis of 3D indoor scenes. With
respect to analysis, we focus on four basic scene understanding tasks -- 3D
object detection, 3D scene segmentation, 3D scene reconstruction and 3D scene
similarity. For synthesis, we mainly discuss neural scene synthesis works,
though also highlighting model-driven methods that allow for human-centric,
progressive scene synthesis. We identify the challenges involved in modeling
scenes for these tasks and the kind of machinery that needs to be developed to
adapt to the data representation, and the task setting in general. For each of
these tasks, we provide a comprehensive summary of the state-of-the-art works
across different axes such as the choice of data representation, backbone,
evaluation metric, input, output, etc., providing an organized review of the
literature. Towards the end, we discuss some interesting research directions
that have the potential to make a direct impact on the way users interact and
engage with these virtual scene models, making them an integral part of the
metaverse.
Comment: Published in Computer Graphics Forum, Aug 202