IC3D: Image-Conditioned 3D Diffusion for Shape Generation
In recent years, Denoising Diffusion Probabilistic Models (DDPMs) have obtained
state-of-the-art results in many generative tasks, outperforming GANs and other
classes of generative models. In particular, they reached impressive results in
various image generation sub-tasks, including conditional generation tasks
such as text-guided image synthesis. Given the success of DDPMs in 2D
generation, they have more recently been applied to 3D shape generation,
outperforming previous approaches and reaching state-of-the-art results.
However, 3D data pose additional challenges, such as the choice of the 3D
representation, which impacts design choices and model efficiency. While
reaching state-of-the-art results in generation quality, existing 3D DDPM works
make little or no use of guidance, mainly being unconditional or
class-conditional. In this paper, we present IC3D, the first Image-Conditioned
3D Diffusion model that generates 3D shapes by image guidance. It is also the
first 3D DDPM model that adopts voxels as a 3D representation. To guide our
DDPM, we present and leverage CISP (Contrastive Image-Shape Pre-training), a
model jointly embedding images and shapes by contrastive pre-training, inspired
by text-to-image DDPM works. Our generative diffusion model outperforms the
state-of-the-art in 3D generation quality and diversity. Furthermore, we show
that human evaluators, in a side-by-side evaluation, prefer our generated
shapes to those of a SoTA single-view 3D reconstruction model in terms of
quality and coherence with the query image.
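The abstract says CISP embeds images and shapes jointly by contrastive pre-training, inspired by text-to-image work. As a minimal sketch, assuming a CLIP-style symmetric cross-entropy objective (the abstract does not state the exact loss), the idea could look like the following. The function names, the temperature value, and the toy embeddings are hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_row(logits, target):
    """Cross-entropy of one row of logits against the target index."""
    top = max(logits)  # subtract the max for numerical stability
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[target]


def symmetric_contrastive_loss(image_embs, shape_embs, temperature=0.07):
    """Assumed CLIP-style loss: pair k of images and shapes is the positive."""
    imgs = [l2_normalize(v) for v in image_embs]
    shps = [l2_normalize(v) for v in shape_embs]
    n = len(imgs)
    # Cosine similarity between every image and every shape, scaled.
    logits = [
        [sum(a * b for a, b in zip(img, shp)) / temperature for shp in shps]
        for img in imgs
    ]
    # Image-to-shape direction: each row should pick its own column.
    loss_i2s = sum(cross_entropy_row(logits[k], k) for k in range(n)) / n
    # Shape-to-image direction: same computation on the transposed matrix.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_s2i = sum(cross_entropy_row(columns[k], k) for k in range(n)) / n
    return 0.5 * (loss_i2s + loss_s2i)


# Toy batch of two pairs: matched embeddings versus deliberately swapped ones.
aligned = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                     [[1.0, 0.0], [0.0, 1.0]])
swapped = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                     [[0.0, 1.0], [1.0, 0.0]])
```

Under this assumed objective, the loss is small when each image is closest to its own shape and large when the pairs are mismatched, which is the property an image-guided generator would rely on.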
Continual Cross-Dataset Adaptation in Road Surface Classification
Accurate road surface classification is crucial for autonomous vehicles (AVs)
to optimize driving conditions, enhance safety, and enable advanced road
mapping. However, deep learning models for road surface classification suffer
from poor generalization when tested on unseen datasets. To update these models
with new information, the original training dataset must also be taken into
account, in order to avoid catastrophic forgetting. This is, however,
inefficient if not impossible, e.g., when the data is collected in streams or
in large amounts. To overcome this limitation and enable fast and efficient
cross-dataset adaptation, we propose to employ continual learning finetuning
methods designed to retain past knowledge while adapting to new data, thus
effectively avoiding forgetting. Experimental results demonstrate the
superiority of this approach over naive finetuning, achieving performance close
to fresh retraining. While solving this known problem, we also provide a
general description of how the same technique can be adopted in other AV
scenarios. We highlight the potential computational and economic benefits that
a continual-based adaptation can bring to the AV industry, while also reducing
greenhouse emissions due to unnecessary joint retraining.
Comment: To be published in Proceedings of the 26th IEEE International Conference
on Intelligent Transportation Systems (ITSC 2023)
RadarLCD: Learnable Radar-based Loop Closure Detection Pipeline
Loop Closure Detection (LCD) is an essential task in robotics and computer
vision, serving as a fundamental component for various applications across
diverse domains. These applications encompass object recognition, image
retrieval, and video analysis. LCD consists of identifying whether a robot has
returned to a previously visited location, referred to as a loop, and then
estimating the related roto-translation with respect to the analyzed location.
Despite the numerous advantages of radar sensors, such as their ability to
operate under diverse weather conditions and provide a wider range of view
compared to other commonly used sensors (e.g., cameras or LiDARs), integrating
radar data remains an arduous task due to intrinsic noise and distortion. To
address this challenge, this research introduces RadarLCD, a novel supervised
deep learning pipeline specifically designed for Loop Closure Detection using
the FMCW (Frequency Modulated Continuous Wave) radar sensor. RadarLCD
leverages the pre-trained HERO (Hybrid Estimation Radar Odometry) model:
although HERO was originally developed for radar odometry, its features are
used to select key points crucial for LCD tasks. The methodology
undergoes evaluation across a variety of FMCW Radar dataset scenes, and it is
compared to state-of-the-art systems such as Scan Context for Place Recognition
and ICP for Loop Closure. The results demonstrate that RadarLCD surpasses the
alternatives in multiple aspects of Loop Closure Detection.
Comment: 7 pages, 2 figures
Advances in centerline estimation for autonomous lateral control
The ability of autonomous vehicles to maintain an accurate trajectory within
their road lane is crucial for safe operation. This requires detecting the road
lines and estimating the car's relative pose within its lane. Lateral lines are
usually retrieved from camera images. Still, most of the works on line
detection are limited to image mask retrieval and do not provide a usable
representation in world coordinates. In this paper, we propose a complete
perception pipeline, based on monocular vision, that retrieves all the
information required by a vehicle's lateral control system: road line
equations, centerline, vehicle heading, and lateral displacement. We evaluate our
system by acquiring data with accurate geometric ground truth. To act as a
benchmark for further research, we make this new dataset publicly available at
http://airlab.deib.polimi.it/datasets/.
Comment: Presented at 2020 IEEE Intelligent Vehicles Symposium (IV), 8 pages,
8 figures
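The abstract lists line equations, centerline, heading, and lateral displacement as the pipeline outputs but does not say how they are computed. As a rough sketch only, assume each lane line is already available as a polynomial y(x) in a vehicle frame with x pointing forward and y pointing left. The representation, function names, and sign conventions below are assumptions for illustration, not the authors' method.

```python
import math


def centerline_from_lines(left, right):
    """Average two polynomials given as coefficient lists [c0, c1, ...],
    where y(x) = c0 + c1*x + c2*x**2 + ... in the assumed vehicle frame."""
    size = max(len(left), len(right))
    left = list(left) + [0.0] * (size - len(left))
    right = list(right) + [0.0] * (size - len(right))
    return [(a + b) / 2.0 for a, b in zip(left, right)]


def lateral_state(center):
    """Return (centerline_offset, relative_heading) at the vehicle origin.

    centerline_offset: y of the centerline at x = 0; positive means the
    centerline lies to the left of the vehicle.
    relative_heading: angle in radians of the centerline tangent at x = 0
    with respect to the vehicle's forward axis.
    """
    offset = center[0]
    slope = center[1] if len(center) > 1 else 0.0
    return offset, math.atan(slope)


# Hypothetical straight lane: left line 2.0 m to the left, right line
# 1.5 m to the right, both slightly rotated relative to the vehicle.
center = centerline_from_lines([2.0, 0.1], [-1.5, 0.1])
offset, heading = lateral_state(center)
```

Averaging coefficients gives the midpoint measured along the y axis, which approximates the geometric centerline only when the lane is roughly aligned with the vehicle; a real system would likely need a more careful construction.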