UltraLiDAR: Learning Compact Representations for LiDAR Completion and Generation
LiDAR provides accurate geometric measurements of the 3D world.
Unfortunately, dense LiDARs are very expensive and the point clouds captured by
low-beam LiDAR are often sparse. To address these issues, we present
UltraLiDAR, a data-driven framework for scene-level LiDAR completion, LiDAR
generation, and LiDAR manipulation. The crux of UltraLiDAR is a compact,
discrete representation that encodes the point cloud's geometric structure, is
robust to noise, and is easy to manipulate. We show that by aligning the
representation of a sparse point cloud to that of a dense point cloud, we can
densify the sparse point clouds as if they were captured by a real high-density
LiDAR, drastically reducing the cost. Furthermore, by learning a prior over the
discrete codebook, we can generate diverse, realistic LiDAR point clouds for
self-driving. We evaluate the effectiveness of UltraLiDAR on sparse-to-dense
LiDAR completion and LiDAR generation. Experiments show that densifying
real-world point clouds with our approach can significantly improve the
performance of downstream perception systems. Compared to prior art on LiDAR
generation, our approach generates much more realistic point clouds. According
to an A/B test, human participants prefer our results over those of previous
methods more than 98.5% of the time.
Comment: CVPR 2023. Project page: https://waabi.ai/ultralidar
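The abstract does not spell out UltraLiDAR's architecture, but a learned,
compact discrete codebook suggests a VQ-VAE-style quantization step. Below is a
minimal, hypothetical sketch of such a nearest-neighbor codebook lookup; the
shapes, names, and straight-through trick are illustrative assumptions, not the
paper's implementation.

```python
import torch

def quantize(features: torch.Tensor, codebook: torch.Tensor):
    """Map each continuous feature to its nearest codebook entry.

    features: (N, D) encoder outputs, e.g. per-BEV-cell embeddings.
    codebook: (K, D) learnable discrete codes.
    """
    dists = torch.cdist(features, codebook)    # (N, K) pairwise L2 distances
    indices = dists.argmin(dim=1)              # index of nearest code
    quantized = codebook[indices]              # (N, D) discrete representation
    # Straight-through estimator so gradients reach the encoder.
    quantized = features + (quantized - features).detach()
    return quantized, indices

# Toy usage: 4096 BEV cells, 128-dim features, a codebook of 1024 entries.
feats = torch.randn(4096, 128)
codes = torch.randn(1024, 128)
quantized, idx = quantize(feats, codes)
```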
FroDO: From Detections to 3D Objects
Object-oriented maps are important for scene understanding since they jointly
capture geometry and semantics, allow individual instantiation and meaningful
reasoning about objects. We introduce FroDO, a method for accurate 3D
reconstruction of object instances from RGB video that infers object location,
pose and shape in a coarse-to-fine manner. Key to FroDO is to embed object
shapes in a novel learnt space that allows seamless switching between sparse
point cloud and dense DeepSDF decoding. Given an input sequence of localized
RGB frames, FroDO first aggregates 2D detections to instantiate a
category-aware 3D bounding box per object. A shape code is regressed using an
encoder network before optimizing shape and pose further under the learnt shape
priors using sparse and dense shape representations. The optimization uses
multi-view geometric, photometric and silhouette losses. We evaluate on
real-world datasets, including Pix3D, Redwood-OS, and ScanNet, for single-view,
multi-view, and multi-object reconstruction.
Comment: To be published in CVPR 2020. The first two authors contributed
equally.
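As a rough illustration of the refinement stage described above, here is a
hedged sketch of jointly optimizing a latent shape code and pose by gradient
descent. The decoder and loss callables are placeholders for the paper's
DeepSDF-style decoder and its multi-view geometric, photometric, and silhouette
losses, which the abstract names but does not define.

```python
import torch

def refine(shape_code, pose, decode, losses, n_iters=100, lr=1e-2):
    """Jointly optimize a latent shape code and 6-DoF pose parameters.

    decode: callable mapping a shape code to a shape representation.
    losses: callables (shape, pose) -> scalar, standing in for the paper's
            multi-view geometric, photometric, and silhouette terms.
    """
    shape_code = shape_code.clone().requires_grad_(True)
    pose = pose.clone().requires_grad_(True)
    opt = torch.optim.Adam([shape_code, pose], lr=lr)
    for _ in range(n_iters):
        opt.zero_grad()
        shape = decode(shape_code)
        total = sum(loss_fn(shape, pose) for loss_fn in losses)
        total.backward()
        opt.step()
    return shape_code.detach(), pose.detach()

# Toy usage with a linear "decoder" and a single dummy loss term.
decode = torch.nn.Linear(64, 256)
dummy = lambda shape, pose: shape.square().mean() + pose.square().sum()
code, pose = refine(torch.randn(64), torch.zeros(6), decode, [dummy])
```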
Patch-based Progressive 3D Point Set Upsampling
We present a detail-driven deep neural network for point set upsampling. A
high-resolution point set is essential for point-based rendering and surface
reconstruction. Inspired by the recent success of neural image super-resolution
techniques, we progressively train a cascade of patch-based upsampling networks
on different levels of detail end-to-end. We propose a series of architectural
design contributions that lead to a substantial performance boost. The effect
of each technical contribution is demonstrated in an ablation study.
Qualitative and quantitative experiments show that our method significantly
outperforms the state-of-the-art learning-based and optimization-based
approaches, both in terms of handling low-resolution inputs and revealing
high-fidelity details.
Comment: Accepted to CVPR 2019; code available at https://github.com/yifita/P3
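For intuition, the following is a toy sketch of a progressive, patch-based
cascade: each stage upsamples patches of the previous stage's output, so the
overall ratio is the product of per-stage ratios. The chunk-based patching and
the per-stage network are illustrative stand-ins, not the paper's design.

```python
import torch

class ToyStage(torch.nn.Module):
    """Doubles a patch's point count by predicting one offset per point."""
    def __init__(self):
        super().__init__()
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(3, 32), torch.nn.ReLU(), torch.nn.Linear(32, 3))

    def forward(self, patch):                      # (M, 3) -> (2M, 3)
        children = patch + 0.01 * self.mlp(patch)  # nudge copies off the parent
        return torch.cat([patch, children], dim=0)

def progressive_upsample(points, stages, patch_size=256):
    """Run a cascade of per-patch upsampling stages end to end."""
    for stage in stages:
        patches = points.split(patch_size)         # naive chunking as "patches"
        points = torch.cat([stage(p) for p in patches], dim=0)
    return points

stages = [ToyStage(), ToyStage()]                  # 2x per stage => 4x overall
dense = progressive_upsample(torch.randn(1024, 3), stages)  # (4096, 3)
```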
Snowflake Point Deconvolution for Point Cloud Completion and Generation with Skip-Transformer
Most existing point cloud completion methods suffer from the discrete nature
of point clouds and the unstructured prediction of points in local regions,
which makes it difficult to reveal fine local geometric details. To resolve
this issue, we propose SnowflakeNet with snowflake point deconvolution (SPD) to
generate complete point clouds. SPD models the generation of point clouds as
the snowflake-like growth of points, where child points are generated
progressively by splitting their parent points after each SPD. Our insight into
the detailed geometry is to introduce a skip-transformer in the SPD to learn
the point splitting patterns that can best fit the local regions. The
skip-transformer leverages an attention mechanism to summarize the splitting
patterns used in the previous SPD layer to produce the splitting in the current
layer. The locally compact and structured point clouds generated by SPD
precisely reveal the structural characteristics of the 3D shape in local
patches, which enables us to predict highly detailed geometries. Moreover,
since SPD is a general operation that is not limited to completion, we explore
its applications in other generative tasks, including point cloud
auto-encoding, generation, single image reconstruction, and upsampling.
Experimental results show that our method outperforms state-of-the-art
approaches on widely used benchmarks.
Comment: IEEE Transactions on Pattern Analysis and Machine Intelligence
(TPAMI), 2022. This work is a journal extension of our ICCV 2021 paper
arXiv:2108.04444. The first two authors contributed equally.
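A minimal, hypothetical sketch of one snowflake point deconvolution step as the
abstract describes it: each parent point spawns k children via predicted
displacements. The real SPD additionally conditions the splitting on a
skip-transformer over the previous layer's splitting features, which is omitted
here for brevity.

```python
import torch

class ToySPD(torch.nn.Module):
    """One splitting step: each parent point spawns k child points."""
    def __init__(self, k: int = 2):
        super().__init__()
        self.k = k
        # Predicts k displacement vectors per parent point.
        self.split_mlp = torch.nn.Sequential(
            torch.nn.Linear(3, 64), torch.nn.ReLU(),
            torch.nn.Linear(64, 3 * k))

    def forward(self, parents: torch.Tensor) -> torch.Tensor:
        """parents: (N, 3) -> children: (N * k, 3)."""
        offsets = self.split_mlp(parents).view(-1, self.k, 3)  # (N, k, 3)
        children = parents.unsqueeze(1) + offsets              # split outward
        return children.reshape(-1, 3)

spd = ToySPD(k=2)
coarse = torch.randn(512, 3)
fine = spd(spd(coarse))   # two SPD steps: 512 -> 1024 -> 2048 points
```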
ShapeFormer: Transformer-based Shape Completion via Sparse Representation
We present ShapeFormer, a transformer-based network that produces a
distribution of object completions, conditioned on incomplete, and possibly
noisy, point clouds. The resultant distribution can then be sampled to generate
likely completions, each exhibiting plausible shape details while being
faithful to the input. To facilitate the use of transformers for 3D, we
introduce a compact 3D representation, vector quantized deep implicit function,
that utilizes spatial sparsity to represent a close approximation of a 3D shape
by a short sequence of discrete variables. Experiments demonstrate that
ShapeFormer outperforms prior art for shape completion from ambiguous partial
inputs in terms of both completion quality and diversity. We also show that our
approach effectively handles a variety of shape types, incomplete patterns, and
real-world scans.
Comment: Project page: https://shapeformer.github.io
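To make the sampling idea concrete: once a shape is encoded as a short sequence
of discrete codes, an autoregressive model over those codes can be sampled to
produce diverse completions of a partial input. The tiny model below is a
stand-in, not ShapeFormer itself; a causal mask (needed for training) is
omitted since we only sample from a fixed prefix.

```python
import torch

def sample_completion(model, prefix, seq_len):
    """Autoregressively extend `prefix` (codes of the observed partial
    shape) into a full code sequence of length seq_len."""
    seq = prefix.clone()
    while seq.shape[1] < seq_len:
        logits = model(seq)[:, -1]            # logits for the next code
        probs = torch.softmax(logits, dim=-1)
        nxt = torch.multinomial(probs, 1)     # stochastic => diverse samples
        seq = torch.cat([seq, nxt], dim=1)
    return seq

# Toy stand-in model: embedding + one transformer layer + output head.
vocab, dim = 1024, 64
embed = torch.nn.Embedding(vocab, dim)
layer = torch.nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
head = torch.nn.Linear(dim, vocab)
model = lambda s: head(layer(embed(s)))

partial = torch.randint(vocab, (1, 16))       # codes of the partial scan
full = sample_completion(model, partial, seq_len=64)
```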