PointGrow: Autoregressively Learned Point Cloud Generation with Self-Attention
Generating 3D point clouds is challenging yet highly desirable. This work
presents a novel autoregressive model, PointGrow, which can generate diverse
and realistic point cloud samples from scratch or conditioned on semantic
contexts. The model operates recurrently, with each point sampled according to
a conditional distribution given its previously generated points, so that
inter-point correlations are well exploited and the 3D shape generation
process is easier to interpret. Since point cloud object shapes are typically
encoded by long-range dependencies, we augment our model with dedicated
self-attention modules to capture such relations. Extensive evaluations show
that PointGrow achieves satisfactory performance on both unconditional and
conditional point cloud generation tasks with respect to realism and
diversity. Several important applications, such as unsupervised feature
learning and shape arithmetic operations, are also demonstrated.
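As a rough illustration of the autoregressive idea, here is a minimal PyTorch sketch: a self-attention module attends over the points generated so far and a small head parameterizes the distribution of the next point. All module names are hypothetical, and the Gaussian head is a simplification (the paper works with discretized coordinate distributions); this is a sketch of the technique, not the authors' code.

```python
import torch
import torch.nn as nn

class PointGrowSketch(nn.Module):
    def __init__(self, d_model=128, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(3, d_model)            # lift (x, y, z) into features
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, 3 * 2)         # mean and log-variance per axis

    def forward(self, points):                        # points: (B, N, 3), the prefix so far
        h = self.embed(points)
        h, _ = self.attn(h, h, h)                     # self-attention over previous points
        stats = self.head(h[:, -1])                   # condition on the full prefix
        mu, log_var = stats.chunk(2, dim=-1)
        return mu, log_var

@torch.no_grad()
def sample(model, n_points=1024):
    pts = torch.zeros(1, 1, 3)                        # seed point at the origin
    for _ in range(n_points - 1):
        mu, log_var = model(pts)
        nxt = mu + log_var.mul(0.5).exp() * torch.randn_like(mu)
        pts = torch.cat([pts, nxt.unsqueeze(1)], dim=1)
    return pts.squeeze(0)
```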
Dense 3D Object Reconstruction from a Single Depth View
In this paper, we propose a novel approach, 3D-RecGAN++, which reconstructs
the complete 3D structure of a given object from a single arbitrary depth view
using generative adversarial networks. Unlike existing work, which typically
requires multiple views of the same object or class labels to recover the full
3D geometry, the proposed 3D-RecGAN++ only takes the voxel grid representation
of a depth view of the object as input, and is able to generate the complete 3D
occupancy grid with a high resolution of 256^3 by recovering the
occluded/missing regions. The key idea is to combine the generative
capabilities of autoencoders and the conditional Generative Adversarial
Networks (GAN) framework, to infer accurate and fine-grained 3D structures of
objects in high-dimensional voxel space. Extensive experiments on large
synthetic datasets and real-world Kinect datasets show that the proposed
3D-RecGAN++ significantly outperforms the state of the art in single-view 3D
object reconstruction, and is able to reconstruct unseen types of objects.
Comment: TPAMI 2018. Code and data are available at
https://github.com/Yang7879/3D-RecGAN-extended. This article extends from
arXiv:1708.0796
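The core combination the abstract describes, an encoder-decoder generator completing a partial voxel grid plus a conditional discriminator that scores (input, output) pairs, can be sketched as below. Layer sizes and resolutions are illustrative, not the paper's (which reaches 256^3).

```python
import torch
import torch.nn as nn

class VoxelGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv3d(1, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv3d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose3d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, partial):                 # partial: (B, 1, D, D, D) depth-view voxels
        return self.dec(self.enc(partial))      # completed occupancy grid

def gan_losses(disc, partial, fake, real):
    # Conditional GAN: the discriminator sees the input and the completion together.
    d_real = disc(torch.cat([partial, real], dim=1))
    d_fake = disc(torch.cat([partial, fake.detach()], dim=1))
    d_loss = -(torch.log(d_real + 1e-8) + torch.log(1 - d_fake + 1e-8)).mean()
    g_loss = -torch.log(disc(torch.cat([partial, fake], dim=1)) + 1e-8).mean()
    return d_loss, g_loss
```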
Unleash the Potential of 3D Point Cloud Modeling with A Calibrated Local Geometry-driven Distance Metric
Quantifying the dissimilarity between two unstructured 3D point clouds is a
challenging task; existing metrics often rely on measuring the distance
between corresponding points, which can be inefficient or ineffective. In
this paper, we propose a novel distance metric called Calibrated Local
Geometry Distance (CLGD), which computes the difference between the
underlying 3D surfaces calibrated and induced by a set of reference points.
By associating each reference point with the two given point clouds through
its directional distances to them, the difference in directional distances at
an identical reference point characterizes the geometric difference between
the two point clouds in a typical local region. Finally, CLGD is obtained by
averaging the directional distance differences over all reference points. We
evaluate CLGD on various optimization and unsupervised learning-based tasks,
including shape reconstruction, rigid registration, scene flow estimation, and
feature representation. Extensive experiments show that CLGD achieves
significantly higher accuracy on all tasks in a memory- and
computation-efficient manner compared with existing metrics. As a generic
metric, CLGD has the potential to advance 3D point cloud modeling. The source
code is publicly available at https://github.com/rsy6318/CLGD.
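A hedged sketch of the reference-point idea: for each reference point, measure a directional distance to each cloud and average the absolute differences. The nearest-neighbour projection below is a simplification standing in for the paper's calibrated surface construction; all names are illustrative.

```python
import torch
import torch.nn.functional as F

def directional_distance(refs, dirs, cloud):
    # refs: (R, 3) reference points, dirs: (R, 3) unit directions, cloud: (N, 3)
    diff = cloud[None, :, :] - refs[:, None, :]          # (R, N, 3)
    nn_idx = diff.norm(dim=-1).argmin(dim=1)             # nearest cloud point per reference
    nearest = cloud[nn_idx]                              # (R, 3)
    return ((nearest - refs) * dirs).sum(dim=-1)         # projection onto the direction

def clgd(cloud_a, cloud_b, n_refs=512):
    refs = torch.rand(n_refs, 3) * 2 - 1                 # references in [-1, 1]^3
    dirs = F.normalize(torch.randn(n_refs, 3), dim=-1)   # random unit directions
    d_a = directional_distance(refs, dirs, cloud_a)
    d_b = directional_distance(refs, dirs, cloud_b)
    return (d_a - d_b).abs().mean()                      # average over all references
```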
UCLID-Net: Single View Reconstruction in Object Space
Most state-of-the-art deep geometric learning single-view reconstruction
approaches rely on encoder-decoder architectures that output either shape
parametrizations or implicit representations. However, these representations
rarely preserve the Euclidean structure of the 3D space in which objects
exist. In this paper, we show that building a geometry-preserving 3-dimensional latent
space helps the network concurrently learn global shape regularities and local
reasoning in the object coordinate space and, as a result, boosts performance.
We demonstrate, both on synthetic ShapeNet images, which are often used for
benchmarking purposes, and on real-world images, that our approach outperforms
state-of-the-art methods. Furthermore, the single-view pipeline naturally
extends to multi-view reconstruction, which we also show.
Comment: Added supplementary material
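One common way to realize a geometry-preserving latent space is to lift 2D image features into a 3D feature grid aligned with object coordinates, so that later layers reason locally in Euclidean space. The sketch below copies each 2D feature along the depth axis before 3D convolutions mix them; this unprojection is a common simplification and is not claimed to be the paper's exact mechanism.

```python
import torch
import torch.nn as nn

class Lift2DTo3D(nn.Module):
    def __init__(self, c_in=3, c_feat=32, depth=16):
        super().__init__()
        self.depth = depth
        self.cnn = nn.Sequential(
            nn.Conv2d(c_in, c_feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(c_feat, c_feat, 3, padding=1), nn.ReLU(),
        )
        self.refine3d = nn.Conv3d(c_feat, c_feat, 3, padding=1)

    def forward(self, image):                   # image: (B, 3, H, W)
        f2d = self.cnn(image)                   # (B, C, H, W) image features
        grid = f2d.unsqueeze(2).expand(-1, -1, self.depth, -1, -1)  # (B, C, D, H, W)
        return self.refine3d(grid)              # 3D convolutions reason in object space
```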
GECCO: Geometrically-Conditioned Point Diffusion Models
Diffusion models generating images conditionally on text, such as Dall-E 2
and Stable Diffusion, have recently made a splash far beyond the computer
vision community. Here, we tackle the related problem of generating point
clouds, both unconditionally, and conditionally with images. For the latter, we
introduce a novel geometrically-motivated conditioning scheme based on
projecting sparse image features into the point cloud and attaching them to
each individual point, at every step in the denoising process. This approach
improves geometric consistency and yields greater fidelity than current methods
relying on unstructured, global latent codes. Additionally, we show how to
apply recent continuous-time diffusion schemes. Our method performs on par
with or above the state of the art in conditional and unconditional
experiments on synthetic data, while being faster, lighter, and delivering
tractable likelihoods. We show that it also scales to diverse indoor scenes.
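The conditioning scheme the abstract describes, projecting each noisy point into the image and attaching the feature sampled there, can be sketched as below. The pinhole projection and grid_sample lookup are standard operations; the denoiser itself is left abstract, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def attach_image_features(points, feat_map, intrinsics):
    # points: (B, N, 3) in camera coordinates, feat_map: (B, C, H, W)
    uvw = torch.einsum('ij,bnj->bni', intrinsics, points)   # pinhole projection
    uv = uvw[..., :2] / uvw[..., 2:3].clamp(min=1e-6)       # (B, N, 2) pixel coordinates
    B, C, H, W = feat_map.shape
    grid = torch.stack([uv[..., 0] / (W - 1),
                        uv[..., 1] / (H - 1)], dim=-1) * 2 - 1   # normalize to [-1, 1]
    sampled = F.grid_sample(feat_map, grid.unsqueeze(2), align_corners=True)
    # (B, C, N, 1) -> (B, N, C); attach the per-point image feature to each point
    return torch.cat([points, sampled.squeeze(-1).transpose(1, 2)], dim=-1)
```

In the paper's scheme this attachment happens at every denoising step, so the geometric correspondence between image and points is re-established as the cloud moves.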
PointGrow: Autoregressively Learned Point Cloud Generation with Self-Attention
A point cloud is an agile 3D representation that efficiently models an
object's surface geometry. However, these surface-centric properties also pose
challenges for designing tools to recognize and synthesize point clouds. This
work presents a novel autoregressive model, PointGrow, which generates
realistic point cloud samples from scratch or conditioned on given semantic
contexts. Our model operates recurrently, with each point sampled according to
a conditional distribution given its previously generated points. Since point
cloud object shapes are typically encoded by long-range inter-point
dependencies, we augment our model with dedicated self-attention modules to
capture these relations. Extensive evaluation demonstrates that PointGrow
achieves satisfactory performance on both unconditional and conditional point
cloud generation tasks with respect to fidelity, diversity, and semantic
preservation. Further, conditional PointGrow learns a smooth manifold of the
given image conditions, within which 3D shape interpolation and arithmetic
operations can be performed.
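The interpolation claim amounts to blending condition embeddings and decoding each blend. A minimal sketch, with the encoder and generator as placeholders for whatever conditional model is used:

```python
import torch

def interpolate_conditions(encode, generate, image_a, image_b, steps=5):
    z_a, z_b = encode(image_a), encode(image_b)   # embed both condition images
    shapes = []
    for t in torch.linspace(0, 1, steps):
        z = (1 - t) * z_a + t * z_b               # walk along the condition manifold
        shapes.append(generate(z))                # one point cloud per blended condition
    return shapes
```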
PU-Flow: a Point Cloud Upsampling Network with Normalizing Flows
Point cloud upsampling aims to generate dense point clouds from given sparse
ones, which is a challenging task due to the irregular and unordered nature of
point sets. To address this issue, we present a novel deep learning-based
model, called PU-Flow, which incorporates normalizing flows and weight
prediction techniques to produce dense points uniformly distributed on the
underlying surface. Specifically, we exploit the invertible characteristics of
normalizing flows to transform points between Euclidean and latent spaces and
formulate the upsampling process as an ensemble of neighbouring points in a latent
space, where the ensemble weights are adaptively learned from local geometric
context. Extensive experiments show that our method is competitive and, in most
test cases, it outperforms state-of-the-art methods in terms of reconstruction
quality, proximity-to-surface accuracy, and computation efficiency. The source
code will be publicly available at https://github.com/unknownue/pu-flow.
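The latent-ensemble idea can be sketched as follows: map points through an invertible flow, synthesize new latents as learned weighted combinations of neighbour latents, and invert back to Euclidean space. The flow and the weight network are abstract placeholders here, not the paper's architecture.

```python
import torch

def upsample(flow, weight_net, points, k=4, ratio=4):
    # points: (N, 3); flow.forward / flow.inverse are assumed invertible maps
    z = flow.forward(points)                                   # (N, D) latent codes
    knn = torch.cdist(points, points).topk(k, largest=False).indices  # (N, k) neighbours
    branches = []
    for b in range(ratio):                                     # one branch per upsampled copy
        w = torch.softmax(weight_net(points, knn, b), dim=-1)  # (N, k) adaptive weights
        z_new = (w.unsqueeze(-1) * z[knn]).sum(dim=1)          # latent-space ensemble
        branches.append(flow.inverse(z_new))                   # back to Euclidean space
    return torch.cat(branches, dim=0)                          # (ratio * N, 3) dense cloud
```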
Language-driven Scene Synthesis using Multi-conditional Diffusion Model
Scene synthesis is a challenging problem with several industrial
applications. Recently, substantial efforts have been directed to synthesize
the scene using human motions, room layouts, or spatial graphs as the input.
However, few studies have addressed this problem from multiple modalities,
especially combining text prompts. In this paper, we propose language-driven
scene synthesis, a new task that integrates text prompts, human motion, and
existing objects for scene synthesis. Unlike other single-condition synthesis
tasks, our problem involves multiple conditions and requires a strategy for
processing and encoding them into a unified space. To address this challenge,
we present a multi-conditional diffusion model which, unlike the implicit
unification approach of other diffusion literature, explicitly predicts
guiding points for the original data distribution. We show that our approach
is theoretically well-grounded. Extensive experimental results show that our
method outperforms state-of-the-art benchmarks and enables natural scene
editing applications. The source code and dataset can be accessed at
https://lang-scene-synth.github.io/.
Comment: Accepted to NeurIPS 202
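A toy sketch of the multi-conditional, guiding-point idea: each modality is encoded into a shared space, and a guide network predicts points that steer each reverse-diffusion step. All module names are placeholders, the fusion and update rule are deliberately naive, and schedule terms are omitted.

```python
import torch

def denoise_step(denoiser, guide_net, x_t, t, conds):
    # conds: dict of per-modality embeddings (text, motion, objects),
    # already encoded into a unified space of matching shape
    c = torch.stack(list(conds.values())).mean(dim=0)   # naive fusion for the sketch
    g = guide_net(x_t, c)                               # explicitly predicted guiding points
    eps = denoiser(x_t, t, c)                           # predicted noise
    x_prev = x_t - eps                                  # schedule coefficients omitted
    return x_prev + 0.1 * (g - x_prev)                  # nudge the sample toward the guide
```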