4,729 research outputs found
DiffFacto: Controllable Part-Based 3D Point Cloud Generation with Cross Diffusion
While the 3D point cloud generation community has grown considerably in recent
years, it still lacks an effective way to give users intuitive control over the
generation process, which limits the general utility of such
methods. Since an intuitive way of decomposing a shape is through its parts, we
propose to tackle the task of controllable part-based point cloud generation.
We introduce DiffFacto, a novel probabilistic generative model that learns the
distribution of shapes with part-level control. We propose a factorization that
models independent part-style and part-configuration distributions, and we
present a novel cross-diffusion network that generates coherent and plausible
shapes under this factorization. Experiments show that our
method is able to generate novel shapes with multiple axes of control. It
achieves state-of-the-art part-level generation quality and generates plausible
and coherent shapes while enabling various downstream editing applications such
as shape interpolation, mixing, and transformation editing. Project website:
https://difffacto.github.io
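The abstract describes the factorization only at a high level; the following is a minimal PyTorch sketch of the idea of sampling independent per-part style and configuration latents and denoising each part's points conditioned on both. All names (CrossDenoiser, sample_shape), dimensions, and the crude update rule are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CrossDenoiser(nn.Module):
    """Toy denoiser: predicts per-point noise for each part, conditioned on that
    part's style code and configuration code."""
    def __init__(self, style_dim=64, config_dim=16, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + style_dim + config_dim + 1, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, x, style, config, t):
        # x: (parts, points, 3); style: (parts, style_dim); config: (parts, config_dim)
        P, N, _ = x.shape
        cond = torch.cat([style, config], dim=-1).unsqueeze(1).expand(P, N, -1)
        t_feat = torch.full((P, N, 1), float(t))
        return self.net(torch.cat([x, cond, t_feat], dim=-1))

@torch.no_grad()
def sample_shape(denoiser, num_parts=4, points_per_part=512, steps=50):
    # Independent factors: a style latent and a configuration latent per part.
    style = torch.randn(num_parts, 64)
    config = torch.randn(num_parts, 16)
    x = torch.randn(num_parts, points_per_part, 3)              # start from Gaussian noise
    for t in reversed(range(steps)):
        x = x - denoiser(x, style, config, t / steps) / steps   # crude Euler-style update
    return x.reshape(-1, 3)                                     # parts assembled into one shape

print(sample_shape(CrossDenoiser()).shape)                      # torch.Size([2048, 3])
```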
Combining Implicit Function Learning and Parametric Models for 3D Human Reconstruction
Implicit functions represented as deep learning approximations are powerful for reconstructing 3D surfaces. However, they can only produce static surfaces that are not controllable, which offers limited ability to modify the resulting model by editing its pose or shape parameters. Nevertheless, such features are essential in building flexible models for both computer graphics and computer vision. In this work, we present a methodology that combines detail-rich implicit functions and parametric representations in order to reconstruct 3D models of people that remain controllable and accurate even in the presence of clothing. Given sparse 3D point clouds sampled on the surface of a dressed person, we use an Implicit Part Network (IP-Net) to jointly predict the outer 3D surface of the dressed person, the inner body surface, and the semantic correspondences to a parametric body model. We subsequently use these correspondences to fit the body model to our inner surface and then non-rigidly deform it (under a parametric body + displacement model) to the outer surface in order to capture garment, face, and hair detail. In quantitative and qualitative experiments with both full-body data and hand scans, we show that the proposed methodology generalizes and is effective even given incomplete point clouds collected from single-view depth images. Our models and code can be downloaded from http://virtualhumans.mpi-inf.mpg.de/ipnet
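The abstract outlines a two-stage procedure: fit a parametric body to the predicted inner surface using the predicted correspondences, then non-rigidly deform it toward the outer surface. The sketch below only mirrors that optimization structure in PyTorch; toy_body_model, the loss terms, and all hyperparameters are stand-ins and assumptions, not the IP-Net code.

```python
import torch

def toy_body_model(pose, shape, template):
    """Stand-in for a parametric body model: a bounded offset of a template point set."""
    return template + 0.1 * pose.tanh() + 0.05 * shape.tanh()

def fit_body(inner_pts, corr_idx, template, iters=200, lr=1e-2):
    """Stage 1: fit body parameters so model points match the predicted inner
    surface at the predicted correspondences."""
    pose = torch.zeros_like(template, requires_grad=True)
    shape = torch.zeros_like(template, requires_grad=True)
    opt = torch.optim.Adam([pose, shape], lr=lr)
    for _ in range(iters):
        verts = toy_body_model(pose, shape, template)
        loss = ((verts[corr_idx] - inner_pts) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return toy_body_model(pose, shape, template).detach()

def refine_to_outer(body_verts, outer_pts, iters=200, lr=1e-2, reg=1e-3):
    """Stage 2: per-vertex displacements (body + D) pulled toward the outer
    surface to capture clothing/hair detail, with a small displacement penalty."""
    disp = torch.zeros_like(body_verts, requires_grad=True)
    opt = torch.optim.Adam([disp], lr=lr)
    for _ in range(iters):
        verts = body_verts + disp
        d = torch.cdist(outer_pts, verts)          # each outer point pulls its nearest vertex
        loss = d.min(dim=1).values.mean() + reg * disp.norm(dim=-1).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (body_verts + disp).detach()

template = torch.rand(100, 3)                      # toy body template points
inner = torch.rand(40, 3)                          # predicted inner-surface samples
outer = inner + 0.02                               # predicted outer surface, slightly offset
corr = torch.randint(0, 100, (40,))                # predicted correspondences into the template
fitted = refine_to_outer(fit_body(inner, corr, template), outer)
print(fitted.shape)                                # torch.Size([100, 3])
```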
UltraLiDAR: Learning Compact Representations for LiDAR Completion and Generation
LiDAR provides accurate geometric measurements of the 3D world.
Unfortunately, dense LiDARs are very expensive and the point clouds captured by
low-beam LiDAR are often sparse. To address these issues, we present
UltraLiDAR, a data-driven framework for scene-level LiDAR completion, LiDAR
generation, and LiDAR manipulation. The crux of UltraLiDAR is a compact,
discrete representation that encodes the point cloud's geometric structure, is
robust to noise, and is easy to manipulate. We show that by aligning the
representation of a sparse point cloud to that of a dense point cloud, we can
densify the sparse point clouds as if they were captured by a real high-density
LiDAR, drastically reducing the cost. Furthermore, by learning a prior over the
discrete codebook, we can generate diverse, realistic LiDAR point clouds for
self-driving. We evaluate the effectiveness of UltraLiDAR on sparse-to-dense
LiDAR completion and LiDAR generation. Experiments show that densifying
real-world point clouds with our approach can significantly improve the
performance of downstream perception systems. Compared to prior art on LiDAR
generation, our approach generates much more realistic point clouds. According
to an A/B test, human participants prefer our results over those of previous
methods more than 98.5% of the time.
Comment: CVPR 2023. Project page: https://waabi.ai/ultralidar
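A minimal sketch of the discrete-codebook idea at the heart of the abstract: continuous features are snapped to their nearest codebook entries, giving tokens that can be completed or sampled from a prior. The ToyVQ class, codebook size, and feature dimensionality are illustrative assumptions, not the UltraLiDAR implementation.

```python
import torch
import torch.nn as nn

class ToyVQ(nn.Module):
    """Snap continuous features onto a learned codebook of discrete entries."""
    def __init__(self, num_codes=256, dim=32):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def quantize(self, feats):
        # feats: (B, cells, dim), e.g. bird's-eye-view features of a LiDAR sweep.
        codes = self.codebook.weight.unsqueeze(0).expand(feats.size(0), -1, -1)
        idx = torch.cdist(feats, codes).argmin(dim=-1)   # nearest code per cell
        return idx, self.codebook(idx)                   # discrete tokens and quantized features

vq = ToyVQ()
sparse_feats = torch.randn(2, 16 * 16, 32)               # stand-in encoder output for a sparse scan
tokens, quantized = vq.quantize(sparse_feats)
print(tokens.shape, quantized.shape)                      # (2, 256) and (2, 256, 32)
# Completion would map sparse-scan tokens toward the tokens of a dense scan;
# generation would sample tokens from a learned prior and decode them to points.
```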
Abrasion of flat rotating shapes
We report on the erosion of flat linoleum "pebbles" under steady rotation in
a slurry of abrasive grit. To quantify shape as a function of time, we develop
a general method in which the pebble is photographed from multiple angles with
respect to the grid of pixels in a digital camera. This reduces digitization
noise, and allows the local curvature of the contour to be computed with a
controllable degree of uncertainty. Several shape descriptors are then employed
to follow the evolution of different initial shapes toward a circle, where
abrasion halts. The results are in good quantitative agreement with a simple
model, where we propose that points along the contour move radially inward in
proportion to the product of the radius and the derivative of the radius with
respect to angle.
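A short numerical sketch of the stated rule, taking the inward speed at each contour point to be proportional to r·|dr/dθ|; the absolute value, the rate constant k, and the discretization are assumptions made for this illustration, not the paper's fitted model.

```python
import numpy as np

n, k, dt, steps = 360, 0.5, 0.01, 5000
theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
dtheta = theta[1] - theta[0]
r = 1.0 + 0.3 * np.cos(3 * theta)              # an initially non-circular "pebble" contour

for _ in range(steps):
    # periodic central difference for dr/dtheta
    dr = (np.roll(r, -1) - np.roll(r, 1)) / (2.0 * dtheta)
    r = r - k * r * np.abs(dr) * dt            # inward motion proportional to r * |dr/dtheta|
    r = np.maximum(r, 1e-6)

# r(theta) flattens toward a constant, i.e. the contour tends to a circle,
# where the update vanishes and abrasion halts.
print(r.max() - r.min())
```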
DSM-Net: Disentangled Structured Mesh Net for Controllable Generation of Fine Geometry
3D shape generation is a fundamental operation in computer graphics. While
significant progress has been made, especially with recent deep generative
models, it remains a challenge to synthesize high-quality geometric shapes with
rich detail and complex structure, in a controllable manner. To tackle this, we
introduce DSM-Net, a deep neural network that learns a disentangled structured
mesh representation for 3D shapes, where two key aspects of shapes, geometry
and structure, are encoded in a synergistic manner to ensure plausibility of
the generated shapes, while also being disentangled as much as possible. This
supports a range of novel shape generation applications with intuitive control,
such as interpolation of structure (geometry) while keeping geometry
(structure) unchanged. To achieve this, we simultaneously learn structure and
geometry through hierarchical variational autoencoders (VAEs), one for each
aspect, with bijective mappings at each level. In this manner we effectively
encode geometry and structure in separate latent spaces, while ensuring their
compatibility: the structure is used to guide the geometry and vice versa. At
the leaf level, the part geometry is represented using a conditional part VAE,
to encode high-quality geometric details, guided by the structure context as
the condition. Our method not only supports controllable generation
applications, but also produces high-quality synthesized shapes, outperforming
state-of-the-art methods.
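A single-level sketch of the coupled structure/geometry autoencoding described above, where each decoder is conditioned on the other factor's latent code so the two stay compatible while remaining separately editable. The ToyVAE class, its dimensions, and the flat (non-hierarchical) form are simplifying assumptions, not the DSM-Net architecture.

```python
import torch
import torch.nn as nn

class ToyVAE(nn.Module):
    """One factor (structure or geometry); decodes with the other factor's latent as condition."""
    def __init__(self, in_dim, z_dim, cond_dim):
        super().__init__()
        self.enc = nn.Linear(in_dim, 2 * z_dim)           # mean and log-variance
        self.dec = nn.Linear(z_dim + cond_dim, in_dim)

    def encode(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization

    def decode(self, z, cond):
        return self.dec(torch.cat([z, cond], dim=-1))

struct_vae = ToyVAE(in_dim=32, z_dim=8, cond_dim=8)        # structure branch
geom_vae = ToyVAE(in_dim=128, z_dim=8, cond_dim=8)         # geometry branch

structure, geometry = torch.randn(4, 32), torch.randn(4, 128)
z_s, z_g = struct_vae.encode(structure), geom_vae.encode(geometry)

# Disentangled editing: replace the structure latent while keeping the geometry latent fixed;
# each decoder is guided by the other factor when re-synthesizing its output.
new_z_s = torch.randn_like(z_s)
edited_structure = struct_vae.decode(new_z_s, z_g)
edited_geometry = geom_vae.decode(z_g, new_z_s)
print(edited_structure.shape, edited_geometry.shape)
```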
AI-generated Content for Various Data Modalities: A Survey
AI-generated content (AIGC) methods aim to produce text, images, videos, 3D
assets, and other media using AI algorithms. Due to its wide range of
applications and the demonstrated potential of recent works, AIGC has recently
attracted considerable attention, and AIGC methods have been
developed for various data modalities, such as image, video, text, 3D shape (as
voxels, point clouds, meshes, and neural implicit fields), 3D scene, 3D human
avatar (body and head), 3D motion, and audio -- each presenting different
characteristics and challenges. Furthermore, there have been many
significant developments in cross-modality AIGC methods, where generative
methods can receive conditioning input in one modality and produce outputs in
another. Examples include generation from various input modalities to image,
video, 3D shape, 3D scene, 3D avatar (body and head), 3D motion (skeleton and
avatar), and audio outputs. In this paper, we provide a comprehensive review of AIGC
methods across different data modalities, including both single-modality and
cross-modality methods, highlighting the various challenges, representative
works, and recent technical directions in each setting. We also survey the
representative datasets across the modalities and present comparative results
for the various modalities. Moreover, we discuss the challenges and potential
future research directions.
Dense 3D Object Reconstruction from a Single Depth View
In this paper, we propose a novel approach, 3D-RecGAN++, which reconstructs
the complete 3D structure of a given object from a single arbitrary depth view
using generative adversarial networks. Unlike existing work which typically
requires multiple views of the same object or class labels to recover the full
3D geometry, the proposed 3D-RecGAN++ only takes the voxel grid representation
of a depth view of the object as input, and is able to generate the complete 3D
occupancy grid with a high resolution of 256^3 by recovering the
occluded/missing regions. The key idea is to combine the generative
capabilities of autoencoders and the conditional Generative Adversarial
Networks (GAN) framework, to infer accurate and fine-grained 3D structures of
objects in high-dimensional voxel space. Extensive experiments on large
synthetic datasets and real-world Kinect datasets show that the proposed
3D-RecGAN++ significantly outperforms the state of the art in single view 3D
object reconstruction, and is able to reconstruct unseen types of objects.
Comment: TPAMI 2018. Code and data are available at:
https://github.com/Yang7879/3D-RecGAN-extended. This article extends from
arXiv:1708.0796
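A compact sketch of the conditional-GAN structure the abstract describes: an encoder-decoder generator completes a partial voxel grid, and a discriminator scores the (partial, completed) pair. The 32^3 resolution, channel counts, and class names are toy assumptions made for illustration only; the paper reports 256^3 output grids.

```python
import torch
import torch.nn as nn

class Completion3D(nn.Module):
    """Encoder-decoder generator: partial occupancy grid in, completed grid out."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv3d(1, 8, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(8, 16, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose3d(16, 8, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(8, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, partial):
        return self.dec(self.enc(partial))

class Discriminator3D(nn.Module):
    """Conditional discriminator: scores a (partial, completed) pair of grids."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(2, 8, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv3d(8, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Flatten(), nn.Linear(16 * 8 * 8 * 8, 1),
        )

    def forward(self, partial, full):
        return self.net(torch.cat([partial, full], dim=1))

partial = (torch.rand(1, 1, 32, 32, 32) > 0.9).float()    # voxelized single depth view
completed = Completion3D()(partial)
score = Discriminator3D()(partial, completed)
print(completed.shape, score.shape)                        # (1, 1, 32, 32, 32), (1, 1)
```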
- …