Latent-Space Laplacian Pyramids for Adversarial Representation Learning with 3D Point Clouds
Constructing high-quality generative models for 3D shapes is a fundamental
task in computer vision with diverse applications in geometry processing,
engineering, and design. Despite the recent progress in deep generative
modelling, synthesis of finely detailed 3D surfaces, such as high-resolution
point clouds, from scratch has not been achieved with existing approaches. In
this work, we propose to employ the latent-space Laplacian pyramid
representation within a hierarchical generative model for 3D point clouds. We
combine the recently proposed latent-space GAN and Laplacian GAN architectures
to form a multi-scale model capable of generating 3D point clouds at increasing
levels of detail. Our evaluation demonstrates that our model outperforms the
existing generative models for 3D point clouds.
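The abstract does not spell out the pyramid construction itself; as a rough, hypothetical illustration of the Laplacian-pyramid decomposition that the model adapts to latent codes, here is a minimal numpy sketch on a 1-D signal (the signal length, level count, and nearest-neighbor up/downsampling are illustrative assumptions, not the paper's architecture):

```python
import numpy as np

def build_laplacian_pyramid(x, levels=2):
    # Decompose a signal into a coarse base plus per-level detail
    # residuals; the paper applies the analogous idea to GAN latent codes.
    pyramid = []
    current = x.astype(float)
    for _ in range(levels):
        coarse = current.reshape(-1, 2).mean(axis=1)  # downsample by 2
        upsampled = np.repeat(coarse, 2)              # nearest-neighbor upsample
        pyramid.append(current - upsampled)           # detail residual at this scale
        current = coarse
    pyramid.append(current)                           # coarsest base
    return pyramid

def reconstruct(pyramid):
    # Coarse-to-fine: upsample and add back each detail residual.
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        current = np.repeat(current, 2) + detail
    return current

x = np.arange(8, dtype=float)
pyr = build_laplacian_pyramid(x, levels=2)
recon = reconstruct(pyr)
```

The decomposition is exactly invertible, which is what lets a hierarchical generator produce a coarse sample first and then refine it with residuals at increasing levels of detail.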
Shape Completion using 3D-Encoder-Predictor CNNs and Shape Synthesis
We introduce a data-driven approach to complete partial 3D shapes through a
combination of volumetric deep neural networks and 3D shape synthesis. From a
partially-scanned input shape, our method first infers a low-resolution -- but
complete -- output. To this end, we introduce a 3D-Encoder-Predictor Network
(3D-EPN) which is composed of 3D convolutional layers. The network is trained
to predict and fill in missing data, and operates on an implicit surface
representation that encodes both known and unknown space. This allows us to
predict global structure in unknown areas at high accuracy. We then correlate
these intermediary results with 3D geometry from a shape database at test time.
In a final pass, we propose a patch-based 3D shape synthesis method that
imposes the 3D geometry from these retrieved shapes as constraints on the
coarsely-completed mesh. This synthesis process enables us to reconstruct
fine-scale detail and generate high-resolution output while respecting the
global mesh structure obtained by the 3D-EPN. Although our 3D-EPN outperforms
state-of-the-art completion methods, the main contribution of our work lies in
the combination of a data-driven shape predictor and analytic 3D shape
synthesis. In our results, we show extensive evaluations on a newly-introduced
shape completion benchmark for both real-world and synthetic data.
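The abstract's "implicit surface representation that encodes both known and unknown space" can be sketched as a volumetric grid with a distance channel plus an unknown-space mask. The exact channel layout below is an assumption for illustration, not the paper's specification:

```python
import numpy as np

def make_epn_input(tsdf, observed_mask):
    # Hypothetical two-channel encoding of known and unknown space:
    #   channel 0: truncated signed distance values where observed
    #   channel 1: binary mask flagging unobserved (unknown) voxels
    vol = np.zeros((2,) + tsdf.shape, dtype=np.float32)
    vol[0] = np.where(observed_mask, np.clip(tsdf, -1.0, 1.0), 0.0)
    vol[1] = (~observed_mask).astype(np.float32)
    return vol

# Toy 4^3 grid: signed distance to a small sphere, back half unobserved
grid = np.indices((4, 4, 4)).astype(float)
dist = np.linalg.norm(grid - 1.5, axis=0) - 1.0  # signed distance field
observed = grid[0] < 2                           # only the front half is scanned
vol = make_epn_input(dist, observed)
```

Feeding the network an explicit unknown-space channel is what allows it to distinguish "empty" from "unscanned" and predict global structure in the unobserved regions.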
3D GANs and Latent Space: A comprehensive survey
Generative Adversarial Networks (GANs) have emerged as a significant player
in generative modeling by mapping lower-dimensional random noise to
higher-dimensional spaces. These networks have been used to generate
high-resolution images and 3D objects. The efficient modeling of 3D objects and
human faces is crucial in the development process of 3D graphical environments
such as games or simulations. 3D GANs are a new type of generative model used
for 3D reconstruction, point cloud reconstruction, and 3D semantic scene
completion. The choice of distribution for noise is critical as it represents
the latent space. Understanding a GAN's latent space is essential for
fine-tuning the generated samples, as demonstrated by the morphing of
semantically meaningful parts of images. In this work, we explore the latent
space and 3D GANs, examine several GAN variants and training methods to gain
insights into improving 3D GAN training, and suggest potential future
directions for further research.
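The morphing of semantically meaningful parts mentioned above is typically done by interpolating between latent codes. One common recipe (a generic sketch, not tied to any specific GAN in the survey) is spherical linear interpolation, which for Gaussian latents stays near the high-density shell of the prior:

```python
import numpy as np

def slerp(z0, z1, t):
    # Spherical linear interpolation between two latent codes.
    # Often preferred over straight-line mixing for Gaussian latents,
    # since intermediate points keep a typical norm under the prior.
    z0n = z0 / np.linalg.norm(z0)
    z1n = z1 / np.linalg.norm(z1)
    omega = np.arccos(np.clip(np.dot(z0n, z1n), -1.0, 1.0))
    if omega < 1e-8:                      # nearly parallel: fall back to lerp
        return (1 - t) * z0 + t * z1
    return (np.sin((1 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)

rng = np.random.default_rng(0)
z0 = rng.standard_normal(128)             # two samples from the noise prior
z1 = rng.standard_normal(128)
mid = slerp(z0, z1, 0.5)                  # latent code of the "morphed" sample
```

Decoding `mid` through the generator would yield a sample semantically between the two endpoints, which is the basic tool for exploring a GAN's latent space.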
GENERATIVE NETWORKS FOR POINT CLOUD GENERATION IN CULTURAL HERITAGE DOMAIN
In the Cultural Heritage (CH) domain, the semantic segmentation of 3D point clouds with Deep Learning (DL) techniques makes it possible to recognize historical architectural elements at a suitable level of detail, and hence to expedite the modelling of historical buildings for the development of BIM models from survey data. However, it is difficult to collect a balanced dataset of labelled architectural elements for training a network: CH objects are unique, and it is challenging for a network to recognize this kind of data. In recent years, Generative Networks have proven well suited to generating new data. Starting from these premises, in this paper Generative Networks have been used for augmenting a CH dataset. In particular, the performances of three state-of-the-art Generative Networks, PointGrow, PointFlow and PointGMM, have been compared in terms of the Jensen-Shannon Divergence (JSD), the Minimum Matching Distance-Chamfer Distance (MMD-CD) and the Minimum Matching Distance-Earth Mover's Distance (MMD-EMD). The generated objects have been used for augmenting two classes of the ArCH dataset, columns and windows. A DGCNN-Mod network was then trained and tested for the semantic segmentation task, comparing the performance on the ArCH dataset with and without augmentation.
Roberto Pierdicca, Marina Paolanti, Ramona Quattrini, Massimo Martini, Eva Savina Malinverni, Emanuele Frontoni
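The MMD-CD metric used to compare the generative networks can be sketched directly: for each reference cloud, take the Chamfer distance to its closest generated cloud, then average. A minimal numpy version (brute-force pairwise distances, sufficient for small clouds) looks like this:

```python
import numpy as np

def chamfer_distance(a, b):
    # Symmetric Chamfer distance between point clouds a:(N,3), b:(M,3):
    # mean squared nearest-neighbor distance, summed over both directions.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)  # (N, M) pairwise
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

def mmd_cd(generated, reference):
    # Minimum Matching Distance (MMD-CD): for each reference cloud,
    # the Chamfer distance to its best-matching generated cloud, averaged.
    return float(np.mean([min(chamfer_distance(g, r) for g in generated)
                          for r in reference]))

rng = np.random.default_rng(1)
ref = [rng.standard_normal((64, 3)) for _ in range(3)]
gen = [r + 0.01 * rng.standard_normal((64, 3)) for r in ref]  # near-copies
score = mmd_cd(gen, ref)
```

Near-copies of the reference set yield a score close to zero, while unrelated clouds score much higher; MMD-EMD follows the same matching scheme with the Earth Mover's Distance in place of Chamfer.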
Adversarial Self-Supervised Scene Flow Estimation
This work proposes a metric learning approach for self-supervised scene flow
estimation. Scene flow estimation is the task of estimating 3D flow vectors for
consecutive 3D point clouds. Such flow vectors are useful, e.g., for
recognizing actions, or avoiding collisions. Training a neural network via
supervised learning for scene flow is impractical, as this requires manual
annotations for each 3D point at each new timestamp for each scene. To that
end, we seek for a self-supervised approach, where a network learns a latent
metric to distinguish between points translated by flow estimations and the
target point cloud. Our adversarial metric learning includes a multi-scale
triplet loss on sequences of two point clouds as well as a cycle-consistency
loss. Furthermore, we outline a benchmark for self-supervised scene flow
estimation: the Scene Flow Sandbox. The benchmark consists of five datasets
designed to study individual aspects of flow estimation in progressive order of
complexity, from a moving object to real-world scenes. Experimental evaluation
on the benchmark shows that our approach obtains state-of-the-art
self-supervised scene flow results, outperforming recent neighbor-based
approaches. We use our proposed benchmark to expose shortcomings and draw
insights on various training setups. We find that our setup captures motion
coherence and preserves local geometries. Dealing with occlusions, on the other
hand, is still an open challenge.
Comment: Published at 3DV 202
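The cycle-consistency idea in the loss above can be sketched compactly: warp points forward with the predicted flow, predict the backward flow from the warped cloud, and penalize failure to return to the start. The function names below are placeholders, not the paper's API:

```python
import numpy as np

def cycle_consistency_loss(points, flow_fwd_fn, flow_bwd_fn):
    # points: (N, 3). flow_fwd_fn / flow_bwd_fn are hypothetical flow
    # predictors mapping a point cloud to per-point 3D flow vectors.
    warped = points + flow_fwd_fn(points)            # forward warp
    returned = warped + flow_bwd_fn(warped)          # warp back
    # Mean squared error between the round-trip and the original points.
    return float(np.mean(np.sum((returned - points) ** 2, axis=1)))

pts = np.random.default_rng(2).standard_normal((100, 3))
shift = np.array([1.0, 0.0, 0.0])
# A constant translation and its exact inverse close the cycle.
loss = cycle_consistency_loss(pts, lambda p: p * 0 + shift,
                                   lambda p: p * 0 - shift)
```

Because no ground-truth flow appears anywhere in this loss, it can supervise training from raw point-cloud sequences alone, which is the core of the self-supervised setup.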
A Review on Deep Learning Techniques for Video Prediction
The ability to predict, anticipate and reason about future outcomes is a key component of intelligent decision-making systems. In light of the success of deep learning in computer vision, deep-learning-based video prediction has emerged as a promising research direction. Defined as a self-supervised learning task, video prediction is a suitable framework for representation learning, as it has demonstrated the capability to extract meaningful representations of the underlying patterns in natural videos. Motivated by the increasing interest in this task, we provide a review of the deep learning methods for prediction in video sequences. We first define the fundamentals of video prediction, together with mandatory background concepts and the most used datasets. Next, we carefully analyze existing video prediction models, organized according to a proposed taxonomy, highlighting their contributions and their significance in the field. The summary of the datasets and methods is accompanied by experimental results that facilitate the assessment of the state of the art on a quantitative basis. The paper closes by drawing general conclusions, identifying open research challenges and pointing out future research directions.
This work has been funded by the Spanish Government PID2019-104818RB-I00 grant for the MoDeaAS project, supported with Feder funds. This work has also been supported by two Spanish national grants for PhD studies, FPU17/00166 and ACIF/2018/197 respectively.
RGB to 3D garment reconstruction using UV map representations
Predicting the geometry of a 3D object from just a single image or viewpoint is an intrinsic human ability that is extremely challenging for machines. For years, different computer vision approaches and techniques have been investigated in an attempt to solve this problem. One of the most researched domains has been the 3D reconstruction and modelling of human bodies. However, the greatest advances in this field have concentrated on recovering unclothed human bodies, ignoring garments. Garments are highly detailed, dynamic objects made up of particles that interact with each other and with other objects, making the reconstruction task even more difficult. Therefore, a lightweight 3D representation capable of modelling fine details is of great importance. This thesis presents a deep learning framework based on Generative Adversarial Networks (GANs) to reconstruct 3D garment models from a single RGB image. It has the peculiarity of using UV maps to represent the 3D data, a lightweight representation capable of dealing with high-resolution details and wrinkles. With this model and this kind of 3D representation, we achieve state-of-the-art results on the CLOTH3D dataset, generating good-quality, realistic reconstructions regardless of garment topology, human pose, occlusions and lighting, thus demonstrating the suitability of UV maps for 3D domains and tasks.
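The key trick of the UV-map representation is that each texel stores a 3D coordinate, so a garment surface becomes an ordinary H x W x 3 image that image-based GANs can generate. Recovering the point cloud is then just masking and flattening; the mask and sizes below are illustrative assumptions:

```python
import numpy as np

def uv_map_to_points(uv_map, valid_mask):
    # uv_map: (H, W, 3) position map, one xyz coordinate per texel.
    # valid_mask: (H, W) boolean, True where the texel belongs to the garment.
    # Returns the garment surface as an (N, 3) point cloud.
    return uv_map[valid_mask]

H, W = 8, 8
uv = np.zeros((H, W, 3))
mask = np.zeros((H, W), dtype=bool)
mask[2:6, 2:6] = True                               # a 4x4 patch of garment texels
uv[mask] = np.random.default_rng(3).standard_normal((16, 3))
points = uv_map_to_points(uv, mask)
```

Because the map is a fixed-resolution image, fine wrinkles map to high-frequency image detail, and standard 2D convolutional generators and losses apply unchanged.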