Joint Prediction of Depths, Normals and Surface Curvature from RGB Images using CNNs
Understanding the 3D structure of a scene is of vital importance when it
comes to developing fully autonomous robots. To this end, we present a novel
deep-learning-based framework that estimates depth, surface normals and surface
curvature using only a single RGB image. To the best of our knowledge, this
is the first work to estimate surface curvature from colour using a machine
learning approach. Additionally, we demonstrate that by tuning the network to
infer well-designed features, such as surface curvature, we can achieve
improved performance at estimating depth and normals. This indicates that
network guidance is still a useful aspect of designing and training a neural
network. We run extensive experiments in which the network is trained to infer
different tasks while the model capacity is kept constant, resulting in
different feature maps depending on the tasks at hand. We outperform the previous
state-of-the-art methods that jointly estimate depth and surface normals,
while also predicting surface curvature in parallel.
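Depth, normals and curvature are geometrically coupled, which is one way to see why predicting them jointly can help. The sketch below is illustrative only and is not the authors' pipeline: it derives unit normals and mean curvature from a depth map by finite differences, assuming an orthographic camera with unit pixel spacing (a real camera would need its intrinsics). The function names `normals_from_depth` and `mean_curvature` are hypothetical.

```python
import math


def normals_from_depth(depth):
    """Per-pixel unit normals for a depth map given as a list of rows.

    Assumes an orthographic camera with unit pixel spacing, so the surface is
    z = depth[y][x] and an unnormalised normal is (-dz/dx, -dz/dy, 1).
    Central differences are used inside the image, one-sided ones at borders.
    """
    h, w = len(depth), len(depth[0])
    normals = []
    for y in range(h):
        row = []
        for x in range(w):
            x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
            dzdx = (depth[y][x1] - depth[y][x0]) / (x1 - x0) if x1 != x0 else 0.0
            dzdy = (depth[y1][x] - depth[y0][x]) / (y1 - y0) if y1 != y0 else 0.0
            nx, ny, nz = -dzdx, -dzdy, 1.0
            norm = math.sqrt(nx * nx + ny * ny + nz * nz)
            row.append((nx / norm, ny / norm, nz / norm))
        normals.append(row)
    return normals


def mean_curvature(depth, x, y):
    """Mean curvature H of the depth surface at an interior pixel (x, y).

    Monge-patch formula for z = f(x, y), with central finite differences:
    H = ((1 + fy^2) fxx - 2 fx fy fxy + (1 + fx^2) fyy) / (2 (1 + fx^2 + fy^2)^1.5)
    Sign convention: H > 0 where the surface bends towards the normal (-fx, -fy, 1).
    """
    if not (0 < x < len(depth[0]) - 1 and 0 < y < len(depth) - 1):
        raise ValueError("mean_curvature needs an interior pixel")
    c = depth[y][x]
    fx = (depth[y][x + 1] - depth[y][x - 1]) / 2.0
    fy = (depth[y + 1][x] - depth[y - 1][x]) / 2.0
    fxx = depth[y][x + 1] - 2.0 * c + depth[y][x - 1]
    fyy = depth[y + 1][x] - 2.0 * c + depth[y - 1][x]
    fxy = (depth[y + 1][x + 1] - depth[y + 1][x - 1]
           - depth[y - 1][x + 1] + depth[y - 1][x - 1]) / 4.0
    num = (1 + fy * fy) * fxx - 2 * fx * fy * fxy + (1 + fx * fx) * fyy
    return num / (2.0 * (1 + fx * fx + fy * fy) ** 1.5)
```

For the paraboloid z = (x^2 + y^2)/2 sampled on a 3x3 grid, `mean_curvature` at the centre pixel returns 1.0, the exact value, because central differences are exact for quadratics; a tilted plane gives 0.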
QuadricsNet: Learning Concise Representation for Geometric Primitives in Point Clouds
This paper presents a novel framework to learn a concise geometric primitive
representation for 3D point clouds. Rather than representing each type of
primitive individually, we focus on the challenging problem of achieving a
concise and uniform representation robustly. We employ quadrics to represent
diverse primitives with only 10 parameters and propose the first end-to-end
learning-based framework, namely QuadricsNet, to parse quadrics in point
clouds. The relationships between the mathematical formulation of quadrics and
their geometric attributes, including type, scale and pose, are integrated
for effective supervision of QuadricsNet. In addition, a novel
pattern-comprehensive dataset with quadric segments and objects is collected
for training and evaluation. Experiments demonstrate the effectiveness of our
concise representation and the robustness of QuadricsNet. Our code is available
at https://github.com/MichaelWu99-lab/QuadricsNet.
Comment: Submitted to ICRA 2024. 7 pages
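A general quadric is the zero set of a second-degree polynomial in x, y and z. Writing it as X^T Q X = 0 with X = (x, y, z, 1) and Q a symmetric 4x4 matrix leaves 10 independent coefficients, which matches the 10-parameter count above. The sketch below assumes one particular coefficient ordering (hypothetical; QuadricsNet's own parameterisation and normalisation are not given here) and evaluates the algebraic residual of a point.

```python
def quadric_matrix(p):
    """Build the symmetric 4x4 matrix Q from 10 quadric coefficients.

    Assumed (hypothetical) ordering p = (A, B, C, D, E, F, G, H, I, J) for
    A x^2 + B y^2 + C z^2 + 2D xy + 2E xz + 2F yz + 2G x + 2H y + 2I z + J = 0.
    """
    A, B, C, D, E, F, G, H, I, J = p
    return [[A, D, E, G],
            [D, B, F, H],
            [E, F, C, I],
            [G, H, I, J]]


def algebraic_residual(p, point):
    """Evaluate X^T Q X for the homogeneous point X = (x, y, z, 1).

    The result is 0 exactly when the point lies on the quadric surface.
    """
    q = quadric_matrix(p)
    x = (*point, 1.0)
    return sum(x[r] * q[r][c] * x[c] for r in range(4) for c in range(4))
```

For example, the unit sphere is (1, 1, 1, 0, 0, 0, 0, 0, 0, -1): `algebraic_residual` returns 0 for (1, 0, 0) and 3 for (2, 0, 0). Zeroing the quadratic terms leaves a plane, so planes, spheres, cylinders and cones all fit the same 10-number form.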
PvDeConv: Point-Voxel Deconvolution for Autoencoding CAD Construction in 3D
We propose a Point-Voxel DeConvolution (PVDeConv) module for 3D data
autoencoders. To demonstrate its efficiency, we learn to synthesize
high-resolution point clouds of 10k points that densely describe the underlying
geometry of Computer Aided Design (CAD) models. Scanning artifacts, such as
protrusions, missing parts, smoothed edges and holes, inevitably appear in real
3D scans of fabricated CAD objects. Learning the original CAD model
construction from a 3D scan requires ground truth to be available together
with the corresponding 3D scan of an object. To bridge this gap, we introduce a
new dedicated dataset, CC3D, containing 50k+ pairs of CAD models and their
corresponding 3D meshes. This dataset is used to learn a convolutional
autoencoder for point clouds sampled from the paired 3D scans and CAD models.
The challenges of this new dataset are demonstrated in comparison with other
generative point cloud sampling models trained on ShapeNet. The CC3D
autoencoder is efficient with respect to memory consumption and training time
compared to state-of-the-art models for 3D data generation.
Comment: 2020 IEEE International Conference on Image Processing (ICIP)
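The abstract does not say how the 10k-point clouds are sampled from the CAD models and scans. A common choice, assumed here for illustration, is uniform sampling over the mesh surface: pick triangles with probability proportional to their area, then draw uniform barycentric coordinates inside each one. The function names are hypothetical and the mesh is given as plain vertex and face lists.

```python
import math
import random


def triangle_area(a, b, c):
    """Half the norm of the cross product of two edge vectors."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    cx = u[1] * v[2] - u[2] * v[1]
    cy = u[2] * v[0] - u[0] * v[2]
    cz = u[0] * v[1] - u[1] * v[0]
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)


def sample_surface(vertices, faces, n_points, seed=0):
    """Draw n_points uniformly from the surface of a triangle mesh."""
    rng = random.Random(seed)
    areas = [triangle_area(*(vertices[i] for i in f)) for f in faces]
    # Pick faces with probability proportional to their area.
    chosen = rng.choices(faces, weights=areas, k=n_points)
    points = []
    for f in chosen:
        a, b, c = (vertices[i] for i in f)
        # The square-root trick makes the barycentric weights uniform
        # over the triangle rather than clustered near vertex a.
        s, t = math.sqrt(rng.random()), rng.random()
        w0, w1, w2 = 1.0 - s, s * (1.0 - t), s * t
        points.append(tuple(w0 * a[i] + w1 * b[i] + w2 * c[i] for i in range(3)))
    return points
```

Calling `sample_surface(vertices, faces, 10000)` on a mesh would produce a cloud of the size mentioned above; without the square-root step, points would bunch up near the first vertex of each triangle.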
3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces
Learning a disentangled, interpretable, and structured latent representation
in 3D generative models of faces and bodies is still an open problem. The
problem is particularly acute when control over identity features is required.
In this paper, we propose an intuitive yet effective self-supervised approach
to train a 3D shape variational autoencoder (VAE) that encourages a
disentangled latent representation of identity features. Curating the
mini-batch generation by swapping arbitrary features across different shapes
makes it possible to define a loss function that leverages known differences and
similarities in the latent representations. Experiments on 3D meshes show
that state-of-the-art methods for latent disentanglement are not able to
disentangle the identity features of faces and bodies. Our proposed method properly
decouples the generation of such features while maintaining good representation
and reconstruction capabilities.
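The abstract describes the batching idea only at a high level. The following sketch is a hypothetical reading of it, not the authors' code: it assumes registered meshes with shared topology, a feature defined as a set of vertex indices, and a latent vector split into one block per feature. Swapping a feature from shape b into shape a yields a shape whose latent code should agree with a on every block except the swapped one, which should agree with b; the squared-error loss below encodes exactly those known similarities and differences.

```python
def swap_feature(shape_a, shape_b, feature_indices):
    """Copy of shape_a whose vertices in feature_indices come from shape_b.

    Shapes are lists of (x, y, z) vertices and are assumed to share the same
    mesh topology (vertex i is the same anatomical point in both).
    """
    idx = set(feature_indices)
    return [shape_b[i] if i in idx else v for i, v in enumerate(shape_a)]


def _sq_dist(u, v):
    return sum((p - q) ** 2 for p, q in zip(u, v))


def swap_consistency_loss(z_a, z_b, z_swapped, blocks, swapped):
    """Hypothetical latent loss for a shape built by swapping `swapped` from b into a.

    `blocks` maps feature names to (start, stop) slices of the latent vector.
    The swapped block is pulled towards z_b, every other block towards z_a.
    """
    loss = 0.0
    for name, (start, stop) in blocks.items():
        target = z_b if name == swapped else z_a
        loss += _sq_dist(z_swapped[start:stop], target[start:stop])
    return loss
```

In a training loop, `z_a`, `z_b` and `z_swapped` would be the encoder's latent codes for the two source shapes and the swapped shape in the same mini-batch.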
FroDO: From Detections to 3D Objects
Object-oriented maps are important for scene understanding since they jointly capture geometry and semantics and allow individual instantiation of objects and meaningful reasoning about them. We introduce FroDO, a method for accurate 3D reconstruction of object instances from RGB video that infers their location, pose and shape in a coarse-to-fine manner. Key to FroDO is embedding object shapes in a novel learnt shape space that allows seamless switching between sparse point cloud and dense DeepSDF decoding. Given an input sequence of localized RGB frames, FroDO first aggregates 2D detections to instantiate a 3D bounding box per object. A shape code is regressed using an encoder network before shape and pose are further optimized under the learnt shape priors using sparse or dense shape representations. The optimization uses multi-view geometric, photometric and silhouette losses. We evaluate on real-world datasets, including Pix3D, Redwood-OS, and ScanNet, for single-view, multi-view, and multi-object reconstruction.
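FroDO instantiates a 3D bounding box per object from 2D detections in localized frames, but this abstract does not give the procedure. One common building block, assumed here for illustration, is to back-project the centre of each detection to a viewing ray using the known camera pose and find the least-squares point closest to all rays. With P_i = I - d_i d_i^T projecting onto the plane orthogonal to the unit ray direction d_i, minimising sum_i ||P_i (x - o_i)||^2 gives the 3x3 linear system (sum_i P_i) x = sum_i P_i o_i, solved below with Cramer's rule. `triangulate_rays` is a hypothetical name.

```python
import math


def _normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def _det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def triangulate_rays(origins, directions):
    """Least-squares point closest to a set of 3D rays (origin o_i, direction d_i)."""
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for o, d in zip(origins, directions):
        d = _normalize(d)
        # Accumulate P = I - d d^T into A and P o into b.
        for r in range(3):
            for c in range(3):
                p = (1.0 if r == c else 0.0) - d[r] * d[c]
                a[r][c] += p
                b[r] += p * o[c]
    det = _det3(a)
    if abs(det) < 1e-12:
        raise ValueError("rays are (nearly) parallel; the point is not constrained")
    # Cramer's rule: x_k = det(A with column k replaced by b) / det(A).
    x = []
    for k in range(3):
        mk = [[b[r] if c == k else a[r][c] for c in range(3)] for r in range(3)]
        x.append(_det3(mk) / det)
    return tuple(x)
```

With noisy detections the rays no longer meet, and this returns the point that minimises the summed squared perpendicular distance to them; a box extent would still have to be estimated separately.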
- …