235 research outputs found
Weakly supervised 3D Reconstruction with Adversarial Constraint
Supervised 3D reconstruction has witnessed significant progress through the
use of deep neural networks. However, this increase in performance requires
large scale annotations of 2D/3D data. In this paper, we explore inexpensive 2D
supervision as an alternative for expensive 3D CAD annotation. Specifically, we
use foreground masks as weak supervision through a raytrace pooling layer that
enables perspective projection and backpropagation. Additionally, since the 3D
reconstruction from masks is an ill posed problem, we propose to constrain the
3D reconstruction to the manifold of unlabeled realistic 3D shapes that match
mask observations. We demonstrate that learning a log-barrier solution to this
constrained optimization problem resembles the GAN objective, enabling the use
of existing tools for training GANs. We evaluate and analyze the manifold
constrained reconstruction on various datasets for single and multi-view
reconstruction of both synthetic and real images.
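As a rough illustration of how a foreground mask can supervise a voxel prediction, the sketch below max-pools an occupancy grid along one axis and compares the result with an observed mask. It is a simplification under stated assumptions: the rays are axis-aligned (orthographic), whereas the abstract describes perspective projection, and the names `project_mask` and `mask_loss` are hypothetical, not taken from the paper.

```python
from typing import List

Voxels = List[List[List[float]]]  # occupancy[x][y][z], values in [0, 1]


def project_mask(occupancy: Voxels) -> List[List[float]]:
    """Max-pool occupancy along the z axis to obtain a 2D foreground mask.

    Assumption: each pixel (x, y) sees the voxels occupancy[x][y][:] along
    one axis-aligned ray. A perspective camera would trace slanted rays.
    """
    return [[max(ray) for ray in plane] for plane in occupancy]


def mask_loss(predicted: List[List[float]], target: List[List[int]]) -> float:
    """Mean squared error between the projected mask and the observed mask."""
    total = 0.0
    count = 0
    for pred_row, target_row in zip(predicted, target):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            count += 1
    return total / count


# A 2x2x2 grid with a single occupied voxel at (x=0, y=1, z=1).
grid = [[[0.0, 0.0], [0.0, 1.0]],
        [[0.0, 0.0], [0.0, 0.0]]]
mask = project_mask(grid)
```

Because the maximum passes gradients only to the voxel that produced it, a pooling step of this kind can be backpropagated through, which is the property the abstract relies on.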
DeformNet: Free-Form Deformation Network for 3D Shape Reconstruction from a Single Image
3D reconstruction from a single image is a key problem in multiple
applications ranging from robotic manipulation to augmented reality. Prior
methods have tackled this problem through generative models which predict 3D
reconstructions as voxels or point clouds. However, these methods can be
computationally expensive and miss fine details. We introduce a new
differentiable layer for 3D data deformation and use it in DeformNet to learn a
model for 3D reconstruction-through-deformation. DeformNet takes an image
input, retrieves the nearest shape template from a database, and deforms the
template to match the query image. We evaluate our approach on the ShapeNet
dataset and show that (a) the Free-Form Deformation layer is a powerful new
building block for deep learning models that manipulate 3D data; (b) DeformNet
uses this FFD layer combined with shape retrieval for smooth and
detail-preserving 3D reconstruction of qualitatively plausible point clouds
with respect to a single query image; and (c) compared to other state-of-the-art
3D reconstruction methods, DeformNet quantitatively matches or outperforms their
benchmarks by significant margins. For more information, visit:
https://deformnet-site.github.io/DeformNet-website/
Comment: 11 pages, 9 figures, NIP
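To make the idea of free-form deformation concrete, here is a minimal sketch of the classical trivariate Bernstein formulation, in which a point inside the unit cube is re-expressed as a weighted sum of lattice control points. That the DeformNet layer follows this exact form is an assumption; the function names and the lattice size are illustrative only.

```python
from math import comb
from typing import List, Tuple

Point = Tuple[float, float, float]
Lattice = List[List[List[Point]]]  # lattice[i][j][k] is one control point


def bernstein(n: int, i: int, t: float) -> float:
    """Bernstein basis polynomial B_{i,n}(t)."""
    return comb(n, i) * (t ** i) * ((1.0 - t) ** (n - i))


def regular_lattice(l: int, m: int, n: int) -> Lattice:
    """Undeformed control lattice spanning the unit cube."""
    return [[[(i / l, j / m, k / n) for k in range(n + 1)]
             for j in range(m + 1)]
            for i in range(l + 1)]


def ffd(point: Point, lattice: Lattice) -> Point:
    """Map a point with local coordinates (s, t, u) in [0, 1]^3 through the lattice."""
    l, m, n = len(lattice) - 1, len(lattice[0]) - 1, len(lattice[0][0]) - 1
    s, t, u = point
    x = y = z = 0.0
    for i in range(l + 1):
        bi = bernstein(l, i, s)
        for j in range(m + 1):
            bj = bernstein(m, j, t)
            for k in range(n + 1):
                w = bi * bj * bernstein(n, k, u)
                px, py, pz = lattice[i][j][k]
                x, y, z = x + w * px, y + w * py, z + w * pz
    return (x, y, z)


lattice = regular_lattice(2, 2, 2)
unchanged = ffd((0.3, 0.5, 0.8), lattice)  # regular lattice leaves points in place

# Shifting every control point by 0.1 in x translates the deformed point too.
shifted_lattice = [[[(px + 0.1, py, pz) for (px, py, pz) in row]
                    for row in plane] for plane in lattice]
shifted = ffd((0.3, 0.5, 0.8), shifted_lattice)
```

The output is a smooth polynomial function of the control points, so the control-point offsets can be treated as learnable or predicted quantities.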
3D Shape Understanding and Generation
In recent years, machine learning techniques have revolutionized solutions to longstanding image-based problems, such as image classification, generation, semantic segmentation, object detection, and many others. However, if we want to build agents that can successfully interact with the real world, those techniques need to be capable of reasoning about the world as it truly is: a three-dimensional space. There are two main challenges in handling 3D information in machine learning models. First, it is not clear what the best 3D representation is. For images, convolutional neural networks (CNNs) operating on raster images yield the best results in virtually all image-based benchmarks. For 3D data, the best combination of model and representation is still an open question. Second, 3D data is not available on the same scale as images: taking pictures is a common procedure in our daily lives, whereas capturing 3D content is an activity usually restricted to specialized professionals. This thesis focuses on addressing both of these issues. Which model and representation should we use for generating and recognizing 3D data? What are efficient ways of learning 3D representations from a few examples? Is it possible to leverage image data to build models capable of reasoning about the world in 3D?
Our research findings show that it is possible to build models that efficiently generate 3D shapes as irregularly structured representations. Those models require significantly less memory while generating higher quality shapes than the ones based on voxels and multi-view representations. We start by developing techniques to generate shapes represented as point clouds. This class of models leads to high quality reconstructions and better unsupervised feature learning. However, since point clouds are not amenable to editing and human manipulation, we also present models capable of generating shapes as sets of shape handles -- simpler primitives that summarize complex 3D shapes and were specifically designed for high-level tasks and user interaction. Despite their effectiveness, those approaches require some form of 3D supervision, which is scarce. We present multiple alternatives to this problem. First, we investigate how approximate convex decomposition techniques can be used as self-supervision to improve recognition models when only a limited number of labels are available. Second, we study how neural network architectures induce shape priors that can be used in multiple reconstruction tasks -- using both volumetric and manifold representations. In this regime, reconstruction is performed from a single example -- either a sparse point cloud or multiple silhouettes. Finally, we demonstrate how to train generative models of 3D shapes without using any 3D supervision by combining differentiable rendering techniques and Generative Adversarial Networks.
Alternative hull detection techniques for preprocessing in proton computed tomography reconstruction
The purpose of this study was to develop computationally efficient hull detection techniques appropriate for image reconstruction using sparse matrices. The hull detection techniques investigated were space carving (SC), modified space carving (MSC), and space modeling (SM), and these were compared to the cone-beam version of the filtered back projection (FBP) algorithm in terms of their computation time and the quality of the object hull they produced.
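The general idea behind space carving can be sketched on a small 2D grid: any cell lying on a ray that missed the object is carved away, and what remains is an outer hull. This is a toy illustration under assumptions, not the study's algorithm: only two axis-aligned ray directions are used, and the helper names are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[int]]


def ray_hits(obj: Grid) -> Tuple[List[bool], List[bool]]:
    """Simulate measurements: for each row ray and column ray, did it hit the object?"""
    size = len(obj)
    row_hits = [any(obj[r][c] for c in range(size)) for r in range(size)]
    col_hits = [any(obj[r][c] for r in range(size)) for c in range(size)]
    return row_hits, col_hits


def carve(row_hits: List[bool], col_hits: List[bool]) -> Grid:
    """Keep a cell only if no ray through it missed the object."""
    return [[1 if (rh and ch) else 0 for ch in col_hits] for rh in row_hits]


# An L-shaped object inside a 4x4 grid.
obj = [[0, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 1, 1, 0],
       [0, 0, 0, 0]]
hull = carve(*ray_hits(obj))
```

The hull always contains the object but may overestimate it: here cell (1, 2) survives carving although it is empty, because both rays through it hit the object elsewhere. More view directions would tighten the hull.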
Accelerated volumetric reconstruction from uncalibrated camera views
While both work with images, computer graphics and computer vision solve inverse problems. Computer graphics traditionally starts with input geometric models and produces image sequences. Computer vision starts with input image sequences and produces geometric models. In the last few years, there has been a convergence of research to bridge the gap between the two fields.
This convergence has produced a new field called Image-based Rendering and Modeling (IBMR). IBMR represents the effort of using the geometric information recovered from real images to generate new images with the hope that the synthesized ones appear photorealistic, as well as reducing the time spent on model creation.
In this dissertation, the capturing, geometric, and photometric aspects of an IBMR system are studied. A versatile framework was developed that enables the reconstruction of scenes from images acquired with a handheld digital camera. The proposed system targets applications in areas such as computer gaming and virtual reality, from a low-cost perspective. In the spirit of IBMR, the human operator is allowed to provide the high-level information, while underlying algorithms are used to perform low-level computational work. Conforming to the latest architecture trends, we propose a streaming voxel carving method, allowing fast GPU-based processing on commodity hardware.
3D Scene Reconstruction with Micro-Aerial Vehicles and Mobile Devices
Scene reconstruction is the process of building an accurate geometric model of one's environment from sensor data. We explore the problem of real-time, large-scale 3D scene reconstruction in indoor environments using small laser range-finders and low-cost RGB-D (color plus depth) cameras. We focus on computationally-constrained platforms such as micro-aerial vehicles (MAVs) and mobile devices. These platforms present a set of fundamental challenges - estimating the state and trajectory of the device as it moves within its environment and utilizing lightweight, dynamic data structures to hold the representation of the reconstructed scene. The system needs to be computationally and memory-efficient, so that it can run in real time, onboard the platform.
In this work, we present three scene reconstruction systems. The first system uses a laser range-finder and operates onboard a quadrotor MAV. We address the issues of autonomous control, state estimation, path-planning, and teleoperation. We propose the multi-volume occupancy grid (MVOG) - a novel data structure for building 3D maps from laser data, which provides a compact, probabilistic scene representation.
The second system uses an RGB-D camera to recover the 6-DoF trajectory of the platform by aligning sparse features observed in the current RGB-D image against a model of previously seen features. We discuss our work on camera calibration and the depth measurement model. We apply the system onboard an MAV to produce occupancy-based 3D maps, which we utilize for path-planning.
Finally, we present our contributions to a scene reconstruction system for mobile devices with built-in depth sensing and motion-tracking capabilities. We demonstrate reconstructing and rendering a global mesh on the fly, using only the mobile device's CPU, in very large (300 square meter) scenes, at a resolution of 2-3 cm. To achieve this, we divide the scene into spatial volumes indexed by a hash map. Each volume contains the truncated signed distance function for that area of space, as well as the mesh segment derived from the distance function. This approach allows us to focus computational and memory resources only on areas of the scene that are currently observed, as well as leverage parallelization techniques for multi-core processing.
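The hash-indexed volumes described above can be sketched as a dictionary from integer chunk coordinates to small dense blocks of truncated signed distance values, allocated only when a measurement touches them. All constants and names here (`CHUNK`, `VOXEL`, `TRUNC`, `TsdfMap`) are assumptions made for illustration, and the mesh segment stored per volume in the described system is omitted.

```python
from math import floor
from typing import Dict, List, Optional, Tuple

CHUNK = 8     # voxels per chunk edge (assumed)
VOXEL = 0.03  # voxel edge length in metres (assumed, within the quoted 2-3 cm)
TRUNC = 0.10  # truncation distance in metres (assumed)

Key = Tuple[int, int, int]


class TsdfMap:
    """Sparse TSDF: chunks exist only where something has been observed."""

    def __init__(self) -> None:
        # chunk key -> (tsdf values, integration weights), both flat lists
        self.chunks: Dict[Key, Tuple[List[float], List[float]]] = {}

    @staticmethod
    def _locate(p: Tuple[float, float, float]) -> Tuple[Key, int]:
        vx, vy, vz = (int(floor(c / VOXEL)) for c in p)
        key = (vx // CHUNK, vy // CHUNK, vz // CHUNK)
        idx = ((vx % CHUNK) * CHUNK + (vy % CHUNK)) * CHUNK + (vz % CHUNK)
        return key, idx

    def integrate(self, p: Tuple[float, float, float],
                  signed_distance: float, weight: float = 1.0) -> None:
        """Fuse one signed-distance sample into the voxel containing p."""
        key, idx = self._locate(p)
        if key not in self.chunks:  # lazy allocation of a new volume
            n = CHUNK ** 3
            self.chunks[key] = ([TRUNC] * n, [0.0] * n)
        tsdf, w = self.chunks[key]
        d = max(-TRUNC, min(TRUNC, signed_distance))  # truncate
        tsdf[idx] = (tsdf[idx] * w[idx] + d * weight) / (w[idx] + weight)
        w[idx] += weight

    def query(self, p: Tuple[float, float, float]) -> Optional[float]:
        """Return the fused distance at p, or None if the chunk was never allocated."""
        key, idx = self._locate(p)
        if key not in self.chunks:
            return None
        return self.chunks[key][0][idx]


scene = TsdfMap()
scene.integrate((0.1, 0.1, 0.1), 0.02)
scene.integrate((0.1, 0.1, 0.1), 0.04)  # running weighted average with the first sample
scene.integrate((5.0, 0.0, 0.0), 1.0)   # far away: allocates a second chunk, value clamped
```

Memory grows with the observed surface rather than with the bounding volume of the scene, which is the point made in the abstract.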
Analysis of 3D human gait reconstructed with a depth camera and mirrors
The problem of assessing human gait has received great attention in the literature, since gait analysis is one of the key components of healthcare. Marker-based and multi-camera systems are widely employed to deal with this problem. However, such systems usually require specialized, expensive equipment and/or have a high computational cost. In order to reduce the cost of devices, we focus on a gait analysis system that employs only one depth sensor. The principle of our work is similar to multi-camera systems, but the collection of cameras is replaced by one depth sensor and mirrors. Each mirror in our setup plays the role of a camera that captures the scene from a different viewpoint. Since we use only one camera, the synchronization step can be avoided and the cost of devices is also reduced.
Our studies can be separated into two categories: 3D reconstruction and gait analysis. The result of the former is used as the input of the latter. Our system for 3D reconstruction is built with a depth camera and two mirrors. Two types of depth sensor, distinguished by their depth estimation scheme, have been employed in our work. With the structured light (SL) technique integrated into the Kinect 1, we perform the 3D reconstruction based on geometrical optics. In order to increase the level of detail of the reconstructed 3D model, the Kinect 2 with time-of-flight (ToF) depth measurement is used for image acquisition instead of the previous generation. However, due to multiple reflections on the mirrors, depth distortion occurs in our setup. We thus propose a simple approach for reducing such distortion before applying geometrical optics to reconstruct a point cloud of the 3D object.
For the task of gait analysis, we propose various alternative approaches focusing on the problem of gait normality/symmetry measurement. They are expected to be useful for clinical treatments such as monitoring a patient's recovery after surgery. These methods consist of model-free and model-based approaches that have different pros and cons. In this dissertation, we present three methods that directly process the point clouds reconstructed in the previous part. The first uses cross-correlation of the left and right half-bodies to assess gait symmetry, while the other two employ deep auto-encoders to measure gait normality.
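The cross-correlation idea can be illustrated on two one-dimensional signals, one per body side: if the gait is symmetric, the right-side signal should look like a time-shifted copy of the left-side one, giving a correlation peak close to 1 at some lag. The per-frame scalar features and the function names below are hypothetical; the thesis works on reconstructed point clouds, and how it reduces them to comparable signals is not stated here.

```python
from math import sqrt
from typing import Sequence, Tuple


def normalized_xcorr(a: Sequence[float], b: Sequence[float], lag: int) -> float:
    """Pearson correlation between a[i] and b[i + lag] over overlapping samples."""
    pairs = [(a[i], b[i + lag]) for i in range(len(a)) if 0 <= i + lag < len(b)]
    if len(pairs) < 2:
        return 0.0
    mean_a = sum(x for x, _ in pairs) / len(pairs)
    mean_b = sum(y for _, y in pairs) / len(pairs)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in pairs)
    var_a = sum((x - mean_a) ** 2 for x, _ in pairs)
    var_b = sum((y - mean_b) ** 2 for _, y in pairs)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # a constant window carries no shape information
    return cov / sqrt(var_a * var_b)


def best_lag(a: Sequence[float], b: Sequence[float], max_lag: int) -> Tuple[int, float]:
    """Return the lag in [-max_lag, max_lag] with the highest correlation, and that score."""
    scores = [(lag, normalized_xcorr(a, b, lag)) for lag in range(-max_lag, max_lag + 1)]
    return max(scores, key=lambda item: item[1])


# Hypothetical per-frame features: the right side repeats the left side two frames later.
left = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
right = [0.0, 0.0, 0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0]
lag, score = best_lag(left, right, max_lag=3)
```

A peak score well below 1 would suggest that the two sides move differently, which is the kind of asymmetry such a measure is meant to expose.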