6 research outputs found
Deep Reinforcement Learning of Volume-guided Progressive View Inpainting for 3D Point Scene Completion from a Single Depth Image
We present a deep reinforcement learning method of progressive view
inpainting for 3D point scene completion under volume guidance, achieving
high-quality scene reconstruction from only a single depth image with severe
occlusion. Our approach is end-to-end, consisting of three modules: 3D scene
volume reconstruction, 2D depth map inpainting, and multi-view selection for
completion. Given a single depth image, our method first goes through the 3D
volume branch to obtain a volumetric scene reconstruction as a guide to the
next view inpainting step, which attempts to make up the missing information;
the third step involves projecting the volume under the same view of the input,
concatenating them to complete the current view depth, and integrating all
depth into the point cloud. Since the occluded areas are unavailable, we resort
to a deep Q-Network to glance around and pick the next best view for large hole
completion progressively until a scene is adequately reconstructed while
guaranteeing validity. All steps are learned jointly to achieve robust and
consistent results. We perform qualitative and quantitative evaluations with
extensive experiments on the SUNCG data, obtaining better results than the
state of the art.
Comment: Accepted as CVPR 2019 Ora
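The progressive "pick the next best view, inpaint, repeat" loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: `q_fn` stands in for a trained deep Q-Network, `fill_fn` stands in for the volume-guided inpainting step, and `hole_ratio`, `tol`, and `max_steps` are hypothetical names introduced only for this example.

```python
import random


def select_next_view(q_values, visited, epsilon=0.0, rng=random):
    """Epsilon-greedy choice among candidate views that were not used yet."""
    candidates = [v for v in range(len(q_values)) if v not in visited]
    if not candidates:
        return None
    if rng.random() < epsilon:
        # Exploration: pick a random unvisited view.
        return rng.choice(candidates)
    # Exploitation: pick the unvisited view with the highest Q-value.
    return max(candidates, key=lambda v: q_values[v])


def progressive_completion(q_fn, fill_fn, hole_ratio, max_steps=10, tol=0.05):
    """Visit views until the remaining hole ratio is small enough.

    q_fn(hole_ratio, visited) -> list of Q-values, one per candidate view
    (assumed stand-in for a Q-network).
    fill_fn(hole_ratio, view) -> hole ratio after inpainting that view
    (assumed stand-in for the inpainting module).
    """
    visited = []
    for _ in range(max_steps):
        if hole_ratio <= tol:
            break
        view = select_next_view(q_fn(hole_ratio, visited), set(visited))
        if view is None:
            break
        hole_ratio = fill_fn(hole_ratio, view)
        visited.append(view)
    return visited, hole_ratio


if __name__ == "__main__":
    # Toy stand-ins: fixed Q-values, and each view halves the hole ratio.
    views, remaining = progressive_completion(
        q_fn=lambda h, seen: [0.1, 0.9, 0.5],
        fill_fn=lambda h, v: h * 0.5,
        hole_ratio=0.4,
    )
    print(views, remaining)
```

In this toy run the views are visited in order of decreasing Q-value, and the loop stops once the hole ratio reaches the tolerance.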
Next-Best View Policy for 3D Reconstruction
Manually selecting viewpoints or using commonly available flight planners
such as circular paths for large-scale 3D reconstruction using drones often results
in incomplete 3D models. Recent works have relied on hand-engineered heuristics
such as information gain to select the Next-Best Views. In this work, we
present a learning-based algorithm called Scan-RL to learn a Next-Best View
(NBV) Policy. To train and evaluate the agent, we created Houses3K, a dataset
of 3D house models. Our experiments show that using Scan-RL, the agent can scan
houses in fewer steps and over a shorter distance compared to our
baseline circular path. Experimental results also demonstrate that a single NBV
policy can be used to scan multiple houses including those that were not seen
during training. The link to Scan-RL is available at
https://github.com/darylperalta/ScanRL and Houses3K dataset can be found at
https://github.com/darylperalta/Houses3K.
Comment: To be published in ECCV 2020 Workshops; typos in abstract correcte
Robust Image Matching By Dynamic Feature Selection
Estimating dense correspondences between images is a long-standing image
understanding task. Recent works introduce convolutional neural networks
(CNNs) to extract high-level feature maps and find correspondences through
feature matching. However, high-level feature maps have low spatial resolution
and are therefore insufficient to provide accurate and fine-grained features to
distinguish intra-class variations for correspondence matching. To address this
problem, we generate robust features by dynamically selecting features at
different scales. To resolve two critical issues in feature selection, i.e., how
many and which scales of features to select, we frame the feature
selection process as a sequential Markov decision process (MDP) and
introduce an optimal selection strategy using reinforcement learning (RL). We
define an RL environment for image matching in which each individual action
either requests new features or terminates the selection episode by referring to a
matching score. Deep neural networks are incorporated into our method and
trained for decision making. Experimental results show that our method achieves
performance comparable or superior to state-of-the-art methods on three
benchmarks, demonstrating the effectiveness of our feature selection strategy.
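The episode structure described above, in which each action either requests another feature scale or stops based on a matching score, can be sketched as follows. This is an illustrative toy under stated assumptions, not the paper's method: the learned decision network is replaced by a hypothetical `threshold_policy`, and `scores_by_scale` is made-up data standing in for the matching score obtained after each additional scale.

```python
def threshold_policy(threshold):
    """Hypothetical stand-in for a learned policy: stop once the score is high enough."""
    def policy(score, num_selected):
        return "stop" if score >= threshold else "request"
    return policy


def select_scales(scores_by_scale, policy):
    """Run one selection episode.

    scores_by_scale[k] is the (assumed) matching score after features from
    scales 0..k have been selected. The policy sees the current score and the
    number of selected scales, and returns "request" or "stop".
    """
    selected = []
    score = 0.0
    for scale, new_score in enumerate(scores_by_scale):
        selected.append(scale)
        score = new_score
        if policy(score, len(selected)) == "stop":
            break
    return selected, score


if __name__ == "__main__":
    scales, final_score = select_scales([0.4, 0.7, 0.9, 0.95], threshold_policy(0.8))
    print(scales, final_score)
```

A reinforcement learning agent would replace the fixed threshold with a policy trained on a reward that trades matching quality against the number of scales used.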
3D-NVS: A 3D Supervision Approach for Next View Selection
We present a classification based approach for the next best view selection
and show how we can plausibly obtain a supervisory signal for this task. The
proposed approach is end-to-end trainable and aims to get the best possible 3D
reconstruction quality with a pair of passively acquired 2D views. The proposed
model consists of two stages: a classifier and a reconstructor network trained
jointly via indirect 3D supervision from ground truth voxels. At test
time, the proposed method assumes no prior knowledge of the underlying 3D
shape for selecting the next best view. We demonstrate the proposed method's
effectiveness via detailed experiments on synthetic and real images and show
how it provides better reconstruction quality than existing state-of-the-art
3D reconstruction and next best view prediction techniques.
Comment: Submitted to CVPR-2
KAPLAN: A 3D Point Descriptor for Shape Completion
We present a novel 3D shape completion method that operates directly on
unstructured point clouds, thus avoiding resource-intensive data structures
like voxel grids. To this end, we introduce KAPLAN, a 3D point descriptor that
aggregates local shape information via a series of 2D convolutions. The key
idea is to project the points in a local neighborhood onto multiple planes with
different orientations. In each of those planes, point properties like normals
or point-to-plane distances are aggregated into a 2D grid and abstracted into a
feature representation with an efficient 2D convolutional encoder. Since all
planes are encoded jointly, the resulting representation nevertheless can
capture their correlations and retains knowledge about the underlying 3D shape,
without expensive 3D convolutions. Experiments on public datasets show that
KAPLAN achieves state-of-the-art performance for 3D shape completion.
Comment: 18 pages, 15 figure
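The projection step described above, in which neighboring points are projected onto a plane and a point property is aggregated into a 2D grid, can be sketched for a single plane. This is a rough illustration, not the KAPLAN implementation: the function name `plane_grid`, the parameters `half_size` and `resolution`, and the choice to average signed point-to-plane distances per cell are all assumptions made for this example.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def _normalize(a):
    norm = math.sqrt(_dot(a, a))
    return [x / norm for x in a]


def plane_grid(points, center, normal, u_axis, half_size=1.0, resolution=4):
    """Average signed point-to-plane distance per cell of a 2D grid on one plane.

    The plane passes through `center` with the given `normal`. `u_axis` fixes
    the in-plane orientation and must not be parallel to `normal`.
    """
    n = _normalize(normal)
    # Gram-Schmidt: remove the normal component from u_axis, then build v.
    d = _dot(u_axis, n)
    u = _normalize([a - d * b for a, b in zip(u_axis, n)])
    v = _cross(n, u)
    cell = 2.0 * half_size / resolution
    sums = [[0.0] * resolution for _ in range(resolution)]
    counts = [[0] * resolution for _ in range(resolution)]
    for p in points:
        rel = [a - b for a, b in zip(p, center)]
        iu = math.floor((_dot(rel, u) + half_size) / cell)
        iv = math.floor((_dot(rel, v) + half_size) / cell)
        if 0 <= iu < resolution and 0 <= iv < resolution:
            sums[iu][iv] += _dot(rel, n)  # signed distance to the plane
            counts[iu][iv] += 1
    # Cells that received no points stay at 0.0.
    return [[s / c if c else 0.0 for s, c in zip(row_s, row_c)]
            for row_s, row_c in zip(sums, counts)]


if __name__ == "__main__":
    pts = [[0.25, 0.25, 0.5], [0.3, 0.3, 1.5], [-0.75, -0.75, 2.0], [5.0, 5.0, 5.0]]
    grid = plane_grid(pts, center=[0.0, 0.0, 0.0],
                      normal=[0.0, 0.0, 1.0], u_axis=[1.0, 0.0, 0.0])
    for row in grid:
        print(row)
```

Calling `plane_grid` with several different normals would give one grid per plane orientation. Those grids could then be stacked as input channels for a 2D convolutional encoder, which is the role the abstract assigns to the jointly encoded planes.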
Synthetic Data for Deep Learning
Synthetic data is an increasingly popular tool for training deep learning
models, especially in computer vision but also in other areas. In this work, we
attempt to provide a comprehensive survey of the various directions in the
development and application of synthetic data. First, we discuss synthetic
datasets for basic computer vision problems, both low-level (e.g., optical flow
estimation) and high-level (e.g., semantic segmentation), synthetic
environments and datasets for outdoor and urban scenes (autonomous driving),
indoor scenes (indoor navigation), aerial navigation, simulation environments
for robotics, applications of synthetic data outside computer vision (in neural
programming, bioinformatics, NLP, and more); we also survey the work on
improving synthetic data development and alternative ways to produce it such as
GANs. Second, we discuss in detail the synthetic-to-real domain adaptation
problem that inevitably arises in applications of synthetic data, including
synthetic-to-real refinement with GAN-based models and domain adaptation at the
feature/model level without explicit data transformations. Third, we turn to
privacy-related applications of synthetic data and review the work on
generating synthetic datasets with differential privacy guarantees. We conclude
by highlighting the most promising directions for further work in synthetic
data studies.
Comment: 156 pages, 24 figures, 719 reference