AUTO3D: Novel view synthesis through unsupervisely learned variational viewpoint and global 3D representation
This paper targets learning-based novel view synthesis from a single 2D image or
a limited number of 2D images, without pose supervision. In viewer-centered
coordinates, we construct an end-to-end trainable conditional variational
framework that disentangles a relative pose/rotation, learned without
supervision, from an implicit global 3D representation (shape, texture, the
origin of the viewer-centered coordinates, etc.). The global appearance of the
3D object is given by several appearance-describing images taken from any
number of viewpoints. Our spatial correlation module extracts a global 3D
representation from the appearance-describing images in a permutation-invariant
manner. Our system achieves implicit 3D understanding without explicit 3D
reconstruction. With a viewer-centered relative-pose/rotation code learned
without supervision, the decoder can continuously hallucinate novel views by
sampling the relative pose from a prior distribution. In various applications,
we demonstrate that our model achieves results comparable to, or even better
than, pose- or 3D model-supervised learning-based novel view synthesis (NVS)
methods with any number of input views.
Comment: ECCV 202
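The permutation-invariant aggregation described above can be illustrated with a minimal sketch. It assumes each appearance-describing image has already been encoded into a fixed-length feature vector, and it uses plain mean pooling as a stand-in for the paper's spatial correlation module, whose internals the abstract does not describe. The standard normal prior in `sample_relative_pose` is likewise an assumption, and both function names are hypothetical.

```python
import random
from typing import List, Sequence


def aggregate_views(features: Sequence[Sequence[float]]) -> List[float]:
    """Mean-pool per-view feature vectors into one global code (hypothetical stand-in).

    The element-wise mean is symmetric in its inputs, so reordering the views
    leaves the result unchanged, and it accepts any number of views >= 1.
    """
    if not features:
        raise ValueError("need at least one view")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all feature vectors must have the same length")
    return [sum(f[i] for f in features) / len(features) for i in range(dim)]


def sample_relative_pose(dim: int, rng: random.Random) -> List[float]:
    """Draw a relative-pose code from an assumed standard normal prior N(0, I)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


if __name__ == "__main__":
    views = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
    print(aggregate_views(views))        # [3.0, 2.0]
    print(aggregate_views(views[::-1]))  # same result for any view order
    print(sample_relative_pose(3, random.Random(0)))
```

Any symmetric pooling (sum, max, mean) would give the same order-independence; the mean was chosen here only because it also keeps the code's scale independent of the number of views.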
Single View Reconstruction for Human Face and Motion with Priors
Single view reconstruction is fundamentally an under-constrained problem. We aim to develop new approaches to modeling human face and motion with model priors that restrict the space of possible solutions. First, we develop a novel approach to recover the 3D shape from a single-view image under challenging conditions, such as large variations in illumination and pose. The problem is addressed by employing non-linear manifold embedding and alignment techniques. Specifically, local image models for each patch of the facial images and local surface models for each patch of the 3D shape are learned using a non-linear dimensionality reduction technique, and the correspondences between these local models are then learned by a manifold alignment method. Local models successfully remove the dependency on large training databases for human face modeling. By combining the local shapes, the global shape of a face can be reconstructed directly from a single linear system of equations via least squares.
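To make the final least-squares step concrete, here is a hedged sketch of stitching overlapping local depth patches into one global shape. It assumes, for illustration only, that each local patch predicts per-vertex depths that agree with the global shape up to an unknown per-patch offset, with the first patch's offset fixed at zero. The helper names `solve_normal_equations` and `stitch_patches` are hypothetical, and the formulation is not claimed to be the authors' exact system.

```python
from typing import Dict, List, Sequence


def solve_normal_equations(A: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve min ||A x - b||^2 via (A^T A) x = A^T b using Gauss-Jordan elimination."""
    n = len(A[0])
    ata = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    atb = [sum(row[i] * bi for row, bi in zip(A, b)) for i in range(n)]
    m = [ata[i] + [atb[i]] for i in range(n)]  # augmented matrix [A^T A | A^T b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("system is singular; add constraints")
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [a - factor * c for a, c in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def stitch_patches(num_vertices: int, patches: Sequence[Dict[int, float]]) -> List[float]:
    """Fuse overlapping local depth patches into global per-vertex depths.

    Assumed model: patch p reports depth z_v - o_p for each vertex v it covers,
    where o_p is an unknown per-patch offset and o_0 = 0 fixes the gauge.
    Unknowns are ordered [z_0 .. z_{V-1}, o_1 .. o_{P-1}].
    """
    n = num_vertices + len(patches) - 1
    rows: List[List[float]] = []
    rhs: List[float] = []
    for p, patch in enumerate(patches):
        for v, depth in patch.items():
            row = [0.0] * n
            row[v] = 1.0
            if p > 0:
                row[num_vertices + p - 1] = -1.0
            rows.append(row)
            rhs.append(depth)
    return solve_normal_equations(rows, rhs)[:num_vertices]


if __name__ == "__main__":
    # Ground-truth depths are [1, 2, 3, 4]; patch 1 is shifted by an offset of 10.
    patches = [{0: 1.0, 1: 2.0, 2: 3.0}, {2: -7.0, 3: -6.0}]
    print([round(z, 6) for z in stitch_patches(4, patches)])  # [1.0, 2.0, 3.0, 4.0]
```

The overlap at vertex 2 is what ties the two patches together; without shared vertices the offset would be unconstrained and the system singular.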
Unfortunately, this learning-based approach cannot be successfully applied to human motion modeling because of the internal and external variations in single-view, video-based markerless motion capture. Therefore, we introduce a new model-based approach for capturing human motion using a stream of depth images from a single depth sensor. While a depth sensor provides metric 3D information, using a single sensor instead of a camera array results in a view-dependent and incomplete measurement of object motion. We develop a novel two-stage template fitting algorithm that is invariant to subject size and viewpoint variations and robust to occlusions. Starting from a known pose, our algorithm first estimates a body configuration through temporal registration, which is used to search the template motion database for the best match. The best-matching body configuration and its corresponding surface mesh model are then deformed to fit the input depth map, filling in the parts that are occluded in the input and compensating for differences in pose and body size between the input image and the template. Our approach does not require any markers, user interaction, or appearance-based tracking.
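The database search in the first stage can be sketched as a nearest-neighbour lookup over normalized joint configurations. This sketch assumes a body configuration is a list of 3D joint positions with the root joint at index 0. Centering on the root and dividing by the mean root distance gives invariance to translation and subject size only, not to viewpoint. The representation and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]


def normalize_pose(joints: Sequence[Joint]) -> List[Joint]:
    """Center on the root joint (index 0) and scale to unit mean root distance."""
    rx, ry, rz = joints[0]
    centered = [(x - rx, y - ry, z - rz) for x, y, z in joints]
    scale = sum(math.sqrt(x * x + y * y + z * z) for x, y, z in centered) / len(centered)
    if scale == 0.0:
        return centered
    return [(x / scale, y / scale, z / scale) for x, y, z in centered]


def pose_distance(a: Sequence[Joint], b: Sequence[Joint]) -> float:
    """Sum of squared joint-to-joint distances between two poses."""
    return sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
               for (ax, ay, az), (bx, by, bz) in zip(a, b))


def best_match(query: Sequence[Joint], database: Sequence[Sequence[Joint]]) -> int:
    """Index of the database pose closest to the query after normalization."""
    q = normalize_pose(query)
    return min(range(len(database)),
               key=lambda i: pose_distance(q, normalize_pose(database[i])))


if __name__ == "__main__":
    database = [
        [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)],  # bent limb
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],  # straight limb
    ]
    # Same shape as entry 0, but twice as large and translated.
    query = [(5.0, 5.0, 5.0), (5.0, 7.0, 5.0), (7.0, 7.0, 5.0)]
    print(best_match(query, database))  # 0
```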
Experiments show that our approaches achieve good modeling results for human face and motion and can deal with a variety of challenges in single view reconstruction, e.g., occlusion.
Implicit 3D Orientation Learning for 6D Object Detection from RGB Images
We propose a real-time RGB-based pipeline for object detection and 6D pose
estimation. Our novel 3D orientation estimation is based on a variant of the
Denoising Autoencoder that is trained on simulated views of a 3D model using
Domain Randomization. This so-called Augmented Autoencoder has several
advantages over existing methods: It does not require real, pose-annotated
training data, generalizes to various test sensors and inherently handles
object and view symmetries. Instead of learning an explicit mapping from input
images to object poses, it provides an implicit representation of object
orientations defined by samples in a latent space. Our pipeline achieves
state-of-the-art performance on the T-LESS dataset both in the RGB and RGB-D
domain. We also evaluate on the LineMOD dataset where we can compete with other
synthetically trained approaches. We further increase performance by correcting
3D orientation estimates to account for perspective errors when the object
deviates from the image center and show extended results.
Comment: Code available at: https://github.com/DLR-RM/AugmentedAutoencode
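To illustrate orientations "defined by samples in a latent space", the sketch below ranks a codebook of latent codes, each paired with the orientation of the simulated view it came from, by cosine similarity to the code of a test crop. The encoder is omitted, the (azimuth, elevation) labels are hypothetical placeholders for full 3D rotations, and the function names are assumptions, not the API of the linked repository.

```python
import math
from typing import List, Sequence, Tuple

Orientation = Tuple[float, float]  # hypothetical (azimuth, elevation) in degrees


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two latent codes."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("latent codes must be non-zero")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def lookup_orientation(query_code: Sequence[float],
                       codebook: Sequence[Tuple[Sequence[float], Orientation]],
                       k: int = 1) -> List[Orientation]:
    """Return the orientations of the k codebook entries most similar to the query.

    Returning k > 1 candidates leaves room for symmetric objects, whose
    distinct orientations can produce near-identical codes.
    """
    ranked = sorted(codebook,
                    key=lambda entry: cosine_similarity(query_code, entry[0]),
                    reverse=True)
    return [orientation for _, orientation in ranked[:k]]


if __name__ == "__main__":
    codebook = [
        ([1.0, 0.0], (0.0, 0.0)),
        ([0.0, 1.0], (90.0, 0.0)),
        ([0.7, 0.7], (45.0, 0.0)),
    ]
    print(lookup_orientation([0.9, 0.1], codebook))       # [(0.0, 0.0)]
    print(lookup_orientation([0.9, 0.1], codebook, k=2))  # [(0.0, 0.0), (45.0, 0.0)]
```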
Unsupervised Domain Adaptation for 3D Keypoint Estimation via View Consistency
In this paper, we introduce a novel unsupervised domain adaptation technique
for the task of 3D keypoint prediction from a single depth scan or image. Our
key idea is to utilize the fact that predictions from different views of the
same or similar objects should be consistent with each other. Such view
consistency can provide effective regularization for keypoint prediction on
unlabeled instances. In addition, we introduce a geometric alignment term to
regularize predictions in the target domain. The resulting loss function can be
effectively optimized via alternating minimization. We demonstrate the
effectiveness of our approach on real datasets and present experimental results
showing that our approach is superior to state-of-the-art general-purpose
domain adaptation techniques.
Comment: ECCV 201
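The view-consistency idea can be sketched as a loss that maps keypoints predicted in view B into view A's frame with a relative rigid transform and penalizes their squared disagreement with view A's predictions. Treating the relative pose (`R_ba`, `t_ba`) as given and using a mean squared distance are assumptions made for illustration; the geometric alignment term and the alternating minimization from the abstract are not shown, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def apply_rigid(R: Sequence[Sequence[float]], t: Sequence[float],
                p: Sequence[float]) -> Vec3:
    """Map point p by x -> R p + t, with R a 3x3 rotation given as rows."""
    x, y, z = (sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
    return (x, y, z)


def view_consistency_loss(kp_a: Sequence[Vec3], kp_b: Sequence[Vec3],
                          R_ba: Sequence[Sequence[float]],
                          t_ba: Sequence[float]) -> float:
    """Mean squared distance between view-A keypoints and view-B keypoints
    transformed into view A's frame (hypothetical formulation)."""
    if len(kp_a) != len(kp_b) or not kp_a:
        raise ValueError("need the same non-zero number of keypoints per view")
    total = 0.0
    for pa, pb in zip(kp_a, kp_b):
        q = apply_rigid(R_ba, t_ba, pb)
        total += sum((a - b) ** 2 for a, b in zip(pa, q))
    return total / len(kp_a)


if __name__ == "__main__":
    # 90-degree rotation about z followed by a shift of 1 along x.
    R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    t = [1.0, 0.0, 0.0]
    kp_b = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    print(view_consistency_loss([(1.0, 1.0, 0.0), (0.0, 0.0, 0.0)], kp_b, R, t))  # 0.0
    print(view_consistency_loss([(1.0, 1.0, 1.0), (0.0, 0.0, 0.0)], kp_b, R, t))  # 0.5
```

On unlabeled target-domain pairs a term like this needs no ground-truth keypoints, which is what lets view consistency act as a regularizer.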
Semantic Visual Localization
Robust visual localization under a wide range of viewing conditions is a
fundamental problem in computer vision. Handling the difficult cases of this
problem is not only very challenging but also of high practical relevance,
e.g., in the context of life-long localization for augmented reality or
autonomous robots. In this paper, we propose a novel approach based on a joint
3D geometric and semantic understanding of the world, enabling it to succeed
under conditions where previous approaches failed. Our method leverages a novel
generative model for descriptor learning, trained on semantic scene completion
as an auxiliary task. The resulting 3D descriptors are robust to missing
observations by encoding high-level 3D geometric and semantic information.
Experiments on several challenging large-scale localization datasets
demonstrate reliable localization under extreme viewpoint, illumination, and
geometry changes.
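The matching step of descriptor-based localization can be sketched as mutual nearest-neighbour matching between query and map descriptors. The Euclidean distance, the mutual check, and the plain-list descriptor format are assumptions chosen for illustration; the abstract does not specify how the learned 3D descriptors are matched, the function names are hypothetical, and the subsequent pose estimation is omitted.

```python
import math
from typing import List, Sequence, Tuple


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(d: Sequence[float], pool: Sequence[Sequence[float]]) -> int:
    """Index of the descriptor in pool closest to d."""
    return min(range(len(pool)), key=lambda i: l2(d, pool[i]))


def mutual_nn_matches(query: Sequence[Sequence[float]],
                      database: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Keep (query, database) pairs that are each other's nearest neighbour."""
    matches = []
    for qi, q in enumerate(query):
        di = nearest(q, database)
        if nearest(database[di], query) == qi:
            matches.append((qi, di))
    return matches


if __name__ == "__main__":
    query = [[0.0, 0.0], [0.3, 0.0], [5.0, 5.0]]
    database = [[5.1, 5.0], [0.1, 0.0], [10.0, 10.0]]
    print(mutual_nn_matches(query, database))  # [(0, 1), (2, 0)]
```

The mutual check discards query descriptor 1, which prefers the same map entry as descriptor 0 but is not that entry's own nearest neighbour; such ambiguous matches are a common source of outliers under large viewpoint changes.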