112 research outputs found
In the Wild Human Pose Estimation Using Explicit 2D Features and Intermediate 3D Representations
Convolutional Neural Network based approaches for monocular 3D human pose
estimation usually require a large amount of training images with 3D pose
annotations. While it is feasible to provide 2D joint annotations for large
corpora of in-the-wild images with humans, providing accurate 3D annotations to
such in-the-wild corpora is hardly feasible in practice. Most existing 3D
labelled data sets are either synthetically created or feature in-studio
images. 3D pose estimation algorithms trained on such data often have limited
ability to generalize to real world scene diversity. We therefore propose a new
deep learning based method for monocular 3D human pose estimation that shows
high accuracy and generalizes better to in-the-wild scenes. It has a network
architecture that comprises a new disentangled hidden space encoding of
explicit 2D and 3D features, and uses supervision by a new learned projection
model from predicted 3D pose. Our algorithm can be jointly trained on image
data with 3D labels and image data with only 2D labels. It achieves
state-of-the-art accuracy on challenging in-the-wild data.
Comment: Accepted to CVPR 201
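The joint training on 3D-labelled and 2D-only images described above can be illustrated with a small loss sketch. This is not the paper's method: the abstract describes a learned projection model, whereas the sketch below substitutes a fixed weak-perspective projection. The names `project`, `mean_sq_error`, `pose_loss` and the `scale` parameter are hypothetical.

```python
def project(joints3d, scale=1.0):
    """Weak-perspective projection: drop depth and apply a global scale.
    Assumption: stands in for the learned projection model in the abstract."""
    return [(scale * x, scale * y) for x, y, _z in joints3d]


def mean_sq_error(pred, target):
    """Mean squared error over all coordinates of all joints."""
    diffs = [(p - t) ** 2 for jp, jt in zip(pred, target) for p, t in zip(jp, jt)]
    return sum(diffs) / len(diffs)


def pose_loss(pred3d, gt2d, gt3d=None):
    """2D reprojection loss is always available; the 3D term is added
    only for samples that carry a 3D annotation."""
    loss = mean_sq_error(project(pred3d), gt2d)
    if gt3d is not None:
        loss += mean_sq_error(pred3d, gt3d)
    return loss


# Two joints of a predicted pose, with 2D labels and (optionally) 3D labels.
pred3d = [(0.0, 0.0, 1.0), (1.0, 2.0, 3.0)]
gt2d = [(0.0, 0.0), (1.0, 1.0)]
gt3d = [(0.0, 0.0, 1.0), (1.0, 2.0, 5.0)]

loss_2d_only = pose_loss(pred3d, gt2d)        # in-the-wild sample, 2D labels only
loss_full = pose_loss(pred3d, gt2d, gt3d)     # studio sample with 3D labels
```

With such a loss, an in-the-wild image still provides a training signal through the projected prediction even though it has no 3D label.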
Learned Semantic Multi-Sensor Depth Map Fusion
Volumetric depth map fusion based on truncated signed distance functions has
become a standard method and is used in many 3D reconstruction pipelines. In
this paper, we generalize this classic method in multiple ways: 1)
Semantics: Semantic information enriches the scene representation and is
incorporated into the fusion process. 2) Multi-Sensor: Depth information can
originate from different sensors or algorithms with very different noise and
outlier statistics, which are considered during data fusion. 3) Scene denoising
and completion: Sensors can fail to recover depth for certain materials and
light conditions, or data is missing due to occlusions. Our method denoises the
geometry, closes holes and computes a watertight surface for every semantic
class. 4) Learning: We propose a neural network reconstruction method that
unifies all these properties within a single powerful framework. Our method
learns sensor or algorithm properties jointly with semantic depth fusion and
scene completion and can also be used as an expert system, e.g. to unify the
strengths of various photometric stereo algorithms. Our approach is the first
to unify all these properties. Experimental evaluations on both synthetic and
real data sets demonstrate clear improvements.
Comment: 11 pages, 7 figures, 2 tables, accepted for the 2nd Workshop on 3D
Reconstruction in the Wild (3DRW2019) in conjunction with ICCV201
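For orientation, the classic truncated signed distance fusion that this abstract generalizes can be sketched as a weighted running average per voxel. This is a minimal 1D illustration of the standard baseline only, not the learned method of the paper. The function `fuse_depth`, the fixed per-sensor weight `w_obs`, and the sample values are assumptions made for the example.

```python
def fuse_depth(tsdf, weights, voxel_depths, measured_depth, trunc, w_obs=1.0):
    """Fold one depth measurement into a 1D TSDF along a camera ray (in place).

    w_obs is a hand-picked confidence for the sensor; the paper instead
    learns sensor properties, which this sketch does not attempt.
    """
    for i, z in enumerate(voxel_depths):
        sdf = measured_depth - z           # positive in front of the surface
        if sdf < -trunc:
            continue                       # far behind the surface: not observed
        d = min(1.0, sdf / trunc)          # truncate and normalise to [-1, 1]
        total = weights[i] + w_obs
        tsdf[i] = (weights[i] * tsdf[i] + w_obs * d) / total
        weights[i] = total


voxel_depths = [0.5, 1.0, 1.5, 2.0, 2.5]   # voxel centres along the ray
tsdf = [0.0] * len(voxel_depths)
weights = [0.0] * len(voxel_depths)

# Two sensors observe the same surface with different assumed reliability.
fuse_depth(tsdf, weights, voxel_depths, 1.4, trunc=0.5, w_obs=1.0)
fuse_depth(tsdf, weights, voxel_depths, 1.6, trunc=0.5, w_obs=3.0)
```

The fused surface lies at the zero crossing of `tsdf`, here between the voxels at depths 1.5 and 2.0. Voxels whose weight stays at zero were never observed, which is the kind of hole that the abstract's scene completion addresses.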
Multi-label learning based semi-global matching forest
Semi-Global Matching (SGM) approximates a 2D Markov Random Field (MRF) via multiple 1D scanline optimizations, which serves as a good trade-off between accuracy and efficiency in dense matching. Nevertheless, its performance is limited by the simple summation of the aggregated costs from all 1D scanline optimizations for the final disparity estimation. SGM-Forest improves on SGM by training a random forest to predict the best scanline according to each scanline’s disparity proposal. The disparity estimated by the best scanline acts as a reference for adaptively adopting close proposals in further post-processing. However, in many cases more than one scanline is capable of providing a good prediction. Training the random forest with only one scanline labeled may limit or even confuse the learning procedure when other scanlines can offer similar contributions. In this paper, we propose a multi-label classification strategy to further improve SGM-Forest. Each training sample may carry multiple labels (or none) if more than one scanline (or no scanline) gives a proper prediction. We test the proposed method on stereo matching datasets from Middlebury, ETH3D, the EuroSDR image matching benchmark, and the 2019 IEEE GRSS data fusion contest. The results indicate that, under the SGM-Forest framework, the multi-label strategy consistently outperforms the single-label scheme.
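The difference between the single-label and multi-label training targets described above can be sketched as follows. The abstract does not say what counts as a "proper prediction", so the tolerance `tol` (in pixels), the function names, and the sample values are assumptions for illustration.

```python
def single_label(proposals, gt_disparity):
    """Single-label target: index of the scanline closest to ground truth."""
    errors = [abs(d - gt_disparity) for d in proposals]
    return errors.index(min(errors))


def multi_label(proposals, gt_disparity, tol=1.0):
    """Multi-label target: 1 for every scanline whose disparity proposal lies
    within tol of the ground truth. The vector may be all zeros."""
    return [1 if abs(d - gt_disparity) <= tol else 0 for d in proposals]


# Disparity proposals of four scanline directions at one pixel.
proposals = [10.2, 10.6, 14.0, 3.5]

best = single_label(proposals, 10.5)        # only one scanline is marked
labels = multi_label(proposals, 10.5)       # both good scanlines are marked
no_good = multi_label(proposals, 50.0)      # no scanline is close: all zeros
```

In the single-label case the first scanline is treated as wrong although its proposal is nearly as good as the best one, which is the ambiguity the multi-label strategy is meant to remove.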
When Deep Learning Meets Data Alignment: A Review on Deep Registration Networks (DRNs)
Registration is the process that computes the transformation that aligns sets
of data. Commonly, a registration process can be divided into four main steps:
target selection, feature extraction, feature matching, and transform
computation for the alignment. The accuracy of the result depends on multiple
factors; the most significant are the quantity of input data, the presence of
noise, outliers and occlusions, the quality of the extracted features,
real-time requirements and the type of transformation, especially those
defined by multiple parameters, like non-rigid deformations.
Recent advancements in machine learning could be a turning point in these
issues, particularly with the development of deep learning (DL) techniques,
which are helping to improve multiple computer vision problems through an
abstract understanding of the input data. In this paper, a review of deep
learning-based registration methods is presented. We classify the different
papers using a framework derived from the traditional registration
pipeline in order to analyse the strengths of the new learning-based proposals.
Deep Registration Networks (DRNs) try to solve the alignment task either by
replacing part of the traditional pipeline with a network or by fully solving
the registration problem. The main conclusions are as follows. 1)
Learning-based registration techniques cannot always be clearly classified in
the traditional pipeline. 2) These approaches allow more complex inputs, like
conceptual models, as well as the traditional 3D datasets. 3) In spite of the
generality of learning, the current proposals are still ad hoc solutions.
Finally, 4) this is a young topic that still requires a large effort to reach
general solutions able to cope with the problems that affect traditional
approaches.
Comment: Submitted to Pattern Recognitio
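The last step of the traditional pipeline, transform computation, can be illustrated with a closed-form least-squares rigid alignment in 2D. This is a sketch of the classical step only, under the assumption that feature matching has already produced point correspondences. It does not represent any of the reviewed networks, and the function name `rigid_align_2d` is hypothetical.

```python
import math


def rigid_align_2d(src, dst):
    """Least-squares rotation angle and translation mapping src onto dst.
    Assumes src[i] corresponds to dst[i] (matching already done)."""
    n = len(src)
    cx_s = sum(p[0] for p in src) / n
    cy_s = sum(p[1] for p in src) / n
    cx_d = sum(p[0] for p in dst) / n
    cy_d = sum(p[1] for p in dst) / n

    # Accumulate the dot and cross terms of the centred point pairs.
    a = b = 0.0
    for (sx, sy), (dx, dy) in zip(src, dst):
        sx, sy, dx, dy = sx - cx_s, sy - cy_s, dx - cx_d, dy - cy_d
        a += sx * dx + sy * dy
        b += sx * dy - sy * dx

    theta = math.atan2(b, a)
    c, s = math.cos(theta), math.sin(theta)
    # Translation moves the rotated source centroid onto the target centroid.
    tx = cx_d - (c * cx_s - s * cy_s)
    ty = cy_d - (s * cx_s + c * cy_s)
    return theta, (tx, ty)


src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
dst = [(2.0, 3.0), (2.0, 4.0), (1.0, 3.0)]   # src rotated by 90 degrees, shifted by (2, 3)
theta, (tx, ty) = rigid_align_2d(src, dst)
```

A rigid transform has only a few parameters and admits this closed form. The non-rigid deformations mentioned in the abstract have many more parameters and no comparable closed-form solution.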