186 research outputs found
Deep learning based RGB-D vision tasks
Depth is an important source of information in computer vision. However, depth is
usually discarded in most vision tasks. In this thesis, we study the tasks of estimating depth from single monocular images, and incorporating depth for object detection and semantic segmentation. Recently, a significant number of breakthroughs have been introduced to the vision community by deep convolutional neural networks (CNNs). All of our algorithms in this thesis are built upon deep CNNs.
The first part of this thesis addresses the task of incorporating depth for object detection and semantic segmentation. The aim is to improve the performance of vision tasks that are only based on RGB data. Two approaches for object detection and two approaches for semantic segmentation are presented. These approaches are based on existing depth estimation, object detection and semantic segmentation algorithms.
The second part of this thesis addresses the task of depth estimation. Depth estimation is often formulated as a regression task due to the continuous property of depths. Deep CNNs for depth estimation are trained by iteratively minimizing regression errors between predicted and ground-truth depths. A drawback of regression is that it predicts depths without confidence. In this thesis, we propose to formulate depth estimation as a classification task which naturally predicts depths with confidence. The confidence can be used during training and post-processing. We also propose to exploit ordinal depth relationships from stereo videos to improve the performance of metric depth estimation. By doing so we propose a Relative Depth in Stereo (RDIS) dataset that is densely annotated with relative depths.Thesis (Ph.D.) -- University of Adelaide,School of Computer Science , 201
MHR-Net: Multiple-Hypothesis Reconstruction of Non-Rigid Shapes from 2D Views
We propose MHR-Net, a novel method for recovering Non-Rigid Shapes from
Motion (NRSfM). MHR-Net aims to find a set of reasonable reconstructions for a
2D view, and it also selects the most likely reconstruction from the set. To
deal with the challenging unsupervised generation of non-rigid shapes, we
develop a new Deterministic Basis and Stochastic Deformation scheme in MHR-Net.
The non-rigid shape is first expressed as the sum of a coarse shape basis and a
flexible shape deformation, then multiple hypotheses are generated with
uncertainty modeling of the deformation part. MHR-Net is optimized with
reprojection loss on the basis and the best hypothesis. Furthermore, we design
a new Procrustean Residual Loss, which reduces the rigid rotations between
similar shapes and further improves the performance. Experiments show that
MHR-Net achieves state-of-the-art reconstruction accuracy on Human3.6M, SURREAL
and 300-VW datasets.Comment: Accepted to ECCV 202
Deep Learning-Based Human Pose Estimation: A Survey
Human pose estimation aims to locate the human body parts and build human
body representation (e.g., body skeleton) from input data such as images and
videos. It has drawn increasing attention during the past decade and has been
utilized in a wide range of applications including human-computer interaction,
motion analysis, augmented reality, and virtual reality. Although the recently
developed deep learning-based solutions have achieved high performance in human
pose estimation, there still remain challenges due to insufficient training
data, depth ambiguities, and occlusion. The goal of this survey paper is to
provide a comprehensive review of recent deep learning-based solutions for both
2D and 3D pose estimation via a systematic analysis and comparison of these
solutions based on their input data and inference procedures. More than 240
research papers since 2014 are covered in this survey. Furthermore, 2D and 3D
human pose estimation datasets and evaluation metrics are included.
Quantitative performance comparisons of the reviewed methods on popular
datasets are summarized and discussed. Finally, the challenges involved,
applications, and future research directions are concluded. We also provide a
regularly updated project page: \url{https://github.com/zczcwh/DL-HPE
DiffuPose: Monocular 3D Human Pose Estimation via Denoising Diffusion Probabilistic Model
Thanks to the development of 2D keypoint detectors, monocular 3D human pose
estimation (HPE) via 2D-to-3D uplifting approaches have achieved remarkable
improvements. Still, monocular 3D HPE is a challenging problem due to the
inherent depth ambiguities and occlusions. To handle this problem, many
previous works exploit temporal information to mitigate such difficulties.
However, there are many real-world applications where frame sequences are not
accessible. This paper focuses on reconstructing a 3D pose from a single 2D
keypoint detection. Rather than exploiting temporal information, we alleviate
the depth ambiguity by generating multiple 3D pose candidates which can be
mapped to an identical 2D keypoint. We build a novel diffusion-based framework
to effectively sample diverse 3D poses from an off-the-shelf 2D detector. By
considering the correlation between human joints by replacing the conventional
denoising U-Net with graph convolutional network, our approach accomplishes
further performance improvements. We evaluate our method on the widely adopted
Human3.6M and HumanEva-I datasets. Comprehensive experiments are conducted to
prove the efficacy of the proposed method, and they confirm that our model
outperforms state-of-the-art multi-hypothesis 3D HPE methods
- …