Unsupervised 3D Pose Estimation with Geometric Self-Supervision
We present an unsupervised learning approach to recover 3D human pose from 2D
skeletal joints extracted from a single image. Our method requires no
multi-view image data, 3D skeletons, or 2D-3D point correspondences, and does
not use previously learned 3D priors during training. A lifting network accepts 2D
landmarks as inputs and generates a corresponding 3D skeleton estimate. During
training, the recovered 3D skeleton is reprojected on random camera viewpoints
to generate new "synthetic" 2D poses. By lifting the synthetic 2D poses back to
3D and re-projecting them in the original camera view, we can define
self-consistency losses both in 3D and in 2D. The training can thus be
self-supervised by exploiting the geometric self-consistency of the
lift-reproject-lift process. We show that self-consistency alone is not
sufficient to generate realistic skeletons; however, adding a 2D pose
discriminator enables the lifter to output valid 3D poses. Additionally, to
learn from 2D poses "in the wild", we train an unsupervised 2D domain adapter
network to allow for an expansion of 2D data. This improves results and
demonstrates the usefulness of 2D pose data for unsupervised 3D lifting.
Results on the Human3.6M dataset for 3D human pose estimation demonstrate that our
approach improves upon the previous unsupervised methods by 30% and outperforms
many weakly supervised approaches that explicitly use 3D data.
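The lift-reproject-lift cycle described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes an orthographic camera, a rotation about the vertical axis only, and a hypothetical `depth_fn` callable standing in for the lifting network (it maps a `(J, 2)` array of joints to `J` depth values). The function names are illustrative.

```python
import numpy as np

def rotation_y(angle):
    """Rotation matrix about the vertical (y) axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def lift(depth_fn, pose2d):
    """Append a predicted depth to every 2D joint (orthographic assumption)."""
    z = depth_fn(pose2d)
    return np.concatenate([pose2d, z[:, None]], axis=1)

def cycle_losses(depth_fn, pose2d, angle):
    """Return (3D loss, 2D loss) for one lift-reproject-lift cycle."""
    X = lift(depth_fn, pose2d)        # lift the input 2D pose to 3D
    R = rotation_y(angle)
    Y = X @ R.T                       # view the skeleton from a new camera
    y2d = Y[:, :2]                    # "synthetic" 2D pose
    Y_hat = lift(depth_fn, y2d)       # lift the synthetic pose back to 3D
    X_back = Y_hat @ R                # rotate back to the original view
    loss_3d = np.mean(np.sum((Y_hat - Y) ** 2, axis=1))
    loss_2d = np.mean(np.sum((X_back[:, :2] - pose2d) ** 2, axis=1))
    return loss_3d, loss_2d

# A deliberately poor lifter that predicts zero depth everywhere.
flat_depth = lambda p: np.zeros(len(p))
pose = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(cycle_losses(flat_depth, pose, angle=0.5))
```

With the flat lifter both losses are positive for a non-zero angle, since the cycle is inconsistent. As the abstract notes, a lifter can also satisfy these losses without producing realistic skeletons, which is why the paper adds a 2D pose discriminator.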
3D-PL: Domain Adaptive Depth Estimation with 3D-aware Pseudo-Labeling
For monocular depth estimation, acquiring ground truths for real data is not
easy, and thus domain adaptation methods are commonly adopted using the
supervised synthetic data. However, this may still incur a large domain gap due
to the lack of supervision from the real data. In this paper, we develop a
domain adaptation framework via generating reliable pseudo ground truths of
depth from real data to provide direct supervision. Specifically, we propose
two mechanisms for pseudo-labeling: 1) 2D-based pseudo-labels, obtained by measuring the
consistency of depth predictions across images that share the same content but
differ in style; 2) 3D-aware pseudo-labels, obtained via a point cloud completion
network that learns to complete the depth values in the 3D space, thus
providing more structural information in a scene to refine and generate more
reliable pseudo-labels. In experiments, we show that our pseudo-labeling
methods improve depth estimation in various settings, including the usage of
stereo pairs during training. Furthermore, the proposed method performs
favorably against several state-of-the-art unsupervised domain adaptation
approaches on real-world datasets.
Comment: Accepted in ECCV 2022. Project page: https://ccc870206.github.io/3D-PL
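The 2D-based pseudo-labeling idea in this abstract (keeping depth values where predictions agree across differently styled versions of the same image) might be sketched as follows. This is an assumption-laden illustration, not the paper's procedure: the relative-difference test, the `rel_tol` threshold, and the choice to average the two predictions are all hypothetical.

```python
import numpy as np

def consistency_pseudo_label(depth_a, depth_b, rel_tol=0.1):
    """Keep pixels where two depth predictions of the same content agree.

    depth_a, depth_b: depth maps predicted from two stylizations of one image.
    rel_tol: hypothetical relative-difference threshold for "consistent".
    Returns (pseudo_label, mask); inconsistent pixels are NaN in pseudo_label.
    """
    mean = 0.5 * (depth_a + depth_b)
    rel_diff = np.abs(depth_a - depth_b) / np.maximum(mean, 1e-8)
    mask = rel_diff < rel_tol
    pseudo = np.where(mask, mean, np.nan)
    return pseudo, mask

a = np.array([[1.0, 2.0], [4.0, 10.0]])
b = np.array([[1.0, 2.1], [8.0, 10.0]])
pseudo, mask = consistency_pseudo_label(a, b)
print(mask)
```

Only the masked pixels would then contribute to a supervised loss on real images; the 3D-aware completion step described in the abstract is not covered by this sketch.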
MSDA: Monocular Self-supervised Domain Adaptation for 6D Object Pose Estimation
Acquiring labeled 6D poses from real images is an expensive and
time-consuming task. Though massive amounts of synthetic RGB images are easy to
obtain, the models trained on them suffer from noticeable performance
degradation due to the synthetic-to-real domain gap. To mitigate this
degradation, we propose a practical self-supervised domain adaptation approach
that takes advantage of real RGB(-D) data without needing real pose labels. We
first pre-train the model with synthetic RGB images and then utilize real
RGB(-D) images to fine-tune the pre-trained model. The fine-tuning process is
self-supervised by the RGB-based pose-aware consistency and the depth-guided
object distance pseudo-label, which does not require the time-consuming online
differentiable rendering. We build our domain adaptation method based on the
recent pose estimator SC6D and evaluate it on the YCB-Video dataset. We
experimentally demonstrate that our method achieves performance comparable
to its fully supervised counterpart while outperforming existing
state-of-the-art approaches.
Comment: SCIA202
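The abstract does not say how the depth-guided object distance pseudo-label is computed. One plausible reading, shown here purely as an assumption, is to take a robust statistic of the sensor depth inside the predicted object mask as a rough proxy for the camera-to-object distance. The function name and the choice of the median are hypothetical.

```python
import numpy as np

def distance_pseudo_label(depth_map, object_mask):
    """Median valid depth inside the object mask, or None if nothing is valid.

    depth_map: (H, W) depth image, with 0 or non-finite values marking holes.
    object_mask: (H, W) boolean mask of the object (e.g. from the estimator).
    """
    valid = object_mask & np.isfinite(depth_map) & (depth_map > 0)
    if not valid.any():
        return None
    return float(np.median(depth_map[valid]))

depth = np.array([[0.0, 1.0, 1.2], [5.0, 1.1, 0.0]])
mask = np.array([[True, True, True], [False, True, True]])
print(distance_pseudo_label(depth, mask))
```

The median is used because it tolerates depth holes and mask leakage onto the background better than a mean would. It measures distance to the visible surface, not to the object center, so a real pipeline would need to account for that offset.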
Learning-based depth and pose prediction for 3D scene reconstruction in endoscopy
Colorectal cancer is the third most common cancer worldwide. Early detection and treatment of pre-cancerous tissue during colonoscopy is critical to improving prognosis. However, navigating within the colon and inspecting the endoluminal tissue comprehensively are challenging, and success in both varies based on the endoscopist's skill and experience. Computer-assisted interventions in colonoscopy show much promise in improving navigation and inspection. For instance, 3D reconstruction of the colon during colonoscopy could promote more thorough examinations and increase adenoma detection rates which are associated with improved survival rates. Given the stakes, this thesis seeks to advance the state of research from feature-based traditional methods closer to a data-driven 3D reconstruction pipeline for colonoscopy.
More specifically, this thesis explores different methods that improve subtasks of learning-based 3D reconstruction. The main tasks are depth prediction and camera pose estimation. As training data is unavailable, the author, together with her co-authors, proposes and publishes several synthetic datasets and promotes domain adaptation models to improve applicability to real data. We show, through extensive experiments, that our depth prediction methods produce more robust results than previous work. Our pose estimation network trained on our new synthetic data outperforms self-supervised methods on real sequences. Our box embeddings allow us to interpret the geometric relationship and scale difference between two images of the same surface without the need for feature matches that are often unobtainable in surgical scenes. Together, the methods introduced in this thesis help work towards a complete, data-driven 3D reconstruction pipeline for endoscopy.
SeasonDepth: Cross-Season Monocular Depth Prediction Dataset and Benchmark under Multiple Environments
Different environments pose a great challenge to robust outdoor visual
perception for long-term autonomous driving, and the generalization of
learning-based algorithms across different environmental effects is still an open
problem. Although monocular depth prediction has been well studied recently,
little work focuses on robust learning-based depth prediction across
different environments, e.g. changing illumination and seasons, owing to the
lack of a multi-environment real-world dataset and benchmark. To this end,
the first cross-season monocular depth prediction dataset and benchmark,
SeasonDepth, is built on the CMU Visual Localization dataset. To benchmark the
depth estimation performance under different environments, we investigate
representative and recent state-of-the-art open-source supervised,
self-supervised and domain adaptation depth prediction methods from the KITTI
benchmark using several newly formulated metrics. Through extensive
experimental evaluation on the proposed dataset, the influence of multiple
environments on performance and robustness is analyzed qualitatively and
quantitatively, showing that long-term monocular depth prediction is still
challenging even with fine-tuning. We further identify promising avenues:
self-supervised training and stereo geometry constraints help to enhance
robustness to changing environments. The dataset is available at
https://seasondepth.github.io, and the benchmark toolkit is available at
https://github.com/SeasonDepth/SeasonDepth.
Comment: 19 pages, 13 figure