    Unsupervised 3D Pose Estimation with Geometric Self-Supervision

    We present an unsupervised learning approach to recover 3D human pose from 2D skeletal joints extracted from a single image. Our method does not require multi-view image data, 3D skeletons, correspondences between 2D-3D points, or previously learned 3D priors during training. A lifting network accepts 2D landmarks as input and generates a corresponding 3D skeleton estimate. During training, the recovered 3D skeleton is reprojected on random camera viewpoints to generate new "synthetic" 2D poses. By lifting the synthetic 2D poses back to 3D and re-projecting them in the original camera view, we can define a self-consistency loss both in 3D and in 2D. Training can thus be self-supervised by exploiting the geometric self-consistency of the lift-reproject-lift process. We show that self-consistency alone is not sufficient to generate realistic skeletons; however, adding a 2D pose discriminator enables the lifter to output valid 3D poses. Additionally, to learn from 2D poses "in the wild", we train an unsupervised 2D domain adapter network to allow for an expansion of 2D data. This improves results and demonstrates the usefulness of 2D pose data for unsupervised 3D lifting. Results on the Human3.6M dataset for 3D human pose estimation demonstrate that our approach improves upon previous unsupervised methods by 30% and outperforms many weakly supervised approaches that explicitly use 3D data.
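    The lift-reproject-lift cycle described above can be written compactly. The following is a minimal sketch of that self-consistency loss, assuming an orthographic camera and a generic `lifter` network mapping (B, J, 2) 2D joints to (B, J, 3) 3D joints; it is an illustration of the idea, not the authors' code.

    ```python
    import torch

    def random_rotation(batch_size, device):
        """Random rotation about the vertical (y) axis, one per sample."""
        theta = torch.rand(batch_size, device=device) * 2 * torch.pi
        c, s = torch.cos(theta), torch.sin(theta)
        zeros, ones = torch.zeros_like(theta), torch.ones_like(theta)
        R = torch.stack([
            torch.stack([c, zeros, s], dim=-1),
            torch.stack([zeros, ones, zeros], dim=-1),
            torch.stack([-s, zeros, c], dim=-1),
        ], dim=1)                                   # (B, 3, 3)
        return R

    def self_consistency_loss(lifter, pose_2d):
        """pose_2d: (B, J, 2) joints detected in the original camera view."""
        pose_3d = lifter(pose_2d)                   # lift: (B, J, 3)
        R = random_rotation(pose_2d.shape[0], pose_2d.device)
        rotated = torch.einsum('bij,bkj->bki', R, pose_3d)
        synth_2d = rotated[..., :2]                 # reproject to a new random view
        relifted = lifter(synth_2d)                 # lift the synthetic 2D pose
        back = torch.einsum('bij,bkj->bki', R.transpose(1, 2), relifted)
        loss_3d = torch.mean((back - pose_3d) ** 2)            # 3D consistency
        loss_2d = torch.mean((back[..., :2] - pose_2d) ** 2)   # 2D consistency
        return loss_2d + loss_3d
    ```

    In the paper this geometric term is combined with a 2D pose discriminator on the synthetic poses; the sketch above covers only the consistency part.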

    3D-PL: Domain Adaptive Depth Estimation with 3D-aware Pseudo-Labeling

    For monocular depth estimation, acquiring ground truth for real data is not easy, and thus domain adaptation methods are commonly adopted, using supervised synthetic data. However, this may still incur a large domain gap due to the lack of supervision from the real data. In this paper, we develop a domain adaptation framework that generates reliable pseudo ground truths of depth from real data to provide direct supervision. Specifically, we propose two mechanisms for pseudo-labeling: 1) 2D-based pseudo-labels obtained by measuring the consistency of depth predictions when images have the same content but different styles; 2) 3D-aware pseudo-labels from a point cloud completion network that learns to complete the depth values in 3D space, thus providing more structural information about a scene to refine and generate more reliable pseudo-labels. In experiments, we show that our pseudo-labeling methods improve depth estimation in various settings, including the use of stereo pairs during training. Furthermore, the proposed method performs favorably against several state-of-the-art unsupervised domain adaptation approaches on real-world datasets.
    Comment: Accepted in ECCV 2022. Project page: https://ccc870206.github.io/3D-PL
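    The first mechanism, consistency-based 2D pseudo-labeling, can be sketched as follows. This is one plausible interpretation under stated assumptions (a given `depth_net` and a pre-computed stylized counterpart of each real image), not the authors' implementation.

    ```python
    import torch

    def consistency_pseudo_label(depth_net, image, styled_image, rel_thresh=0.05):
        """Keep depth predictions only where two style variants of the same
        content agree, and use them as pseudo ground truth for the real image."""
        with torch.no_grad():
            d_a = depth_net(image)          # (B, 1, H, W) depth from original style
            d_b = depth_net(styled_image)   # (B, 1, H, W) depth from other style
            rel_diff = (d_a - d_b).abs() / (torch.min(d_a, d_b) + 1e-6)
            mask = (rel_diff < rel_thresh).float()  # 1 where predictions agree
            pseudo_depth = 0.5 * (d_a + d_b)        # averaged pseudo-label
        return pseudo_depth, mask

    def pseudo_label_loss(pred, pseudo_depth, mask):
        """Supervise the student prediction only on consistent pixels."""
        return (mask * (pred - pseudo_depth).abs()).sum() / (mask.sum() + 1e-6)
    ```

    The 3D-aware mechanism then refines such labels with a point cloud completion network, which is not shown here.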

    MSDA: Monocular Self-supervised Domain Adaptation for 6D Object Pose Estimation

    Acquiring labeled 6D poses from real images is an expensive and time-consuming task. Though massive amounts of synthetic RGB images are easy to obtain, models trained on them suffer from noticeable performance degradation due to the synthetic-to-real domain gap. To mitigate this degradation, we propose a practical self-supervised domain adaptation approach that takes advantage of real RGB(-D) data without needing real pose labels. We first pre-train the model with synthetic RGB images and then utilize real RGB(-D) images to fine-tune the pre-trained model. The fine-tuning process is self-supervised by an RGB-based pose-aware consistency and a depth-guided object distance pseudo-label, and does not require time-consuming online differentiable rendering. We build our domain adaptation method on the recent pose estimator SC6D and evaluate it on the YCB-Video dataset. We experimentally demonstrate that our method achieves comparable performance to its fully-supervised counterpart while outperforming existing state-of-the-art approaches.
    Comment: SCIA202
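    The two self-supervision signals mentioned above can be illustrated roughly as follows. The helper names (`pose_net`, `pred_t`, `obj_mask`) are hypothetical and the losses are simplified placeholders, not the SC6D-based implementation.

    ```python
    import torch

    def distance_pseudo_label_loss(pred_t, depth_map, obj_mask):
        """Depth-guided object distance pseudo-label: supervise the predicted
        translation's z component with the median depth inside the object mask."""
        obj_depth = depth_map[obj_mask > 0]
        pseudo_z = obj_depth.median().detach()      # pseudo distance label from depth
        return (pred_t[2] - pseudo_z).abs()

    def pose_consistency_loss(pose_net, rgb, rgb_aug):
        """Pose-aware consistency: the same object crop under two photometric
        augmentations should yield the same rotation and translation."""
        R1, t1 = pose_net(rgb)
        R2, t2 = pose_net(rgb_aug)
        rot_loss = torch.norm(R1 - R2)              # Frobenius distance between rotations
        trans_loss = torch.norm(t1 - t2)
        return rot_loss + trans_loss
    ```

    Both terms are computed directly from network outputs, which is why no online differentiable rendering is needed during fine-tuning.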

    Learning-based depth and pose prediction for 3D scene reconstruction in endoscopy

    Colorectal cancer is the third most common cancer worldwide. Early detection and treatment of pre-cancerous tissue during colonoscopy is critical to improving prognosis. However, navigating within the colon and inspecting the endoluminal tissue comprehensively are challenging, and success in both varies with the endoscopist's skill and experience. Computer-assisted interventions in colonoscopy show much promise in improving navigation and inspection. For instance, 3D reconstruction of the colon during colonoscopy could promote more thorough examinations and increase adenoma detection rates, which are associated with improved survival rates. Given the stakes, this thesis seeks to advance the state of research from feature-based traditional methods closer to a data-driven 3D reconstruction pipeline for colonoscopy. More specifically, this thesis explores different methods that improve subtasks of learning-based 3D reconstruction. The main tasks are depth prediction and camera pose estimation. As training data is unavailable, the author, together with her co-authors, proposes and publishes several synthetic datasets and promotes domain adaptation models to improve applicability to real data. We show, through extensive experiments, that our depth prediction methods produce more robust results than previous work. Our pose estimation network trained on our new synthetic data outperforms self-supervised methods on real sequences. Our box embeddings allow us to interpret the geometric relationship and scale difference between two images of the same surface without the need for feature matches, which are often unobtainable in surgical scenes. Together, the methods introduced in this thesis help work towards a complete, data-driven 3D reconstruction pipeline for endoscopy.

    SeasonDepth: Cross-Season Monocular Depth Prediction Dataset and Benchmark under Multiple Environments

    Different environments pose a great challenge to robust outdoor visual perception for long-term autonomous driving, and the generalization of learning-based algorithms across different environmental effects is still an open problem. Although monocular depth prediction has been well studied recently, there is little work focusing on robust learning-based depth prediction across different environments, e.g. changing illumination and seasons, owing to the lack of such a multi-environment real-world dataset and benchmark. To this end, we build SeasonDepth, the first cross-season monocular depth prediction dataset and benchmark, based on the CMU Visual Localization dataset. To benchmark depth estimation performance under different environments, we investigate representative and recent state-of-the-art open-source supervised, self-supervised, and domain adaptation depth prediction methods from the KITTI benchmark using several newly-formulated metrics. Through extensive experimental evaluation on the proposed dataset, the influence of multiple environments on performance and robustness is analyzed qualitatively and quantitatively, showing that long-term monocular depth prediction remains challenging even with fine-tuning. We further identify promising avenues, showing that self-supervised training and stereo geometry constraints help enhance robustness to changing environments. The dataset is available at https://seasondepth.github.io, and the benchmark toolkit is available at https://github.com/SeasonDepth/SeasonDepth.
    Comment: 19 pages, 13 figures
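    To make the cross-environment evaluation concrete, the sketch below scores predictions per environment with the standard absolute relative error after per-image median scaling. This is a common protocol for scale-ambiguous monocular depth and is shown only as an illustration; it is not necessarily the benchmark's exact set of newly-formulated metrics.

    ```python
    import numpy as np

    def abs_rel(pred, gt):
        """Absolute relative error on valid ground-truth pixels."""
        valid = gt > 0
        pred, gt = pred[valid], gt[valid]
        pred = pred * np.median(gt) / np.median(pred)   # align the unknown scale
        return np.mean(np.abs(pred - gt) / gt)

    def benchmark(preds_by_env, gts_by_env):
        """Average the metric separately for each environment/season slice."""
        return {env: np.mean([abs_rel(p, g)
                              for p, g in zip(preds_by_env[env], gts_by_env[env])])
                for env in preds_by_env}
    ```

    Reporting one score per environment slice, rather than a single pooled number, is what exposes the robustness gap across seasons and illumination.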