
    Shape Consistent 2D Keypoint Estimation under Domain Shift

    Recent unsupervised domain adaptation methods based on deep architectures have shown remarkable performance not only in traditional classification tasks but also in more complex problems involving structured predictions (e.g., semantic segmentation, depth estimation). Following this trend, in this paper we present a novel deep adaptation framework for estimating keypoints under domain shift, i.e., when the training (source) and test (target) images differ significantly in visual appearance. Our method seamlessly combines three different components: feature alignment, adversarial training, and self-supervision. Specifically, our deep architecture leverages domain-specific distribution alignment layers to perform target adaptation at the feature level. Furthermore, we propose a novel loss that combines an adversarial term, which ensures aligned predictions in the output space, with a geometric consistency term, which guarantees coherent predictions between a target sample and its perturbed version. Our extensive experimental evaluation on three publicly available benchmarks shows that our approach outperforms state-of-the-art domain adaptation methods on the 2D keypoint prediction task.
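    A minimal sketch of the geometric consistency term described above, assuming a PyTorch model that outputs per-keypoint heatmaps and using a horizontal flip as the perturbation; the model interface and the choice of transform are illustrative assumptions, not the paper's exact setup:

```python
import torch
import torch.nn.functional as F

def geometric_consistency_loss(model, target_images):
    """Consistency between predictions on a target image and its perturbed copy.

    target_images: (B, 3, H, W) unlabeled target-domain batch.
    model(x) is assumed to return (B, K, h, w) keypoint heatmaps.
    """
    heatmaps = model(target_images)
    # Perturb the input with a known, invertible transform (here: h-flip).
    flipped = torch.flip(target_images, dims=[3])
    heatmaps_flipped = model(flipped)
    # Map the perturbed predictions back to the original frame. For
    # left/right-symmetric keypoint sets you would also permute the
    # corresponding channel pairs here.
    heatmaps_unflipped = torch.flip(heatmaps_flipped, dims=[3])
    # Penalize disagreement; one branch is detached so it acts as a target.
    return F.mse_loss(heatmaps_unflipped, heatmaps.detach())
```

    In training, a term of this kind would be added, with suitable weights, to the supervised source loss and the adversarial output-space term.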

    Unsupervised Domain Adaptation for 3D Keypoint Estimation via View Consistency

    In this paper, we introduce a novel unsupervised domain adaptation technique for 3D keypoint prediction from a single depth scan or image. Our key idea is to exploit the fact that predictions from different views of the same or similar objects should be consistent with each other. Such view consistency provides effective regularization for keypoint prediction on unlabeled instances. In addition, we introduce a geometric alignment term to regularize predictions in the target domain. The resulting loss function can be effectively optimized via alternating minimization. We demonstrate the effectiveness of our approach on real datasets and present experimental results showing that it is superior to state-of-the-art general-purpose domain adaptation techniques. (ECCV 2018)
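    A minimal sketch of the view-consistency regularizer described above, assuming known camera-to-world poses and a model that regresses 3D keypoints in camera coordinates; the names and pose convention are assumptions for illustration:

```python
import torch

def view_consistency_loss(model, view_a, view_b, pose_a, pose_b):
    """Keypoints predicted from two views of the same object, mapped into a
    shared world frame, should coincide.

    view_a, view_b: two depth scans/images of the same object.
    pose_a, pose_b: (R, t) camera-to-world rotations (B, 3, 3) and
    translations (B, 3) for each view.
    """
    kp_a = model(view_a)  # (B, K, 3) in camera-a coordinates
    kp_b = model(view_b)  # (B, K, 3) in camera-b coordinates
    Ra, ta = pose_a
    Rb, tb = pose_b
    # Row-vector form of p_world = R @ p_cam + t.
    world_a = kp_a @ Ra.transpose(-1, -2) + ta.unsqueeze(1)
    world_b = kp_b @ Rb.transpose(-1, -2) + tb.unsqueeze(1)
    return torch.norm(world_a - world_b, dim=-1).mean()
```

    The paper optimizes its full objective by alternating minimization; this sketch shows only the consistency term.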

    Self-Supervised Relative Depth Learning for Urban Scene Understanding

    As an agent moves through the world, the apparent motion of scene elements is (usually) inversely proportional to their depth. It is natural for a learning agent to associate image patterns with the magnitude of their displacement over time: as the agent moves, faraway mountains don't move much, while nearby trees move a lot. This natural relationship between the appearance of objects and their motion is a rich source of information about the world. In this work, we start by training a deep network, using fully automatic supervision, to predict relative scene depth from single images. The relative depth training images are derived automatically from simple videos of cars moving through a scene, using recent motion segmentation techniques and no human-provided labels. This proxy task of predicting relative depth from a single image induces features in the network that yield large improvements, over a network trained from scratch, on a set of downstream tasks: semantic segmentation, joint road segmentation and car detection, and monocular (absolute) depth estimation. The improvement on the semantic segmentation task is greater than that produced by any other automatically supervised method. Moreover, for monocular depth estimation, our unsupervised pre-training even outperforms supervised pre-training with ImageNet. In addition, we demonstrate benefits from learning to predict (unsupervised) relative depth on the specific videos associated with various downstream tasks; we adapt to the scenes in those tasks in an unsupervised manner to improve performance. In summary, for semantic segmentation we present state-of-the-art results among methods that do not use supervised pre-training, and for monocular depth estimation we even exceed the performance of supervised ImageNet pre-trained models, achieving results comparable with state-of-the-art methods.
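    A minimal sketch of how such proxy labels can be derived, assuming the approximation that flow magnitude under a translating camera is inversely proportional to depth; the paper's actual pipeline additionally uses motion segmentation of car videos, which is not shown here, and the function names are hypothetical:

```python
import numpy as np

def relative_depth_target(flow, eps=1e-6):
    """Turn optical flow between consecutive frames into a relative depth map.

    flow: (H, W, 2) flow field, e.g. from an off-the-shelf flow estimator.
    Returns an (H, W) map normalized to [0, 1]; only the ordering matters,
    since the network is supervised with *relative* depth.
    """
    magnitude = np.linalg.norm(flow, axis=-1)
    # Larger apparent motion -> nearer scene element, so invert.
    rel_depth = 1.0 / (magnitude + eps)
    return rel_depth / rel_depth.max()
```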

    Cross-Domain Depth Estimation Network for 3D Vessel Reconstruction in OCT Angiography

    Optical Coherence Tomography Angiography (OCTA) has been widely used by ophthalmologists for decision-making due to its superiority in providing capillary details. Many of the OCTA imaging devices used in the clinic provide high-quality 2D en face representations, while the quality of their 3D data is largely limited by low signal-to-noise ratio and strong projection artifacts, which restrict the performance of depth-resolved 3D analysis. In this paper, we propose a novel 2D-to-3D vessel reconstruction framework based on 2D en face OCTA images. This framework takes advantage of the detailed 2D OCTA depth map for prediction and thus does not rely on any 3D volumetric data. Using data with available vessel depth labels, we first introduce a network with structure constraint blocks to estimate the depth maps of blood vessels in cross-domain en face OCTA data for which labels are unavailable. Afterwards, a depth adversarial adaptation module is proposed for better unsupervised cross-domain training, since images captured with different devices may suffer from varying contrast and noise levels. Finally, vessels are reconstructed in 3D space using the estimated depth map and the 2D vascular information. Experimental results demonstrate the effectiveness of our method and its potential to guide subsequent vascular analysis in the 3D domain.
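    A minimal sketch of the final reconstruction step described above, lifting each 2D vessel pixel to a 3D point with the estimated depth map; the orthographic back-projection and the pixel-size parameter are simplifying assumptions, not the paper's exact procedure:

```python
import numpy as np

def reconstruct_vessels_3d(vessel_mask, depth_map, pixel_size_um=10.0):
    """vessel_mask: (H, W) boolean vessel segmentation of the en face image.
    depth_map: (H, W) predicted vessel depth (same units as pixel_size_um).
    Returns an (N, 3) point cloud of vessel points.
    """
    ys, xs = np.nonzero(vessel_mask)
    return np.stack([xs * pixel_size_um,     # lateral x
                     ys * pixel_size_um,     # lateral y
                     depth_map[ys, xs]],     # axial depth from the network
                    axis=-1)
```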