Shape Consistent 2D Keypoint Estimation under Domain Shift
Recent unsupervised domain adaptation methods based on deep architectures
have shown remarkable performance not only in traditional classification tasks
but also in more complex problems involving structured predictions (e.g.
semantic segmentation, depth estimation). Following this trend, in this paper
we present a novel deep adaptation framework for estimating keypoints under
domain shift, i.e. when the training (source) and the test (target) images
significantly differ in terms of visual appearance. Our method seamlessly
combines three different components: feature alignment, adversarial training
and self-supervision. Specifically, our deep architecture leverages
domain-specific distribution alignment layers to perform target adaptation at
the feature level. Furthermore, a novel loss is proposed which combines an
adversarial term for ensuring aligned predictions in the output space and a
geometric consistency term which guarantees coherent predictions between a
target sample and its perturbed version. Our extensive experimental evaluation
conducted on three publicly available benchmarks shows that our approach
outperforms state-of-the-art domain adaptation methods in the 2D keypoint
prediction task.
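The geometric consistency term described in this abstract can be illustrated with a minimal sketch. It assumes the perturbation is a known 2D rotation and that the penalty is a mean squared distance; neither choice is stated in the abstract, and the function names and sample predictions are hypothetical.

```python
import math

def rotate(points, angle, center=(0.0, 0.0)):
    """Rotate 2D points by `angle` radians about `center`."""
    cx, cy = center
    c, s = math.cos(angle), math.sin(angle)
    return [(cx + c * (x - cx) - s * (y - cy),
             cy + s * (x - cx) + c * (y - cy)) for x, y in points]

def consistency_loss(pred_original, pred_perturbed, angle, center=(0.0, 0.0)):
    """Mean squared distance between the predictions on the original image,
    mapped through the known perturbation, and the predictions made directly
    on the perturbed image (assumed form of the penalty)."""
    expected = rotate(pred_original, angle, center)
    squared = [(ex - px) ** 2 + (ey - py) ** 2
               for (ex, ey), (px, py) in zip(expected, pred_perturbed)]
    return sum(squared) / len(squared)

# Hypothetical keypoint predictions on an unlabeled target image.
pred_original = [(1.0, 0.0), (0.0, 2.0)]
angle = math.pi / 2

# A predictor that is consistent under the rotation incurs no penalty.
consistent = rotate(pred_original, angle)
loss_consistent = consistency_loss(pred_original, consistent, angle)

# A predictor that drifts by 0.5 px in x on the perturbed image is penalized.
shifted = [(x + 0.5, y) for x, y in consistent]
loss_shifted = consistency_loss(pred_original, shifted, angle)
```

No target labels enter the computation, which is why a term of this kind can be evaluated on unlabeled target samples.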
Unsupervised Domain Adaptation for 3D Keypoint Estimation via View Consistency
In this paper, we introduce a novel unsupervised domain adaptation technique
for the task of 3D keypoint prediction from a single depth scan or image. Our
key idea is to utilize the fact that predictions from different views of the
same or similar objects should be consistent with each other. Such view
consistency can provide effective regularization for keypoint prediction on
unlabeled instances. In addition, we introduce a geometric alignment term to
regularize predictions in the target domain. The resulting loss function can be
effectively optimized via alternating minimization. We demonstrate the
effectiveness of our approach on real datasets and present experimental results
showing that our approach is superior to state-of-the-art general-purpose
domain adaptation techniques. Comment: ECCV 201
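The view-consistency idea in this abstract can be sketched as follows. This is only an illustration under stated assumptions: each view is assumed to differ from a common frame by a known rotation about the z axis, and disagreement is measured as the mean squared deviation from the per-keypoint mean across views. The abstract does not give the actual loss, and all names here are hypothetical.

```python
import math

def rotate_z(points, yaw):
    """Rotate 3D points by `yaw` radians about the z axis."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]

def view_consistency(preds_per_view, yaws):
    """Mean squared deviation of per-view predictions from their consensus,
    after mapping every view back into the common frame."""
    aligned = [rotate_z(p, -yaw) for p, yaw in zip(preds_per_view, yaws)]
    n_views, n_kp = len(aligned), len(aligned[0])
    total = 0.0
    for k in range(n_kp):
        # Under a squared-error penalty the best consensus is the mean.
        mean = [sum(v[k][d] for v in aligned) / n_views for d in range(3)]
        total += sum((v[k][d] - mean[d]) ** 2
                     for v in aligned for d in range(3))
    return total / (n_views * n_kp)

# Hypothetical keypoints in the common frame, observed from three views.
keypoints = [(1.0, 0.0, 0.0), (0.0, 1.0, 2.0)]
yaws = [0.0, math.pi / 3, -math.pi / 4]
views = [rotate_z(keypoints, yaw) for yaw in yaws]
loss_agree = view_consistency(views, yaws)

# Perturb one keypoint in one view so that the views disagree.
noisy = [list(v) for v in views]
x, y, z = noisy[1][0]
noisy[1][0] = (x + 0.3, y, z)
loss_disagree = view_consistency(noisy, yaws)
```

In this sketch the consensus is recomputed from fixed predictions; updating the predictor against a fixed consensus would be the other half of an alternating scheme, which is not shown.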
Self-Supervised Relative Depth Learning for Urban Scene Understanding
As an agent moves through the world, the apparent motion of scene elements is
(usually) inversely proportional to their depth. It is natural for a learning
agent to associate image patterns with the magnitude of their displacement over
time: as the agent moves, faraway mountains don't move much; nearby trees move
a lot. This natural relationship between the appearance of objects and their
motion is a rich source of information about the world. In this work, we start
by training a deep network, using fully automatic supervision, to predict
relative scene depth from single images. The relative depth training images are
automatically derived from simple videos of cars moving through a scene, using
recent motion segmentation techniques, and no human-provided labels. This proxy
task of predicting relative depth from a single image induces features in the
network that result in large improvements in a set of downstream tasks
including semantic segmentation, joint road segmentation and car detection, and
monocular (absolute) depth estimation, over a network trained from scratch. The
improvement on the semantic segmentation task is greater than those produced by
any other automatically supervised methods. Moreover, for monocular depth
estimation, our unsupervised pre-training method even outperforms supervised
pre-training with ImageNet. In addition, we demonstrate benefits from learning
to predict (unsupervised) relative depth in the specific videos associated with
various downstream tasks. We adapt to the specific scenes in those tasks in an
unsupervised manner to improve performance. In summary, for semantic
segmentation, we present state-of-the-art results among methods that do not use
supervised pre-training, and we even exceed the performance of supervised
ImageNet pre-trained models for monocular depth estimation, achieving results
that are comparable with state-of-the-art methods.
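The inverse relation between apparent motion and depth that opens this abstract follows from the pinhole camera model: for a camera translating sideways by t, a point at depth Z shifts in the image by f * t / Z pixels, where f is the focal length in pixels. The sketch below works through that arithmetic. The numbers and function names are hypothetical, and this is not the motion-segmentation pipeline the authors use to derive their training signal.

```python
def pixel_displacement(focal_px, translation_m, depth_m):
    """Image shift in pixels of a static point at `depth_m` when a pinhole
    camera translates sideways by `translation_m`."""
    return focal_px * translation_m / depth_m

def relative_depth(displacements, eps=1e-6):
    """Turn displacement magnitudes into depths scaled to (0, 1], where 1 is
    the farthest point. Only the ordering and ratios are meaningful."""
    inverse = [1.0 / max(d, eps) for d in displacements]
    largest = max(inverse)
    return [v / largest for v in inverse]

# Hypothetical scene: a tree at 10 m and a mountain at 5 km, seen by a
# camera with a 1000 px focal length that moves 0.5 m between frames.
tree_shift = pixel_displacement(1000.0, 0.5, 10.0)
mountain_shift = pixel_displacement(1000.0, 0.5, 5000.0)
depths = relative_depth([tree_shift, mountain_shift])
```

The relation holds for translational motion only; camera rotation shifts image points independently of their depth, which is one reason the abstract qualifies the claim with "usually".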
Cross-Domain Depth Estimation Network for 3D Vessel Reconstruction in OCT Angiography
Optical Coherence Tomography Angiography (OCTA) has been widely used by ophthalmologists for decision-making due to its superiority in providing capillary details. Many of the OCTA imaging devices used in clinics provide high-quality 2D en face representations, while their 3D data quality is largely limited by low signal-to-noise ratio and strong projection artifacts, which restrict the performance of depth-resolved 3D analysis. In this paper, we propose a novel 2D-to-3D vessel reconstruction framework based on the 2D en face OCTA images. This framework takes advantage of the detailed 2D OCTA depth map for prediction and thus does not rely on any 3D volumetric data. Based on the data with available vessel depth labels, we first introduce a network with structure constraint blocks to estimate the depth map of blood vessels in other cross-domain en face OCTA data with unavailable labels. Afterwards, a depth adversarial adaptation module is proposed for better unsupervised cross-domain training, since images captured using different devices may suffer from varying image contrast and noise levels. Finally, vessels are reconstructed in 3D space by utilizing the estimated depth map and 2D vascular information. Experimental results demonstrate the effectiveness of our method and its potential to guide subsequent vascular analysis in the 3D domain.
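The final reconstruction step in this abstract, combining an estimated depth map with 2D vascular information, can be sketched as lifting every vessel pixel to a 3D point. This assumes the 2D vascular information is a binary vessel mask aligned pixel-for-pixel with the depth map; the abstract does not describe the actual procedure, and the names and sample values are hypothetical.

```python
def lift_vessels(mask, depth_map):
    """Return (x, y, z) points for every vessel pixel, taking x and y from
    the pixel column and row and z from the estimated depth map."""
    points = []
    for row, (mask_row, depth_row) in enumerate(zip(mask, depth_map)):
        for col, (is_vessel, depth) in enumerate(zip(mask_row, depth_row)):
            if is_vessel:
                points.append((col, row, depth))
    return points

# Hypothetical 2x2 en face vessel mask and estimated depth map.
mask = [[0, 1],
        [1, 0]]
depth_map = [[0.0, 3.5],
             [1.2, 0.0]]
points = lift_vessels(mask, depth_map)
```

Depth values at non-vessel pixels are ignored, so only the mask decides which pixels contribute to the 3D structure.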