Image to Image Translation for Domain Adaptation
We propose a general framework for unsupervised domain adaptation, which
allows deep neural networks trained on a source domain to be tested on a
different target domain without requiring any training annotations in the
target domain. This is achieved by adding extra networks and losses that help
regularize the features extracted by the backbone encoder network. To this end,
we propose the novel use of the recently proposed unpaired image-to-image
translation framework to constrain the features extracted by the encoder
network. Specifically, we require that the features extracted are able to
reconstruct the images in both domains. In addition, we require that the
distributions of features extracted from images in the two domains be
indistinguishable. Many recent works can be seen as special cases of our
general framework. We apply our method for domain adaptation between MNIST,
USPS, and SVHN datasets, and Amazon, Webcam and DSLR Office datasets in
classification tasks, and also between GTA5 and Cityscapes datasets for a
segmentation task. We demonstrate state-of-the-art performance on each of these
datasets.
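The abstract above describes a training objective with three parts: a supervised task loss on the source domain, an image-reconstruction constraint in both domains, and a term that makes the source and target feature distributions indistinguishable. The numpy sketch below illustrates how such a combined objective can be assembled; the `domain_confusion_loss` here is a hypothetical simplification (a mean-feature matching penalty) standing in for the paper's learned adversarial discriminator, and the weights `lam_rec` and `lam_adv` are assumed, not taken from the paper.

```python
import numpy as np

def mse(a, b):
    # Mean squared error, used here for the image-reconstruction terms.
    return float(np.mean((a - b) ** 2))

def domain_confusion_loss(src_feats, tgt_feats):
    # Crude stand-in for the adversarial alignment term: penalize the
    # distance between the mean feature vectors of the two domains.
    # (Hypothetical; the paper trains a discriminator network instead.)
    return float(np.sum((src_feats.mean(axis=0) - tgt_feats.mean(axis=0)) ** 2))

def total_loss(task_loss, src_img, src_recon, tgt_img, tgt_recon,
               src_feats, tgt_feats, lam_rec=1.0, lam_adv=0.1):
    # Overall objective: supervised source-domain task loss, plus
    # reconstruction in both domains, plus feature-distribution alignment.
    rec = mse(src_img, src_recon) + mse(tgt_img, tgt_recon)
    adv = domain_confusion_loss(src_feats, tgt_feats)
    return task_loss + lam_rec * rec + lam_adv * adv
```

In the full method each term would be backpropagated through the encoder, so minimizing the reconstruction and alignment terms regularizes the features the task head sees.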
DA-RNN: Semantic Mapping with Data Associated Recurrent Neural Networks
3D scene understanding is important for robots to interact with the 3D world
in a meaningful way. Most previous works on 3D scene understanding focus on
recognizing geometrical or semantic properties of the scene independently. In
this work, we introduce Data Associated Recurrent Neural Networks (DA-RNNs), a
novel framework for joint 3D scene mapping and semantic labeling. DA-RNNs use a
new recurrent neural network architecture for semantic labeling on RGB-D
videos. The output of the network is integrated with mapping techniques such as
KinectFusion in order to inject semantic information into the reconstructed 3D
scene. Experiments conducted on a real world dataset and a synthetic dataset
with RGB-D videos demonstrate the ability of our method in semantic 3D scene
mapping.
Comment: Published in RSS 2017.
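The integration step described above, injecting per-frame semantic labels into a reconstructed map, can be sketched as a running weighted average over per-voxel class probabilities, analogous to KinectFusion's TSDF update rule. This is a hypothetical simplification of the paper's label fusion, with array shapes and the uniform frame weight assumed for illustration.

```python
import numpy as np

def fuse_labels(voxel_probs, voxel_weights, frame_probs, frame_weight=1.0):
    # voxel_probs:   (V, K) current class probabilities per voxel
    # voxel_weights: (V,)   accumulated observation weight per voxel
    # frame_probs:   (V, K) per-frame network predictions, associated to voxels
    # Running weighted average, in the spirit of KinectFusion's TSDF update.
    new_weights = voxel_weights + frame_weight
    fused = (voxel_probs * voxel_weights[..., None]
             + frame_probs * frame_weight) / new_weights[..., None]
    return fused, new_weights
```

Each new frame thus nudges a voxel's label distribution toward the network's prediction, with the accumulated weight controlling how much a single frame can change an already well-observed voxel.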
Distance Guided Channel Weighting for Semantic Segmentation
Recent works have achieved great success in improving the performance of
multiple computer vision tasks by using deep neural networks to extract
features with a large number of channels. However, many channels of the
extracted features are not discriminative and contain a lot of redundant
information. In this
paper, we address the above issue by introducing the Distance Guided Channel
Weighting (DGCW) Module. The DGCW module is constructed in a pixel-wise context
extraction manner, which enhances the discriminativeness of features by
weighting different channels of each pixel's feature vector when modeling its
relationship with other pixels. It can make full use of the highly
discriminative information while ignoring the less discriminative information
contained in the feature maps, and can also capture long-range dependencies.
Furthermore, by
incorporating the DGCW module with a baseline segmentation network, we propose
the Distance Guided Channel Weighting Network (DGCWNet). We conduct extensive
experiments to demonstrate the effectiveness of DGCWNet. In particular, it
achieves 81.6% mIoU on Cityscapes with only fine annotated data for training,
and also achieves satisfactory performance on two other semantic segmentation
datasets, i.e., Pascal Context and ADE20K. Code will be available soon at
https://github.com/LanyunZhu/DGCWNet
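The abstract says the DGCW module weights the channels of a pixel's feature vector when modeling its relationship with other pixels, guided by distance. The exact formulation is not given in the abstract, so the sketch below is a hypothetical reading: channels on which two pixels differ most are down-weighted via a softmax over negative per-channel distances, so the pairwise similarity emphasizes channels where the pixels agree.

```python
import numpy as np

def dgcw_similarity(feats, i, j):
    # feats: (C, N) feature map flattened over N pixels.
    # Hypothetical distance-guided channel weighting: softmax over the
    # negative per-channel squared distance between pixels i and j, then
    # a channel-weighted inner product as the pairwise similarity.
    fi, fj = feats[:, i], feats[:, j]      # (C,) feature vectors
    d = (fi - fj) ** 2                     # per-channel squared distance
    w = np.exp(-d) / np.exp(-d).sum()      # channel weights, sum to 1
    return float(np.sum(w * fi * fj))      # weighted inner product
```

For identical pixels the distance is zero in every channel, so the weights reduce to uniform and the similarity is the mean of the channel products; as pixels diverge in some channels, those channels contribute less.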
Semantic Visual Localization
Robust visual localization under a wide range of viewing conditions is a
fundamental problem in computer vision. Handling the difficult cases of this
problem is not only very challenging but also of high practical relevance,
e.g., in the context of life-long localization for augmented reality or
autonomous robots. In this paper, we propose a novel approach based on a joint
3D geometric and semantic understanding of the world, enabling it to succeed
under conditions where previous approaches failed. Our method leverages a novel
generative model for descriptor learning, trained on semantic scene completion
as an auxiliary task. The resulting 3D descriptors are robust to missing
observations by encoding high-level 3D geometric and semantic information.
Experiments on several challenging large-scale localization datasets
demonstrate reliable localization under extreme viewpoint, illumination, and
geometry changes.
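Once the robust 3D descriptors described above are available, the localization step itself reduces to retrieving the database entry whose descriptor best matches the query. The minimal nearest-neighbor sketch below illustrates that retrieval step only; the descriptors in the paper come from a learned generative model trained on semantic scene completion, which is not modeled here, and the array layout is assumed.

```python
import numpy as np

def localize(query_desc, db_descs, db_poses):
    # db_descs: (N, D) descriptors for N mapped places
    # db_poses: (N, ...) the pose associated with each descriptor
    # Return the pose of the nearest descriptor in Euclidean distance.
    dists = np.linalg.norm(db_descs - query_desc, axis=1)
    return db_poses[int(np.argmin(dists))]
```

Because the descriptors encode high-level geometric and semantic structure rather than raw appearance, this same retrieval step stays reliable under the viewpoint and illumination changes the abstract mentions.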