19,762 research outputs found
Weakly Supervised Adversarial Domain Adaptation for Semantic Segmentation in Urban Scenes
Semantic segmentation, a pixel-level vision task, has developed rapidly with
the use of convolutional neural networks (CNNs). Training CNNs requires a
large amount of labeled data, but manual annotation is laborious. To reduce
this annotation burden, several synthetic datasets have been released in
recent years. However, they still differ from real scenes, so a model trained
on synthetic data (the source domain) performs poorly on real urban scenes
(the target domain). In this paper, we propose a weakly supervised adversarial
domain adaptation method to improve segmentation performance when transferring
from synthetic data to real scenes; it consists of three deep neural networks.
Specifically, a detection and segmentation ("DS" for short) model detects
objects and predicts the segmentation map; a pixel-level domain classifier
("PDC" for short) tries to distinguish which domain image features come from;
and an object-level domain classifier ("ODC" for short) discriminates which
domain objects come from and predicts their classes. PDC and ODC serve as the
discriminators, and DS as the generator. Through adversarial learning, DS is
encouraged to learn domain-invariant features. In experiments, our proposed
method sets a new record in the mIoU metric on this problem. Comment: To
appear at TI
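The adversarial coupling between DS and the two domain classifiers can be sketched in miniature: a discriminator is trained to tell source from target, while the generator minimizes the negated domain loss (the gradient-reversal trick), rewarding domain-confused features. The scalar setup and function names below are illustrative, not the paper's implementation:

```python
import math

def bce(p, label):
    # binary cross-entropy for a single scalar domain prediction p in (0, 1)
    eps = 1e-12
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))

def discriminator_loss(p_source, p_target):
    # PDC/ODC-style objective: source features labeled 1, target features 0
    return bce(p_source, 1) + bce(p_target, 0)

def generator_adversarial_loss(p_source, p_target):
    # gradient reversal: the DS network minimizes the *negated* domain loss,
    # pushing features toward domain confusion (predictions near 0.5)
    return -discriminator_loss(p_source, p_target)
```

Note that domain-confused features (both predictions at 0.5) give the generator a lower adversarial loss than well-separated ones, which is exactly the pressure toward domain invariance.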
Human Visual Understanding for Cognition and Manipulation -- A primer for the roboticist
Robotic research is often built on approaches that are motivated by insights
from self-examination of how we interface with the world. However, given
current theories about human cognition and sensory processing, it is reasonable
to assume that the internal workings of the brain are separate from how we
interface with the world and ourselves. To amend some of these misconceptions
arising from self-examination, this article reviews human visual understanding
for cognition and action, specifically manipulation. Our focus is on
identifying overarching principles such as the separation into visual
processing for action and cognition, hierarchical processing of visual input,
and the contextual and anticipatory nature of visual processing for action. We
also provide a rudimentary exposition of previous theories about visual
understanding that shows how self-examination can lead down the wrong path. Our
hope is that the article will provide insights for the robotic researcher that
can help them navigate the path of self-examination, give them an overview of
current theories about human visual processing, as well as provide a source for
further relevant reading. Comment: 17 pages, 8 figures
Unsupervised Construction of Human Body Models Using Principles of Organic Computing
Unsupervised learning of a generalizable model of the visual appearance of
humans from video data is of major importance for computing systems interacting
naturally with their users and others. We propose a step towards automatic
behavior understanding by integrating principles of Organic Computing into the
posture estimation cycle, thereby removing the need for human intervention
while simultaneously raising the level of system autonomy. The system extracts
coherent motion from moving upper bodies and autonomously decides about limbs
and their possible spatial relationships. The models from many videos are
integrated into meta-models, which show good generalization to different
individuals, backgrounds, and attire. These models allow robust interpretation
of single video frames without temporal continuity and posture mimicking by an
android robot.
Unsupervised Visual Domain Adaptation: A Deep Max-Margin Gaussian Process Approach
In unsupervised domain adaptation, it is widely known that the target domain
error can be provably reduced by having a shared input representation that
makes the source and target domains indistinguishable from each other. Recent
work has further shown that not only matching the marginal input
distributions but also aligning the output (class) distributions is critical.
The latter can be achieved by minimizing the maximum discrepancy of
predictors (classifiers). In this paper, we adopt this principle, but propose a
more systematic and effective way to achieve hypothesis consistency via
Gaussian processes (GP). The GP allows us to define/induce a hypothesis space
of the classifiers from the posterior distribution of the latent random
functions, turning the learning into a simple large-margin posterior separation
problem, far easier to solve than previous approaches based on adversarial
minimax optimization. We formulate a learning objective that effectively pushes
the posterior to minimize the maximum discrepancy. This is further shown to be
equivalent to maximizing margins and minimizing uncertainty of the class
predictions in the target domain, a well-established principle in classical
(semi-)supervised learning. Empirical results demonstrate that our approach is
comparable or superior to the existing methods on several benchmark domain
adaptation datasets.
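The "maximize margins, minimize uncertainty" reading of the objective can be illustrated with a toy scorer on a single class-probability vector; this is illustrative only, since the paper works with the GP posterior over latent functions rather than raw probabilities:

```python
import math

def classification_margin(probs):
    # margin between the two largest predicted class probabilities;
    # the objective favors predictions with large margins
    top = sorted(probs, reverse=True)
    return top[0] - top[1]

def predictive_entropy(probs):
    # uncertainty of the same prediction; low entropy = confident prediction
    return -sum(p * math.log(p) for p in probs if p > 0)
```

A confident prediction such as [0.9, 0.05, 0.05] has both a larger margin and a lower entropy than a flat one like [0.4, 0.35, 0.25], so the two criteria pull in the same direction.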
Semi-supervised Domain Adaptation via Minimax Entropy
Contemporary domain adaptation methods are very effective at aligning feature
distributions of source and target domains without any target supervision.
However, we show that these techniques perform poorly when even a few labeled
examples are available in the target. To address this semi-supervised domain
adaptation (SSDA) setting, we propose a novel Minimax Entropy (MME) approach
that adversarially optimizes an adaptive few-shot model. Our base model
consists of a feature encoding network, followed by a classification layer that
computes the features' similarity to estimated prototypes (representatives of
each class). Adaptation is achieved by alternately maximizing the conditional
entropy of unlabeled target data with respect to the classifier and minimizing
it with respect to the feature encoder. We empirically demonstrate the
superiority of our method over many baselines, including conventional feature
alignment and few-shot methods, setting a new state of the art for SSDA. Comment: accepted to ICCV 2019; ICCV paper version
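The alternating objective can be sketched numerically: the classifier takes a gradient step to *increase* the conditional entropy of unlabeled target predictions, while the feature encoder steps to *decrease* it. A minimal entropy computation in pure Python (illustrative; the real model operates on softmax outputs of the prototype-similarity classifier):

```python
import math

def entropy(probs):
    # Shannon entropy H(p) = -sum_k p_k * log(p_k) of one softmax output
    return -sum(p * math.log(p) for p in probs if p > 0)

def target_conditional_entropy(predictions):
    # mean entropy over a batch of unlabeled target predictions; MME
    # maximizes this w.r.t. the classifier and minimizes it w.r.t. the encoder
    return sum(entropy(p) for p in predictions) / len(predictions)
```

A uniform prediction [0.5, 0.5] contributes log 2 of entropy and a one-hot prediction contributes zero, so the encoder's minimization step drives target features toward confident, prototype-aligned predictions.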
Diverse Image-to-Image Translation via Disentangled Representations
Image-to-image translation aims to learn the mapping between two visual
domains. There are two main challenges for many applications: 1) the lack of
aligned training pairs and 2) multiple possible outputs from a single input
image. In this work, we present an approach based on disentangled
representation for producing diverse outputs without paired training images. To
achieve diversity, we propose to embed images onto two spaces: a
domain-invariant content space capturing shared information across domains and
a domain-specific attribute space. Our model takes the encoded content features
extracted from a given input and the attribute vectors sampled from the
attribute space to produce diverse outputs at test time. To handle unpaired
training data, we introduce a novel cross-cycle consistency loss based on
disentangled representations. Qualitative results show that our model can
generate diverse and realistic images on a wide range of tasks without paired
training data. For quantitative comparisons, we measure realism with a user study
and diversity with a perceptual distance metric. We apply the proposed model to
domain adaptation and show competitive performance when compared to the
state-of-the-art on the MNIST-M and the LineMod datasets. Comment: ECCV 2018 (Oral). Project page: http://vllab.ucmerced.edu/hylee/DRIT/
Code: https://github.com/HsinYingLee/DRIT
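The cross-cycle consistency idea can be miniaturized by modeling an image as a (content, attribute) pair: translation swaps the domain-specific attributes while keeping the domain-invariant content, and a second swap should restore the originals. This is a schematic sketch, not the paper's learned encoders and generators:

```python
def translate(image_a, image_b):
    # swap domain-specific attributes across domains while keeping
    # each image's domain-invariant content
    (c_a, a_a), (c_b, a_b) = image_a, image_b
    return (c_a, a_b), (c_b, a_a)

def cross_cycle(image_a, image_b):
    # translating twice should reproduce the inputs; the cross-cycle
    # consistency loss penalizes any mismatch after the second swap
    u, v = translate(image_a, image_b)
    return translate(u, v)
```

In the real model the pairs are latent codes produced by content and attribute encoders, and the loss compares the twice-translated images to the inputs in pixel space.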
Domain Agnostic Learning with Disentangled Representations
Unsupervised model transfer has the potential to greatly improve the
generalizability of deep models to novel domains. Yet the current literature
assumes that the separation of target data into distinct domains is known a
priori. In this paper, we propose the task of Domain-Agnostic Learning (DAL):
How to transfer knowledge from a labeled source domain to unlabeled data from
arbitrary target domains? To tackle this problem, we devise a novel Deep
Adversarial Disentangled Autoencoder (DADA) capable of disentangling
domain-specific features from class identity. We demonstrate experimentally
that when the target domain labels are unknown, DADA leads to state-of-the-art
performance on several image classification datasets.
DiDA: Disentangled Synthesis for Domain Adaptation
Unsupervised domain adaptation aims at learning a shared model for two
related, but not identical, domains by transferring supervision from a labeled
source domain to an unlabeled target domain. A number of effective domain
adaptation approaches rely on the ability to extract discriminative, yet
domain-invariant, latent factors which are common to both domains. Extracting
latent commonality is also useful for disentanglement analysis, enabling
separation between the common and the domain-specific features of both domains.
In this paper, we present a method for boosting domain adaptation performance
by leveraging disentanglement analysis. The key idea is that by learning to
separately extract both the common and the domain-specific features, one can
synthesize more target domain data with supervision, thereby boosting the
domain adaptation performance. Better common feature extraction, in turn, helps
further improve the disentanglement analysis and disentangled synthesis. We
show that iterating between domain adaptation and disentanglement analysis can
consistently improve each other on several unsupervised domain adaptation
tasks, for various domain adaptation backbone models.
Beyond Sharing Weights for Deep Domain Adaptation
The performance of a classifier trained on data coming from a specific domain
typically degrades when applied to a related but different one. While
annotating many samples from the new domain would address this issue, it is
often too expensive or impractical. Domain Adaptation has therefore emerged as
a solution to this problem; it leverages annotated data from a source domain,
in which it is abundant, to train a classifier to operate in a target domain,
in which it is either sparse or even lacking altogether. In this context, the
recent trend consists of learning deep architectures whose weights are shared
for both domains, which essentially amounts to learning domain invariant
features.
Here, we show that it is more effective to explicitly model the shift from
one domain to the other. To this end, we introduce a two-stream architecture,
where one operates in the source domain and the other in the target domain. In
contrast to other approaches, the weights in corresponding layers are related
but not shared. We demonstrate that this both yields higher accuracy than
state-of-the-art methods on several object recognition and detection tasks and
consistently outperforms networks with shared weights in both supervised and
unsupervised settings.
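The "related but not shared" weights can be encouraged with a penalty on corresponding layers of the two streams. The sketch below uses a plain squared difference between flattened layer weights; the layer shapes and the exact form of the penalty are illustrative (the two-stream idea also admits, e.g., a learned linear map between the streams):

```python
def weight_divergence(source_layers, target_layers):
    # sum of squared differences between corresponding layer weights of the
    # source and target streams; adding this penalty to the task loss keeps
    # the streams related without forcing them to be identical
    total = 0.0
    for w_s, w_t in zip(source_layers, target_layers):
        total += sum((a - b) ** 2 for a, b in zip(w_s, w_t))
    return total
```

With the penalty weight at infinity this recovers fully shared weights; with it at zero the streams are independent, so the regularization strength interpolates between the two regimes.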
DeceptionNet: Network-Driven Domain Randomization
We present a novel approach to tackle domain adaptation between synthetic and
real data. Instead of employing "blind" domain randomization, i.e., augmenting
synthetic renderings with random backgrounds or changing illumination and
colorization, we leverage the task network as its own adversarial guide toward
useful augmentations that maximize the uncertainty of the output. To this end,
we design a min-max optimization scheme where a given task competes against a
special deception network to minimize the task error subject to the specific
constraints enforced by the deceiver. The deception network samples from a
family of differentiable pixel-level perturbations and exploits the task
architecture to find the most destructive augmentations. Unlike GAN-based
approaches that require unlabeled data from the target domain, our method
achieves robust mappings that scale well to multiple target distributions from
source data alone. We apply our framework to the tasks of digit recognition on
enhanced MNIST variants, classification and object pose estimation on the
Cropped LineMOD dataset as well as semantic segmentation on the Cityscapes
dataset and compare it to a number of domain adaptation approaches, thereby
demonstrating similar results with superior generalization capabilities. Comment: ICCV 201
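The deceiver's role reduces, in caricature, to searching the allowed perturbation family for the input whose task prediction is most uncertain; the actual method backpropagates through differentiable perturbation modules rather than enumerating candidates, and all names below are illustrative:

```python
import math

def prediction_entropy(probs):
    # uncertainty of one task prediction (a softmax output)
    return -sum(p * math.log(p) for p in probs if p > 0)

def most_deceptive(perturbed_inputs, task):
    # pick the perturbed input the task network is least sure about;
    # this is the augmentation the task is then trained to withstand
    return max(perturbed_inputs, key=lambda x: prediction_entropy(task(x)))
```

For example, with a toy task that maps a "clean" input to [0.9, 0.1] and a "noisy" one to [0.5, 0.5], the deceiver selects the noisy input, since its prediction carries the higher entropy.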