Improved Techniques for Adversarial Discriminative Domain Adaptation
Adversarial discriminative domain adaptation (ADDA) is an efficient framework
for unsupervised domain adaptation in image classification, where the source
and target domains are assumed to have the same classes, but no labels are
available for the target domain. We investigate whether we can improve
performance of ADDA with a new framework and new loss formulations. Following
the framework of semi-supervised GANs, we first extend the discriminator output
over the source classes, in order to model the joint distribution over domain
and task. We thus leverage the distribution over the source encoder
posteriors (which is fixed during adversarial training) and propose maximum
mean discrepancy (MMD) and reconstruction-based loss functions for aligning the
target encoder distribution to the source domain. We compare and provide a
comprehensive analysis of how our framework and loss formulations extend over
simple multi-class extensions of ADDA and other discriminative variants of
semi-supervised GANs. In addition, we introduce various forms of regularization
for stabilizing training, including treating the discriminator as a denoising
autoencoder and regularizing the target encoder with source examples to reduce
overfitting under a contraction mapping (i.e., when the target per-class
distributions are contracting during alignment with the source). Finally, we
validate our framework on standard domain adaptation datasets, such as SVHN and
MNIST. We also examine how our framework benefits recognition problems based on
modalities that lack training data, by introducing and evaluating on a
neuromorphic vision sensing (NVS) sign language recognition dataset, where the
source and target domains constitute emulated and real neuromorphic spike
events respectively. Our results on all datasets show that our proposal
competes with or outperforms the state-of-the-art in unsupervised domain adaptation.
Comment: To appear in IEEE Transactions on Image Processing
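To illustrate the MMD-based alignment mentioned in the abstract above, here is a minimal sketch of a biased squared-MMD estimate between two sets of feature vectors. The Gaussian kernel, the `bandwidth` parameter, and all function names are assumptions made for illustration; the paper's actual loss formulation may differ.

```python
import math


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(source, target, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples.

    The value is near zero when the two samples look alike under the
    kernel and grows as they move apart.
    """
    return (
        mean_kernel(source, source, bandwidth)
        + mean_kernel(target, target, bandwidth)
        - 2.0 * mean_kernel(source, target, bandwidth)
    )


if __name__ == "__main__":
    # Hypothetical 2-D encoder outputs for a source and a target batch.
    source_feats = [[0.0, 0.0], [1.0, 0.0]]
    target_feats = [[5.0, 5.0], [6.0, 5.0]]
    print(mmd_squared(source_feats, target_feats))
```

In a domain adaptation setting, a quantity of this kind would be minimized with respect to the target encoder so that its outputs move toward the (fixed) source distribution.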
Stable Distribution Alignment Using the Dual of the Adversarial Distance
Methods that align distributions by minimizing an adversarial distance
between them have recently achieved impressive results. However, these
approaches are difficult to optimize with gradient descent and they often do
not converge well without careful hyperparameter tuning and proper
initialization. We investigate whether turning the adversarial min-max problem
into an optimization problem by replacing the maximization part with its dual
improves the quality of the resulting alignment and explore its connections to
Maximum Mean Discrepancy. Our empirical results suggest that using the dual
formulation for the restricted family of linear discriminators results in a
more stable convergence to a desirable solution when compared with the
performance of a primal min-max GAN-like objective and an MMD objective under
the same restrictions. We test our hypothesis on the problem of aligning two
synthetic point clouds on a plane and on a real-image domain adaptation problem
on digits. In both cases, the dual formulation yields an iterative procedure
that gives more stable and monotonic improvement over time.
Comment: ICLR 2018 Conference Invite to Workshop
Domain Generalization via Universal Non-volume Preserving Models
Recognition across domains has recently become an active topic in the
research community. However, the problem of recognition in new, unseen
domains has been largely overlooked. Under this condition, the delivered deep
network models are unable to be updated, adapted, or fine-tuned. Therefore,
recent deep learning techniques, such as domain adaptation, feature
transferring, and fine-tuning, cannot be applied. This paper presents a novel
approach to the problem of domain generalization in the context of deep
learning. The proposed method is evaluated on different datasets in various
problems, i.e. (i) digit recognition on MNIST, SVHN, and MNIST-M, (ii) face
recognition on Extended Yale-B, CMU-PIE and CMU-MPIE, and (iii) pedestrian
recognition on RGB and Thermal image datasets. The experimental results show
that our proposed method consistently improves accuracy. It can
also be easily incorporated into any other CNN framework within an end-to-end
deep network design for object detection and recognition problems to improve
their performance.
Comment: Accepted to Computer and Robot Vision 2020. arXiv admin note:
substantial text overlap with arXiv:1812.0340
Attentive Adversarial Learning for Domain-Invariant Training
Adversarial domain-invariant training (ADIT) proves to be effective in
suppressing the effects of domain variability in acoustic modeling and has led
to improved performance in automatic speech recognition (ASR). In ADIT, an
auxiliary domain classifier takes in equally-weighted deep features from a deep
neural network (DNN) acoustic model and is trained to improve their
domain-invariance by optimizing an adversarial loss function. In this work, we
propose an attentive ADIT (AADIT) in which we advance the domain classifier
with an attention mechanism to automatically weight the input deep features
according to their importance in domain classification. With this attentive
re-weighting, AADIT can focus on the domain normalization of phonetic
components that are more susceptible to domain variability and generate deep
features with improved domain-invariance and senone-discriminativity over ADIT.
Most importantly, the attention block serves only as an external component to
the DNN acoustic model and is not involved in ASR, so AADIT can be used to
improve the acoustic modeling with any DNN architecture. More generally, the
same methodology can improve any adversarial learning system with an auxiliary
discriminator. Evaluated on the CHiME-3 dataset, AADIT achieves 13.6% and 9.3%
relative WER improvements, respectively, over a multi-conditional model and a
strong ADIT baseline.
Comment: 5 pages, 1 figure, ICASSP 201
Adversarial Deep Learning in EEG Biometrics
Deep learning methods for person identification based on
electroencephalographic (EEG) brain activity encounter the problem of
exploiting the temporally correlated structures or recording-session-specific
variability within EEG. Furthermore, recent methods have mostly been trained and
evaluated on single-session EEG data. We address this problem from an
invariant representation learning perspective. We propose an adversarial
inference approach to extend such deep learning models to learn
session-invariant person-discriminative representations that can provide
robustness in terms of longitudinal usability. Using adversarial learning
within a deep convolutional network, we empirically assess and show
improvements with our approach based on longitudinally collected EEG data for
person identification from half-second EEG epochs.
Comment: Accepted for publication by IEEE Signal Processing Letters
Virtual Mixup Training for Unsupervised Domain Adaptation
We study the problem of unsupervised domain adaptation which aims to adapt
models trained on a labeled source domain to a completely unlabeled target
domain. Recently, the cluster assumption has been applied to unsupervised
domain adaptation and achieved strong performance. One critical factor in
successful training under the cluster assumption is to impose the
locally-Lipschitz constraint on the model. Existing methods only impose the
locally-Lipschitz constraint around the training points while missing other
areas, such as the points in-between training data. In this paper, we address
this issue by encouraging the model to behave linearly in-between training
points. We propose a new regularization method called Virtual Mixup Training
(VMT), which is able to incorporate the locally-Lipschitz constraint to the
areas in-between training data. Unlike the traditional mixup model, our method
constructs the combination samples without using the label information,
allowing it to be applied to unsupervised domain adaptation. The proposed method is
generic and can be combined with most existing models such as the recent
state-of-the-art model called VADA. Extensive experiments demonstrate that VMT
significantly improves the performance of VADA on six domain adaptation
benchmark datasets. For the challenging task of adapting MNIST to SVHN, VMT can
improve the accuracy of VADA by over 30%. Code is available at
https://github.com/xudonmao/VMT
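The idea of encouraging linear behavior in-between training points without labels can be sketched as follows. This is only an illustration under stated assumptions: the toy linear classifier `toy_model`, its `weights`, the use of KL divergence as the penalty, and all function names are hypothetical and are not taken from the authors' released code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_model(x, weights):
    """Hypothetical linear classifier: one weight row per class."""
    logits = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return softmax(logits)


def mix(a, b, lam):
    """Convex combination lam * a + (1 - lam) * b of two vectors."""
    return [lam * u + (1.0 - lam) * v for u, v in zip(a, b)]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions, smoothed by eps."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def virtual_mixup_penalty(x1, x2, weights, lam):
    """Penalty comparing the mixed prediction with the prediction on the mixed input.

    No ground-truth labels are used: the "virtual" label is the same
    convex combination of the model's own predictions on x1 and x2.
    """
    x_mix = mix(x1, x2, lam)
    y_mix = mix(toy_model(x1, weights), toy_model(x2, weights), lam)
    return kl_divergence(y_mix, toy_model(x_mix, weights))


if __name__ == "__main__":
    weights = [[2.0, 0.0], [0.0, 2.0]]  # assumed 2-class, 2-feature model
    print(virtual_mixup_penalty([1.0, 0.0], [0.0, 1.0], weights, lam=0.3))
```

In practice the mixing coefficient would typically be sampled at random for each pair of unlabeled points, and the penalty would be added to the main training objective as a regularizer.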
Recent Progresses in Deep Learning based Acoustic Models (Updated)
In this paper, we summarize recent progress made in deep learning based
acoustic models and the motivation and insights behind the surveyed techniques.
We first discuss acoustic models that can effectively exploit variable-length
contextual information, such as recurrent neural networks (RNNs), convolutional
neural networks (CNNs), and their various combinations with other models. We
then describe acoustic models that are optimized end-to-end with emphasis on
feature representations learned jointly with the rest of the system, the
connectionist temporal classification (CTC) criterion, and the attention-based
sequence-to-sequence model. We further illustrate robustness issues in speech
recognition systems, and discuss acoustic model adaptation, speech enhancement
and separation, and robust training strategies. We also cover modeling
techniques that lead to more efficient decoding and discuss possible future
directions in acoustic model research.
Comment: This is an updated version with latest literature until ICASSP 2018 of
the paper: Dong Yu and Jinyu Li, "Recent Progresses in Deep Learning based
Acoustic Models," vol.4, no.3, IEEE/CAA Journal of Automatica Sinica, 201
Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective
This paper takes a problem-oriented perspective and presents a comprehensive
review of transfer learning methods, both shallow and deep, for cross-dataset
visual recognition. Specifically, it categorises the cross-dataset recognition
into seventeen problems based on a set of carefully chosen data and label
attributes. Such a problem-oriented taxonomy has allowed us to examine how
different transfer learning approaches tackle each problem and how well each
problem has been researched to date. The comprehensive problem-oriented review
of the advances in transfer learning with respect to the problem has not only
revealed the challenges in transfer learning for visual recognition, but also
the problems (e.g. eight of the seventeen problems) that have been scarcely
studied. This survey not only presents an up-to-date technical review for
researchers, but also a systematic approach and a reference for a machine
learning practitioner to categorise a real problem and to look up a
possible solution accordingly.
Unsupervised Domain Adaptation for Learning Eye Gaze from a Million Synthetic Images: An Adversarial Approach
With contemporary advancements in graphics engines, a recent trend in the deep
learning community is to train models on automatically annotated simulated
examples and apply them to real data at test time. This alleviates the burden of
manual annotation. However, there is an inherent difference in distributions
between images coming from a graphics engine and the real world. Such domain
difference degrades the test-time performance of models trained on synthetic
examples. In this paper we address this issue with unsupervised adversarial
feature adaptation across synthetic and real domains for the special use case of
eye gaze estimation, which is an essential component for various downstream HCI
tasks. We initially learn a gaze estimator on annotated synthetic samples
rendered from a 3D game engine and then adapt the features of unannotated real
samples via a zero-sum minmax adversarial game against a domain discriminator
following the recent paradigm of generative adversarial networks. Such
adversarial adaptation forces features of both domains to be indistinguishable
which enables regression models trained on the synthetic domain to be
used on real samples. On the challenging MPIIGaze real-life dataset, we
outperform recent fully supervised methods trained on manually annotated real
samples by appreciable margins and also achieve 13% more relative gain after
adaptation compared to the current benchmark method of SimGAN.
Joint auto-encoders: a flexible multi-task learning framework
The incorporation of prior knowledge into learning is essential in achieving
good performance based on small noisy samples. Such knowledge is often
incorporated through the availability of related data arising from domains and
tasks similar to the one of current interest. Ideally one would like to allow
both the data for the current task and for previous related tasks to
self-organize the learning system in such a way that commonalities and
differences between the tasks are learned in a data-driven fashion. We develop
a framework for learning multiple tasks simultaneously, based on sharing
features that are common to all tasks, achieved through the use of a modular
deep feedforward neural network consisting of shared branches, dealing with the
common features of all tasks, and private branches, learning the specific
unique aspects of each task. Once an appropriate weight sharing architecture
has been established, learning takes place through standard algorithms for
feedforward networks, e.g., stochastic gradient descent and its variations. The
method deals with domain adaptation and multi-task learning in a unified
fashion, and can easily deal with data arising from different types of sources.
Numerical experiments demonstrate the effectiveness of learning in domain
adaptation and transfer learning setups, and provide evidence for the flexible
and task-oriented representations arising in the network.