4 research outputs found

    MiniMax Entropy Network: Learning Category-Invariant Features for Domain Adaptation

    Full text link
    How to effectively learn from unlabeled data from the target domain is crucial for domain adaptation, as it helps reduce the large performance gap due to domain shift or distribution change. In this paper, we propose an easy-to-implement method dubbed MiniMax Entropy Networks (MMEN) based on adversarial learning. Unlike most existing approaches which employ a generator to deal with domain difference, MMEN focuses on learning the categorical information from unlabeled target samples with the help of labeled source samples. Specifically, we set an unfair multi-class classifier named categorical discriminator, which classifies source samples accurately but be confused about the categories of target samples. The generator learns a common subspace that aligns the unlabeled samples based on the target pseudo-labels. For MMEN, we also provide theoretical explanations to show that the learning of feature alignment reduces domain mismatch at the category level. Experimental results on various benchmark datasets demonstrate the effectiveness of our method over existing state-of-the-art baselines.Comment: 8 pages, 6 figure

    Improved Techniques for Adversarial Discriminative Domain Adaptation

    Get PDF
    Adversarial discriminative domain adaptation (ADDA) is an efficient framework for unsupervised domain adaptation in image classification, where the source and target domains are assumed to have the same classes, but no labels are available for the target domain. We investigate whether we can improve performance of ADDA with a new framework and new loss formulations. Following the framework of semi-supervised GANs, we first extend the discriminator output over the source classes, in order to model the joint distribution over domain and task. We thus leverage on the distribution over the source encoder posteriors (which is fixed during adversarial training) and propose maximum mean discrepancy (MMD) and reconstruction-based loss functions for aligning the target encoder distribution to the source domain. We compare and provide a comprehensive analysis of how our framework and loss formulations extend over simple multi-class extensions of ADDA and other discriminative variants of semi-supervised GANs. In addition, we introduce various forms of regularization for stabilizing training, including treating the discriminator as a denoising autoencoder and regularizing the target encoder with source examples to reduce overfitting under a contraction mapping (i.e., when the target per-class distributions are contracting during alignment with the source). Finally, we validate our framework on standard domain adaptation datasets, such as SVHN and MNIST. We also examine how our framework benefits recognition problems based on modalities that lack training data, by introducing and evaluating on a neuromorphic vision sensing (NVS) sign language recognition dataset, where the source and target domains constitute emulated and real neuromorphic spike events respectively. Our results on all datasets show that our proposal competes or outperforms the state-of-the-art in unsupervised domain adaptation.Comment: To appear in IEEE Transactions on Image Processin

    From Pixels to Spikes: Efficient Multimodal Learning in the Presence of Domain Shift

    Get PDF
    Computer vision aims to provide computers with a conceptual understanding of images or video by learning a high-level representation. This representation is typically derived from the pixel domain (i.e., RGB channels) for tasks such as image classification or action recognition. In this thesis, we explore how RGB inputs can either be pre-processed or supplemented with other compressed visual modalities, in order to improve the accuracy-complexity tradeoff for various computer vision tasks. Beginning with RGB-domain data only, we propose a multi-level, Voronoi based spatial partitioning of images, which are individually processed by a convolutional neural network (CNN), to improve the scale invariance of the embedding. We combine this with a novel and efficient approach for optimal bit allocation within the quantized cell representations. We evaluate this proposal on the content-based image retrieval task, which constitutes finding similar images in a dataset to a given query. We then move to the more challenging domain of action recognition, where a video sequence is classified according to its constituent action. In this case, we demonstrate how the RGB modality can be supplemented with a flow modality, comprising motion vectors extracted directly from the video codec. The motion vectors (MVs) are used both as input to a CNN and as an activity sensor for providing selective macroblock (MB) decoding of RGB frames instead of full-frame decoding. We independently train two CNNs on RGB and MV correspondences and then fuse their scores during inference, demonstrating faster end-to-end processing and competitive classification accuracy to recent work. In order to explore the use of more efficient sensing modalities, we replace the MV stream with a neuromorphic vision sensing (NVS) stream for action recognition. NVS hardware mimics the biological retina and operates with substantially lower power and at significantly higher sampling rates than conventional active pixel sensing (APS) cameras. Due to the lack of training data in this domain, we generate emulated NVS frames directly from consecutive RGB frames and use these to train a teacher-student framework that additionally leverages on the abundance of optical flow training data. In the final part of this thesis, we introduce a novel unsupervised domain adaptation method for further minimizing the domain shift between emulated (source) and real (target) NVS data domains
    corecore