
    Learning generalizable and transferable representations across domains and modalities

    While deep neural networks attain state-of-the-art performance on computer vision tasks with the help of massive supervised datasets, it is usually assumed that all training and test examples are drawn independently from the same distribution. In real-world applications, however, dataset bias and domain shift violate this assumption: test data may come from different domains with different distributions, which can seriously degrade model performance. Learning generalizable and transferable representations is therefore important for making a model robust to many types of distributional shift. Domain transfer settings such as Domain Adaptation (DA) and Domain Generalization (DG) have been proposed to learn generalizable and transferable features across domains. Domain transfer consists of two steps: 1) pre-training, where a model is first pre-trained on an upstream task with a massive supervised dataset, e.g., ImageNet, and 2) transfer (adaptation), where the model is fine-tuned on downstream multi-domain data. In this thesis, we highlight the limitations of current domain transfer approaches and relax them to produce more practical and diverse domain transfer methods. Specifically, we study:
    1) Cross-Domain Self-supervised Learning for Domain Adaptation. Prior DA methods use ImageNet pre-trained models for weight initialization (i.e., the pre-training stage), but the downstream data can be very different from ImageNet. Previous domain adaptation approaches also assume abundant labeled data in the source domain, whereas some applications (e.g., medical imaging) may not have enough source labels. We explore few-shot domain adaptation, where only a few source labels are available, and propose cross-domain self-supervised pre-training, which uses only unlabeled multi-domain data (a hedged sketch of such an objective follows this abstract). We show that our method significantly boosts the performance of diverse domain transfer tasks.
    2) Pre-training for Domain Adaptation. While many DA and DG methods have been proposed and studied extensively in prior work, little attention has been paid to pre-training for domain transfer. We provide comprehensive experiments and an in-depth analysis of pre-training in terms of network architectures, datasets, and loss functions. We observe significant improvements from modern pre-training and propose to modernize the current evaluation protocols.
    3) Multimodal Representation Learning for Domain Adaptation. We devise self-supervised formulations for multimodal domain adaptation that promote better knowledge transfer by aligning multimodal features. We first explore a language-vision task where we align the features of multiple languages and images, then explore video domain adaptation with RGB and Flow modalities and propose a joint contrastive regularization that interplays among cross-modal and cross-domain features.
    4) Domain Adaptive Keypoint Detection. Lastly, we explore domain adaptive keypoint detection tasks (e.g., human and animal pose estimation), which are not well explored in prior work. We propose a unified framework for diverse keypoint detection scenarios in which different types of domain shift can occur. To handle these shifts, we propose a multi-level feature alignment using input-level and output-level cues, and show that our method generalizes well to diverse domain adaptive keypoint detection tasks.
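    To make the cross-domain self-supervised pre-training in 1) concrete, here is a minimal, hypothetical PyTorch sketch of a contrastive objective over unlabeled images from both domains. The function name, batch layout, and temperature are illustrative assumptions, not the thesis's actual formulation.

```python
# Illustrative cross-domain contrastive (InfoNCE-style) pre-training
# objective; NOT the thesis's exact method. Two augmented views of each
# unlabeled image are pulled together, and source and target images share
# one batch so negatives span both domains, encouraging domain-invariant
# features.
import torch
import torch.nn.functional as F

def cross_domain_info_nce(encoder, source_views, target_views, temperature=0.1):
    """source_views / target_views: (B, 2, C, H, W) tensors holding two
    augmented views per unlabeled image from each domain."""
    views = torch.cat([source_views, target_views], dim=0)  # (2B, 2, C, H, W)
    z1 = F.normalize(encoder(views[:, 0]), dim=1)           # (2B, D)
    z2 = F.normalize(encoder(views[:, 1]), dim=1)
    logits = z1 @ z2.t() / temperature                      # pairwise similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # Matching views sit on the diagonal; all other pairs act as negatives.
    return F.cross_entropy(logits, targets)
```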

    Domain Generalization by Solving Jigsaw Puzzles

    Human adaptability relies crucially on the ability to learn and merge knowledge from both supervised and unsupervised learning: parents point out a few important concepts, but then children fill in the gaps on their own. This is particularly effective because supervised learning can never be exhaustive, and learning autonomously allows one to discover invariances and regularities that help to generalize. In this paper we propose to apply a similar approach to the task of object recognition across domains: our model learns the semantic labels in a supervised fashion, and broadens its understanding of the data by learning from self-supervised signals how to solve a jigsaw puzzle on the same images. This secondary task helps the network learn the concept of spatial correlation while acting as a regularizer for the classification task. Multiple experiments on the PACS, VLCS, Office-Home and digits datasets confirm our intuition and show that this simple method outperforms previous domain generalization and adaptation solutions. An ablation study further illustrates the inner workings of our approach.
    Comment: Accepted at CVPR 2019 (oral).
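    A hedged sketch of how such a jigsaw auxiliary task can be wired up: a shared backbone feeds a semantic-label head and a second head that classifies which of a fixed set of tile permutations was applied to the shuffled image. The layer names, permutation count, and loss weight `alpha` below are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class JigsawClassifier(nn.Module):
    """Shared backbone with two heads: object classes and jigsaw
    permutation indices (illustrative structure, assumed names)."""
    def __init__(self, backbone, feat_dim, num_classes, num_perms=30):
        super().__init__()
        self.backbone = backbone          # any CNN mapping images -> (N, feat_dim)
        self.cls_head = nn.Linear(feat_dim, num_classes)
        self.jig_head = nn.Linear(feat_dim, num_perms)

    def forward(self, x):
        feats = self.backbone(x)
        return self.cls_head(feats), self.jig_head(feats)

def joint_loss(model, images, labels, shuffled, perm_ids, alpha=0.7):
    """Supervised classification on ordered images plus permutation
    classification on tile-shuffled versions of the same images; the
    jigsaw term acts as a regularizer on the shared features."""
    ce = nn.CrossEntropyLoss()
    cls_logits, _ = model(images)
    _, jig_logits = model(shuffled)
    return ce(cls_logits, labels) + alpha * ce(jig_logits, perm_ids)
```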

    SuperPoint: Self-Supervised Interest Point Detection and Description

    This paper presents a self-supervised framework for training interest point detectors and descriptors suitable for a large number of multiple-view geometry problems in computer vision. As opposed to patch-based neural networks, our fully-convolutional model operates on full-sized images and jointly computes pixel-level interest point locations and associated descriptors in one forward pass. We introduce Homographic Adaptation, a multi-scale, multi-homography approach for boosting interest point detection repeatability and performing cross-domain adaptation (e.g., synthetic-to-real). Our model, when trained on the MS-COCO generic image dataset using Homographic Adaptation, is able to repeatedly detect a much richer set of interest points than the initial pre-adapted deep model and any other traditional corner detector. The final system gives rise to state-of-the-art homography estimation results on HPatches when compared to LIFT, SIFT and ORB.
    Comment: Camera-ready version for CVPR 2018 Deep Learning for Visual SLAM Workshop (DL4VSLAM2018).
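    A rough sketch of the Homographic Adaptation loop described above, under the assumption that the detector returns a per-pixel interest-point heatmap: warp the image under many random homographies, detect, unwarp the heatmaps, and average them into a more repeatable pseudo-ground truth. The corner-perturbation sampler and the kornia-based warping are illustrative choices, not SuperPoint's actual implementation.

```python
import torch
import kornia.geometry.transform as KT

def sample_homography(h, w, scale=0.15):
    """Random homography obtained by perturbing the four image corners
    (an assumed, simplified sampler)."""
    src = torch.tensor([[[0.0, 0.0], [w - 1.0, 0.0],
                         [w - 1.0, h - 1.0], [0.0, h - 1.0]]])
    jitter = (torch.rand(1, 4, 2) - 0.5) * 2 * scale
    dst = src + jitter * torch.tensor([w, h], dtype=torch.float32)
    return KT.get_perspective_transform(src, dst)   # (1, 3, 3)

def homographic_adaptation(detector, image, num_warps=100):
    """image: (1, 1, H, W) tensor; detector(image) -> (1, 1, H, W) heatmap.
    Averages unwarped detections over many random warps."""
    _, _, h, w = image.shape
    acc = detector(image)                 # identity warp counted once
    count = torch.ones_like(acc)
    for _ in range(num_warps):
        H = sample_homography(h, w)
        warped = KT.warp_perspective(image, H, dsize=(h, w))
        heat = detector(warped)
        H_inv = torch.linalg.inv(H)
        acc = acc + KT.warp_perspective(heat, H_inv, dsize=(h, w))
        # Track per-pixel valid-warp counts so border pixels are not diluted.
        count = count + KT.warp_perspective(torch.ones_like(heat), H_inv, dsize=(h, w))
    return acc / count.clamp(min=1e-6)    # per-pixel average heatmap
```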