
    Deep Learning based Domain Adaptation

    Recent advancements in Deep Learning (DL) have helped researchers achieve impressive results in various areas of Machine Learning (ML) and Computer Vision (CV). Starting with the ingenious approach of [Krizhevsky et al., 2012a], which utilized the processing power of graphics processing units (GPUs) to make training large networks viable in terms of training time, DL has found its place in many ML and CV problems over the years since. Object detection and semantic segmentation [Girshick et al., 2014a; Girshick, 2015; Ren et al., 2015], image super-resolution [Dong et al., 2015], and action recognition [Simonyan and Zisserman, 2014a] are a few examples. Many new and powerful DL architectures have since been proposed: VGG [Simonyan and Zisserman, 2014b], GoogLeNet [Szegedy et al., 2015], and ResNet [He et al., 2016] are among the most commonly used network architectures in the literature.

    Our focus is on the specific task of Supervised Domain Adaptation (SDA) using Deep Learning. SDA is a type of domain adaptation in which both the source and target domains contain annotated data. First, we treat SDA as a domain alignment problem and propose a mixture-of-alignment approach based on second- or higher-order scatter statistics between the source and target domains. Although the domains differ, each class has two distinct representations, one in the source and one in the target; the proposed approach reduces within-class scatter to align the same classes across source and target while maintaining between-class separation. To implement this within-class alignment, we design a two-stream Convolutional Neural Network (CNN) in which one stream receives source data and the other receives target data with matching classes, and we train the two-stream network end-to-end together with the alignment losses (a minimal sketch of such a loss follows this abstract).

    Next, we propose a new dataset, the Open Museum Identification Challenge (Open MIC), for SDA research. The Office dataset [Saenko et al., 2010a] is commonly used in the SDA literature, but results on it have saturated at over 90% accuracy, and its limited number of images is one of the main causes of these high scores. Open MIC aims to provide a large dataset for SDA that also poses challenging tasks. We further extend our mixture-of-alignment loss from the Frobenius-norm distance to Bregman divergences and the Riemannian metric, in order to learn the alignment in different feature spaces.

    In the next study, we propose a new representation that encodes 3D body-skeleton data into texture-like images via kernel methods for the action recognition problem, and we utilize these representations in our two-stream SDA CNN pipeline. We also improve our mixture-of-alignment losses to work with partially overlapping datasets, which lets us use other available action recognition datasets as additional source domains even if they only partially overlap with the target set.

    Finally, we move to a more challenging domain adaptation problem: multimodal conversation systems. The Multimodal Dialogue dataset (MMD) [Saha et al., 2018] provides dialogues between a shopper and a retail agent. In these dialogues, the retail agent may also answer with specific retail items such as clothes, shoes, etc., so the conversation flow is multimodal: utterances can contain both text and image modalities. Two-level RNN encoders are used to encode a given context of utterances. We propose a new approach to this problem that adapts additional data from external domains.
    To improve the model's text generation capabilities, we utilize French translations of the target sentences as an additional output target. To improve the model's image ranking capabilities, we utilize an external dataset and find nearest neighbors of the target positive and negative images (see the retrieval sketch below). We set up new encoding methods for these nearest neighbors so that they are assigned to the correct target class, positive or negative.
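
    Below is a minimal sketch of how a within-class second-order alignment loss of the kind described above might look. The PyTorch framing, function names, and shapes are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch (an assumption, not the authors' code): Frobenius-norm
# alignment of per-class second-order scatter statistics between the
# source and target streams of a two-stream CNN.
import torch


def scatter(feats: torch.Tensor) -> torch.Tensor:
    """Second-order scatter (covariance) of an (N, D) feature batch."""
    centered = feats - feats.mean(dim=0, keepdim=True)
    return centered.t() @ centered / max(feats.shape[0] - 1, 1)


def within_class_alignment_loss(src_feats, src_labels, tgt_feats, tgt_labels):
    """Sum over classes of the squared Frobenius distance between scatters."""
    loss = src_feats.new_zeros(())
    for c in torch.unique(src_labels):
        s = src_feats[src_labels == c]
        t = tgt_feats[tgt_labels == c]
        if len(s) > 1 and len(t) > 1:  # a scatter needs at least 2 samples
            loss = loss + torch.norm(scatter(s) - scatter(t), p='fro') ** 2
    return loss
```

    In a full objective, a term like this would typically be added to the source and target classification losses with a trade-off weight, so the two streams are trained end-to-end against both goals.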
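    The retrieval step for the image ranking side could look roughly like the following sketch. It assumes precomputed CNN embeddings for both the external dataset and the target positive/negative images; the encoders and the neighbor-encoding methods in the actual work differ.

```python
# Minimal sketch (assumed setup): retrieving nearest neighbors of the
# target positive/negative images from an external dataset, given
# precomputed embedding matrices with one row per image.
import numpy as np


def nearest_neighbors(queries: np.ndarray, external: np.ndarray, k: int = 5):
    """Indices of the k most cosine-similar external rows per query row."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    e = external / np.linalg.norm(external, axis=1, keepdims=True)
    sims = q @ e.T                           # (num_queries, num_external)
    return np.argsort(-sims, axis=1)[:, :k]
```

    Each retrieved neighbor would then be encoded and assigned the class (positive or negative) of the target image that retrieved it, enlarging the training signal for ranking.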

    Explainable, Domain-Adaptive, and Federated Artificial Intelligence in Medicine

    Artificial intelligence (AI) continues to transform data analysis in many domains. Progress in each domain is driven by a growing body of annotated data, increased computational resources, and technological innovations. In medicine, the sensitivity of the data, the complexity of the tasks, the potentially high stakes, and the requirement of accountability give rise to a particular set of challenges. In this review, we focus on three key methodological approaches that address some of these challenges in AI-driven medical decision making. (1) Explainable AI aims to produce a human-interpretable justification for each output. Such models increase confidence if the results appear plausible and match the clinicians' expectations. However, the absence of a plausible explanation does not imply an inaccurate model; especially in highly non-linear, complex models that are tuned to maximize accuracy, such interpretable representations reflect only a small portion of the justification. (2) Domain adaptation and transfer learning enable AI models to be trained and applied across multiple domains, for example, a classification task applied to images acquired with different hardware. (3) Federated learning enables learning large-scale models without exposing sensitive personal health information. Unlike centralized AI learning, where the central learner has access to the entire training data, the federated learning process iteratively updates models across multiple sites by exchanging only parameter updates, never personal health data (a minimal sketch of one such round is given below). This narrative review covers the basic concepts, highlights relevant cornerstone and state-of-the-art research in the field, and discusses perspectives.
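
    To make the federated exchange concrete, here is a minimal sketch of one federated-averaging-style communication round. The flat-parameter representation and the site-side `local_update` routine are assumptions for illustration, not a specific system from the review.

```python
# Minimal sketch of a federated-averaging round: only parameter vectors
# leave each site, never patient records. `local_update` stands in for
# whatever local training step a site performs.
import numpy as np


def federated_round(global_params, site_datasets, local_update):
    """One communication round: average the locally updated parameters."""
    site_params = [local_update(global_params.copy(), data)
                   for data in site_datasets]    # executed at each site
    sizes = np.array([len(d) for d in site_datasets], dtype=float)
    weights = sizes / sizes.sum()                # weight sites by data size
    return sum(w * p for w, p in zip(weights, site_params))
```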

    Few-shot Learning with Multi-scale Self-supervision

    Learning concepts from a limited number of data points is a challenging task, usually addressed by so-called one- or few-shot learning. Recently, an application of second-order pooling in few-shot learning demonstrated superior performance: the aggregation step handles varying image resolutions without the need to modify CNNs to fit specific image sizes, while still capturing highly descriptive feature co-occurrences (a minimal sketch of such pooling follows this abstract). However, using a single resolution per image (even if the resolution varies across a dataset) is suboptimal, as the importance of image contents varies across coarse-to-fine levels depending on the object and its class label; e.g., generic objects and scenes rely on their global appearance, while fine-grained objects rely more on their localized texture patterns. Multi-scale representations are popular in image deblurring, super-resolution, and image recognition, but they have not been investigated in few-shot learning, whose relational nature complicates the use of standard techniques. In this paper, we propose a novel multi-scale relation network based on the properties of second-order pooling to estimate image relations in the few-shot setting. To optimize the model, we leverage a scale selector that re-weights scale-wise representations based on their second-order features. Furthermore, we propose to apply self-supervised scale prediction: specifically, we leverage an extra discriminator to predict the scale labels and the scale discrepancy between pairs of images. Our model achieves state-of-the-art results on standard few-shot learning datasets.
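
    A minimal sketch of the second-order pooling idea referenced above, with assumed tensor shapes: averaging outer products over spatial positions yields a fixed-size co-occurrence descriptor regardless of the input image resolution.

```python
# Minimal sketch (assumed shapes): second-order pooling of a CNN feature
# map. The resulting D x D descriptor has the same size for any H and W,
# which is why varying image resolutions pose no problem.
import torch


def second_order_pool(feature_map: torch.Tensor) -> torch.Tensor:
    """(D, H, W) feature map -> (D, D) co-occurrence descriptor."""
    d = feature_map.shape[0]
    x = feature_map.reshape(d, -1)       # (D, H*W), spatial positions flattened
    return (x @ x.t()) / x.shape[1]      # average of per-location outer products
```

    Descriptors pooled this way from support and query images at each scale can then be compared by a relation module, which is the setting the multi-scale network above operates in.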