3,274 research outputs found

    ModDrop: adaptive multi-modal gesture recognition

    Full text link
    We present a method for gesture detection and localisation based on multi-scale and multi-modal deep learning. Each visual modality captures spatial information at a particular spatial scale (such as motion of the upper body or a hand), and the whole system operates at three temporal scales. Key to our technique is a training strategy which exploits: i) careful initialization of individual modalities; and ii) gradual fusion involving random dropping of separate channels (dubbed ModDrop) for learning cross-modality correlations while preserving uniqueness of each modality-specific representation. We present experiments on the ChaLearn 2014 Looking at People Challenge gesture recognition track, in which we placed first out of 17 teams. Fusing multiple modalities at several spatial and temporal scales leads to a significant increase in recognition rates, allowing the model to compensate for errors of the individual classifiers as well as noise in the separate channels. Futhermore, the proposed ModDrop training technique ensures robustness of the classifier to missing signals in one or several channels to produce meaningful predictions from any number of available modalities. In addition, we demonstrate the applicability of the proposed fusion scheme to modalities of arbitrary nature by experiments on the same dataset augmented with audio.Comment: 14 pages, 7 figure

    Simultaneous super-resolution and cross-modality synthesis of 3D medical images using weakly-supervised joint convolutional sparse coding

    Get PDF
    Magnetic Resonance Imaging (MRI) offers high-resolution in vivo imaging and rich functional and anatomical multimodality tissue contrast. In practice, however, there are challenges associated with considerations of scanning costs, patient comfort, and scanning time that constrain how much data can be acquired in clinical or research studies. In this paper, we explore the possibility of generating high-resolution and multimodal images from low-resolution single-modality imagery. We propose the weakly-supervised joint convolutional sparse coding to simultaneously solve the problems of super-resolution (SR) and cross-modality image synthesis. The learning process requires only a few registered multimodal image pairs as the training set. Additionally, the quality of the joint dictionary learning can be improved using a larger set of unpaired images1. To combine unpaired data from different image resolutions/modalities, a hetero-domain image alignment term is proposed. Local image neighborhoods are naturally preserved by operating on the whole image domain (as opposed to image patches) and using joint convolutional sparse coding. The paired images are enhanced in the joint learning process with unpaired data and an additional maximum mean discrepancy term, which minimizes the dissimilarity between their feature distributions. Experiments show that the proposed method outperforms state-of-the-art techniques on both SR reconstruction and simultaneous SR and cross-modality synthesis

    Cross-Modality Feature Learning for Three-Dimensional Brain Image Synthesis

    Get PDF
    Multi-modality medical imaging is increasingly used for comprehensive assessment of complex diseases in either diagnostic examinations or as part of medical research trials. Different imaging modalities provide complementary information about living tissues. However, multi-modal examinations are not always possible due to adversary factors such as patient discomfort, increased cost, prolonged scanning time and scanner unavailability. In addition, in large imaging studies, incomplete records are not uncommon owing to image artifacts, data corruption or data loss, which compromise the potential of multi-modal acquisitions. Moreover, independently of how well an imaging system is, the performance of the imaging equipment usually comes to a certain limit through different physical devices. Additional interferences arise (particularly for medical imaging systems), for example, limited acquisition times, sophisticated and costly equipment and patients with severe medical conditions, which also cause image degradation. The acquisitions can be considered as the degraded version of the original high-quality images. In this dissertation, we explore the problems of image super-resolution and cross-modality synthesis for one Magnetic Resonance Imaging (MRI) modality from an image of another MRI modality of the same subject using an image synthesis framework for reconstructing the missing/complex modality data. We develop models and techniques that allow us to connect the domain of source modality data and the domain of target modality data, enabling transformation between elements of the two domains. In particular, we first introduce the models that project both source modality data and target modality data into a common multi-modality feature space in a supervised setting. This common space then allows us to connect cross-modality features that depict a relationship between each other, and we can impose the learned association function that synthesizes any target modality image. Moreover, we develop a weakly-supervised method that takes a few registered multi-modality image pairs as training data and generates the desired modality data without being constrained a large number of multi-modality images collection of well-processed (\textit{e.g.}, skull-stripped and strictly registered) brain data. Finally, we propose an approach that provides a generic way of learning a dual mapping between source and target domains while considering both visually high-fidelity synthesis and task-practicability. We demonstrate that this model can be used to take any arbitrary modality and efficiently synthesize the desirable modality data in an unsupervised manner. We show that these proposed models advance the state-of-the-art on image super-resolution and cross-modality synthesis tasks that need jointly processing of multi-modality images and that we can design the algorithms in ways to generate the practically beneficial data to medical image analysis

    Deep learning for unsupervised domain adaptation in medical imaging: Recent advancements and future perspectives

    Full text link
    Deep learning has demonstrated remarkable performance across various tasks in medical imaging. However, these approaches primarily focus on supervised learning, assuming that the training and testing data are drawn from the same distribution. Unfortunately, this assumption may not always hold true in practice. To address these issues, unsupervised domain adaptation (UDA) techniques have been developed to transfer knowledge from a labeled domain to a related but unlabeled domain. In recent years, significant advancements have been made in UDA, resulting in a wide range of methodologies, including feature alignment, image translation, self-supervision, and disentangled representation methods, among others. In this paper, we provide a comprehensive literature review of recent deep UDA approaches in medical imaging from a technical perspective. Specifically, we categorize current UDA research in medical imaging into six groups and further divide them into finer subcategories based on the different tasks they perform. We also discuss the respective datasets used in the studies to assess the divergence between the different domains. Finally, we discuss emerging areas and provide insights and discussions on future research directions to conclude this survey.Comment: Under Revie

    Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective

    Get PDF
    This paper takes a problem-oriented perspective and presents a comprehensive review of transfer learning methods, both shallow and deep, for cross-dataset visual recognition. Specifically, it categorises the cross-dataset recognition into seventeen problems based on a set of carefully chosen data and label attributes. Such a problem-oriented taxonomy has allowed us to examine how different transfer learning approaches tackle each problem and how well each problem has been researched to date. The comprehensive problem-oriented review of the advances in transfer learning with respect to the problem has not only revealed the challenges in transfer learning for visual recognition, but also the problems (e.g. eight of the seventeen problems) that have been scarcely studied. This survey not only presents an up-to-date technical review for researchers, but also a systematic approach and a reference for a machine learning practitioner to categorise a real problem and to look up for a possible solution accordingly
    • …
    corecore