
    Domain Generalization for Medical Image Analysis: A Survey

    Medical Image Analysis (MedIA) has become an essential tool in medicine and healthcare, aiding in disease diagnosis, prognosis, and treatment planning, and recent successes in deep learning (DL) have contributed significantly to its advances. However, DL models for MedIA remain challenging to deploy in real-world situations, failing to generalize under the distributional gap between training and testing samples, known as the distribution shift problem. Researchers have dedicated their efforts to developing various DL methods that adapt to and perform robustly on unknown and out-of-distribution data. This paper comprehensively reviews domain generalization studies specifically tailored to MedIA. We provide a holistic view of how domain generalization techniques interact within the broader MedIA system, going beyond methodologies to consider the operational implications for the entire MedIA workflow. Specifically, we categorize domain generalization methods into data-level, feature-level, model-level, and analysis-level methods. We show how these methods can be used at various stages of the DL-equipped MedIA workflow, from data acquisition to model prediction and analysis. Furthermore, we include benchmark datasets and applications used to evaluate these approaches, and we analyze the strengths and weaknesses of various methods, unveiling future research opportunities.

    DESIGN OF A USER INTERFACE FOR THE ANALYSIS OF MULTI-MODAL IMAGE REGISTRATION

    Image registration is the process of spatially aligning two or more images of a scene into a common coordinate system. Research in image registration has yielded a number of rigid and non-rigid image registration methods capable of registering images of a scene between modalities. In addition, techniques of information visualization have been applied to medical image registration research to produce an atlas-based image registration method. This method is capable of registering medical images of the same modality between subjects for comparative studies. This thesis aims to extend research in image registration by adding to it the visual encoding of time. The visual encoding of time furthers image registration research by enabling the simultaneous analysis of the spatial and temporal relationships that exist between images. The benefit of registering images with respect to both space and time is shown through the development of a software application capable of presenting a time-space narrative of x-ray images representing a patient's medical history. This time-space narrative is assembled by performing rigid atlas-based image registration on a set of x-ray images and by visually encoding their timestamps to form an interactive timeline. The atlas-based image registration method was selected to ensure that images can be registered to a common coordinate system in cases where images do not overlap. Rigid image registration was assumed to be sufficient to provide the desired visual result. Subsequent to its implementation, an analysis of the measured uncertainty of the image registration method was performed. The error in manual point-pair correspondence selection was measured at more than +/- 1.08 pixels under ideal conditions, and a method to calculate the unique standard error of each image registration was presented.
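
    Rigid registration from manually selected point-pair correspondences, as used above, is commonly solved in closed form with the Kabsch/Procrustes method. The sketch below is a generic illustration of that method, not the thesis's implementation; the landmark coordinates are invented.

```python
import numpy as np

def rigid_transform_2d(src, dst):
    """Estimate rotation R and translation t mapping src -> dst (Kabsch method).

    src, dst: (N, 2) arrays of corresponding points picked in two x-ray images.
    Returns R (2x2) and t (2,) minimising sum ||R @ src_i + t - dst_i||^2.
    """
    src_c = src - src.mean(axis=0)          # centre both point sets
    dst_c = dst - dst.mean(axis=0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t

# Hypothetical correspondences, e.g. anatomical landmarks clicked by a user.
src = np.array([[10.0, 12.0], [40.0, 15.0], [25.0, 50.0], [60.0, 48.0]])
theta = np.deg2rad(7.0)
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
dst = src @ R_true.T + np.array([3.0, -2.0])

R, t = rigid_transform_2d(src, dst)
residual = np.abs((src @ R.T + t) - dst).max()
print("max residual (pixels):", residual)  # ~0 under noise-free conditions
```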

    Imaging biomarkers extraction and classification for Prion disease

    Prion diseases are a group of rare neurodegenerative conditions characterised by a high rate of progression and highly heterogeneous phenotypes. Whilst the most common form of prion disease occurs sporadically (sporadic Creutzfeldt-Jakob disease, sCJD), other forms are caused by inheritance of prion protein gene mutations or exposure to prions. To date, there are no accurate imaging biomarkers that can be used to predict the future diagnosis of a subject or to quantify the progression of symptoms over time. Moreover, CJD is commonly mistaken for other forms of dementia. Due to the large heterogeneity of phenotypes of prion disease and the lack of a consistent spatial pattern of disease progression, the approaches used to study other types of neurodegenerative diseases are not satisfactory for capturing the progression of the human form of prion disease. Using a tailored framework, I extracted quantitative imaging biomarkers for the characterisation of patients with prion diseases. Following the extraction of patient-specific imaging biomarkers from multiple images, I implemented a Gaussian Process approach to correlate symptoms with disease types and stages. The model was used on three different tasks: diagnosis, differential diagnosis, and stratification, addressing an unmet need to automatically identify patients with, or at risk of developing, prion disease. The work presented in this thesis has been extensively validated in a unique prion disease cohort comprising both the inherited and sporadic forms of the disease. The model has been shown to be effective in the prediction of this illness. Furthermore, this approach may be used in other disorders with heterogeneous imaging features, adding value to the understanding of neurodegenerative diseases. Lastly, given the rarity of this disease, I also addressed the issue of missing data and the limitations raised by it. Overall, this work presents progress towards the modelling of prion diseases and identifies which computational methodologies are potentially suitable for their characterisation.
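
    The abstract does not detail the Gaussian Process formulation, so the sketch below only illustrates the general pattern of correlating imaging biomarkers with a diagnostic label using scikit-learn's Gaussian Process classifier; the features and labels are synthetic and the biomarker count is arbitrary.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for per-patient imaging biomarkers
# (e.g. regional volumes or diffusion summaries); real features differ.
n_patients, n_biomarkers = 80, 12
X = rng.normal(size=(n_patients, n_biomarkers))
y = (X[:, :3].sum(axis=1) + 0.5 * rng.normal(size=n_patients) > 0).astype(int)

# Anisotropic RBF kernel plus a noise term; GP classifiers produce
# class probabilities, useful for diagnosis / stratification style tasks.
kernel = 1.0 * RBF(length_scale=np.ones(n_biomarkers)) + WhiteKernel()
gpc = GaussianProcessClassifier(kernel=kernel, random_state=0)

scores = cross_val_score(gpc, X, y, cv=5, scoring="roc_auc")
print("cross-validated AUC: %.2f +/- %.2f" % (scores.mean(), scores.std()))
```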

    Automatic Reporting of TBI Lesion Location in CT based on Deep Learning and Atlas Mapping

    Integrated master's thesis, Biomedical Engineering and Biophysics (Medical Biophysics and Systems Physiology), 2021, Universidade de Lisboa, Faculdade de Ciências. The assessment of Computed Tomography (CT) scans for Traumatic Brain Injury (TBI) management remains a time-consuming and challenging task for physicians. Computational methods for quantitative lesion segmentation and localisation may increase consistency in diagnosis and prognosis criteria. Our goal was to develop a registration-based tool to accurately localise several lesion classes (i.e., to calculate the volume of each lesion class per brain region), as an extension of the Brain Lesion Analysis and Segmentation Tool for CT (BLAST-CT). Lesions were located by projecting a labelled Magnetic Resonance Imaging (MRI) atlas from the Montreal Neurological Institute (MNI MRI atlas) onto a lesion map in native space. We created a CT template to serve as an intermediate step between the two imaging spaces, using 182 non-lesioned CT scans and an unbiased iterative registration approach. We then non-linearly registered the parcellated atlas to the CT template and subsequently registered (affine) the result to native space. From the final atlas alignment, it was possible to calculate the volume of each lesion class per brain region. This pipeline was validated on a multi-centre dataset (n=839 scans), and three methods were defined to flag any scans presenting sub-optimal results. The first was based on the similarity metric of the registration of every scan to the study-specific CT template, the second aimed to identify any scans with regions that were completely collapsed post-registration, and the final one identified scans with a significant volume of intra-ventricular haemorrhage outside of the ventricles. Additionally, an assessment of lesion prevalence and of the false negative and false positive rates of the algorithm, per anatomical region, was conducted, along with a bias assessment of the BLAST-CT tool. Our results show that the constructed pipeline successfully localises TBI lesions across the whole brain, although without voxel-wise accuracy. We found the error rates calculated for each brain region to be inversely correlated with the lesion volume within that region. No considerable bias was identified in BLAST-CT, as all the significant correlation coefficients calculated between the Dice similarity coefficients (DSC) and clinical variables (i.e., age, Glasgow Coma Scale, Extended Glasgow Outcome Scale, and Injury Severity Score) were below 0.2. Our results also suggest that the variation in DSC between male and female patients within a specific age range was caused by the discrepancy in lesion volume in the scans included in each sample.
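
    The final bookkeeping step of such a pipeline, counting lesion voxels per atlas region once both volumes live in the same space, can be sketched in a few lines. The following is a generic illustration with hypothetical arrays, not the BLAST-CT extension itself.

```python
import numpy as np

def lesion_volume_per_region(lesion_map, atlas_labels, voxel_volume_ml, region_names):
    """Tally lesion volume (ml) per brain region.

    lesion_map   : (X, Y, Z) int array, 0 = background, k = lesion class k
    atlas_labels : (X, Y, Z) int array, parcellation already registered to native space
    """
    report = {}
    for region_id, name in region_names.items():
        in_region = atlas_labels == region_id
        for lesion_class in np.unique(lesion_map[lesion_map > 0]):
            n_vox = np.count_nonzero(in_region & (lesion_map == lesion_class))
            if n_vox:
                report[(name, int(lesion_class))] = n_vox * voxel_volume_ml
    return report

# Hypothetical toy volumes; a real pipeline would load NIfTI files instead.
lesion = np.zeros((4, 4, 4), dtype=int)
lesion[1:3, 1:3, 1] = 2                  # e.g. class 2 = intra-ventricular haemorrhage
atlas = np.ones((4, 4, 4), dtype=int)    # single region, for demonstration only
print(lesion_volume_per_region(lesion, atlas, voxel_volume_ml=0.001,
                               region_names={1: "left ventricle"}))
```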

    Learning deep embeddings by learning to rank

    We study the problem of embedding high-dimensional visual data into low-dimensional vector representations. This is an important component in many computer vision applications involving nearest neighbor retrieval, as embedding techniques not only perform dimensionality reduction, but can also capture task-specific semantic similarities. In this thesis, we use deep neural networks to learn vector embeddings, and develop a gradient-based optimization framework capable of optimizing ranking-based retrieval performance metrics, such as the widely used Average Precision (AP) and Normalized Discounted Cumulative Gain (NDCG). We apply this framework in three applications. First, we study Supervised Hashing, which is concerned with learning compact binary vector embeddings for fast retrieval, and propose two novel solutions. The first solution optimizes Mutual Information as a surrogate ranking objective, while the other directly optimizes AP and NDCG, based on the discovery of their closed-form expressions for discrete Hamming distances. These optimization problems are NP-hard; we therefore derive their continuous relaxations to enable gradient-based optimization with neural networks. Our solutions establish the state-of-the-art on several image retrieval benchmarks. Next, we learn deep neural networks to extract Local Feature Descriptors from image patches. Local features are used universally in low-level computer vision tasks that involve sparse feature matching, such as image registration and 3D reconstruction, and their matching is a nearest neighbor retrieval problem. We leverage our AP optimization technique to learn both binary and real-valued descriptors for local image patches. Compared to competing approaches, our solution eliminates complex heuristics and performs more accurately in the tasks of patch verification, patch retrieval, and image matching. Lastly, we tackle Deep Metric Learning, the general problem of learning real-valued vector embeddings using deep neural networks. We propose a learning-to-rank solution through optimizing a novel quantization-based approximation of AP. For downstream tasks such as retrieval and clustering, we demonstrate promising results on standard benchmarks, especially in the few-shot learning scenario, where the number of labeled examples per class is limited.
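
    As context for the ranking objectives named above, the sketch below computes plain Average Precision for one query with numpy; the thesis's contribution is differentiable relaxations of this quantity, which are not reproduced here. The scores and labels are made up.

```python
import numpy as np

def average_precision(scores, relevant):
    """AP for one query: mean of precision@k over the ranks of relevant items.

    scores   : (N,) similarity of each database item to the query (higher = closer)
    relevant : (N,) boolean relevance labels
    """
    order = np.argsort(-scores)                # rank database by similarity
    rel = relevant[order].astype(float)
    precision_at_k = np.cumsum(rel) / (np.arange(len(rel)) + 1)
    return (precision_at_k * rel).sum() / max(rel.sum(), 1)

scores = np.array([0.9, 0.2, 0.75, 0.4, 0.1])
relevant = np.array([True, False, True, False, True])
print("AP:", average_precision(scores, relevant))  # a perfect ranking gives 1.0
```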

    Multimodal Data Fusion: An Overview of Methods, Challenges and Prospects

    In various disciplines, information about the same phenomenon can be acquired from different types of detectors, at different conditions, in multiple experiments or subjects, among others. We use the term "modality" for each such acquisition framework. Due to the rich characteristics of natural phenomena, it is rare that a single modality provides complete knowledge of the phenomenon of interest. The increasing availability of several modalities reporting on the same system introduces new degrees of freedom, which raise questions beyond those related to exploiting each modality separately. As we argue, many of these questions, or "challenges", are common to multiple domains. This paper deals with two key questions: "why we need data fusion" and "how we perform it". The first question is motivated by numerous examples in science and technology, followed by a mathematical framework that showcases some of the benefits that data fusion provides. In order to address the second question, "diversity" is introduced as a key concept, and a number of data-driven solutions based on matrix and tensor decompositions are discussed, emphasizing how they account for diversity across the datasets. The aim of this paper is to provide the reader, regardless of his or her community of origin, with a taste of the vastness of the field, the prospects, and the opportunities that it holds.
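
    As one concrete instance of the data-driven fusion methods the paper surveys, a coupled matrix factorization lets two modality matrices share a common subject factor. The alternating-least-squares sketch below is a generic illustration on synthetic data, not a method taken from the paper itself.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two modalities observed on the same 100 subjects share latent factors A.
n, p1, p2, r = 100, 30, 20, 5
A_true = rng.normal(size=(n, r))
X1 = A_true @ rng.normal(size=(r, p1))   # modality 1 (e.g. one sensor type)
X2 = A_true @ rng.normal(size=(r, p2))   # modality 2 (another sensor type)

# Coupled factorization: X1 ~ A B1^T, X2 ~ A B2^T, alternating least squares.
A = rng.normal(size=(n, r))
for _ in range(50):
    B1 = np.linalg.lstsq(A, X1, rcond=None)[0].T   # per-modality loadings
    B2 = np.linalg.lstsq(A, X2, rcond=None)[0].T
    # The shared factor is fit against both modalities stacked together,
    # which is where the fusion actually happens.
    A = np.linalg.lstsq(np.vstack([B1, B2]), np.hstack([X1, X2]).T, rcond=None)[0].T

err = (np.linalg.norm(X1 - A @ B1.T) + np.linalg.norm(X2 - A @ B2.T)) / (
    np.linalg.norm(X1) + np.linalg.norm(X2))
print(f"relative reconstruction error: {err:.3e}")
```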

    Deep Learning Strategies for Pool Boiling Heat Flux Prediction Using Image Sequences

    The understanding of bubble dynamics during boiling is critical to the design of advanced heater surfaces that improve boiling heat transfer. The stochastic bubble nucleation, growth, and coalescence processes have made it challenging to obtain mechanistic models that can predict boiling heat flux based on bubble dynamics. Traditional boiling image analysis relies on the extraction of the dominant physical quantities from the images and is thus limited by existing knowledge of these quantities. Recently, machine-learning-aided analysis has shown success in boiling crisis detection, heat flux prediction, real-time image analysis, etc. However, most existing studies focus on static boiling images and fail to capture the dynamic behaviors of the bubbles. To address this issue, in the present work, a convolutional long short-term memory (ConvLSTM) model is developed to enable quantitative prediction of heat flux based on sequences of boiling images, where the convolutional layers extract spatial features of the boiling images and the LSTM layers capture the temporal features of the sequences. A convolutional neural network (CNN) model based on the classification of static images is also developed as a reference. Both models are trained with images of HFE-7100 boiling on silicon micropillar arrays at different steady-state heat fluxes. The results show that both the CNN and ConvLSTM models lead to accurate predictions of heat flux based on the boiling images. In particular, the ConvLSTM model yields higher accuracy for heat flux predictions on completely unseen data, indicating a higher level of generality. Another focus of the present work is the forecasting capability of data-driven models using boiling images under transient heat loads. A CNN regression model is coupled with a one-dimensional LSTM model to enable a quantitative forecast of heat flux during boiling; this model is trained using image sequences of water boiling on planar copper surfaces under power ramp-up and demonstrates a reliable forecasting capability.
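
    The ConvLSTM-then-regression pattern described above can be sketched with Keras, whose ConvLSTM2D layer applies convolutional recurrence over image sequences. The input shape, layer widths, and hyperparameters below are placeholders, not the authors' configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Input: short sequences of grayscale boiling images, here 8 frames of 64x64
# (hypothetical shape; the paper's frame size and sequence length may differ).
seq_len, h, w = 8, 64, 64

model = models.Sequential([
    layers.Input(shape=(seq_len, h, w, 1)),
    # Convolutional recurrence: spatial features per frame, temporal state across frames.
    layers.ConvLSTM2D(16, kernel_size=3, padding="same", return_sequences=True),
    layers.BatchNormalization(),
    layers.ConvLSTM2D(32, kernel_size=3, padding="same", return_sequences=False),
    layers.GlobalAveragePooling2D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1),                     # regression head: predicted heat flux
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.summary()
```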

    Human-controllable and structured deep generative models

    Deep generative models are a class of probabilistic models that attempt to learn the underlying data distribution. These models are usually trained in an unsupervised way and thus do not require any labels. Generative models such as Variational Autoencoders and Generative Adversarial Networks have made astounding progress over recent years. These models have several benefits: easy sampling and evaluation, efficient learning of low-dimensional representations for downstream tasks, and better understanding through interpretable representations. However, even though the quality of these models has improved immensely, the ability to control their style and structure is limited. Structured and human-controllable representations of generative models are essential for human-machine interaction and other applications, including fairness, creativity, and entertainment. This thesis investigates learning human-controllable and structured representations with deep generative models. In particular, we focus on generative modelling of 2D images. In the first part, we focus on learning clustered representations. We propose semi-parametric hierarchical variational autoencoders to estimate the intensity of facial action units. The semi-parametric model forms a hybrid generative-discriminative model and leverages both a parametric Variational Autoencoder and a non-parametric Gaussian Process autoencoder. We show superior performance in comparison with existing facial action unit estimation approaches. Based on the results and analysis of the learned representation, we then focus on learning Mixture-of-Gaussians representations in an autoencoding framework. We deviate from the conventional autoencoding framework and consider a regularized objective with the Cauchy-Schwarz divergence. The Cauchy-Schwarz divergence admits a closed-form solution for Mixture-of-Gaussians distributions, allowing the autoencoding objective to be optimized efficiently. We show that our model outperforms existing Variational Autoencoders in density estimation, clustering, and semi-supervised facial action detection. In the second part, we focus on learning disentangled representations for conditional generation and fair facial attribute classification. Conditional image generation relies on access to large-scale annotated datasets. Nevertheless, the geometry of visual objects, such as faces, cannot be learned implicitly, which deteriorates image fidelity. We propose incorporating facial landmarks with a statistical shape model and a differentiable piecewise affine transformation to separate the representations of appearance and shape. The goal of incorporating facial landmarks is to control generation and to separate different appearances and geometries. In our last work, we use weak supervision for disentangling groups of variations. Work on learning disentangled representations has typically been done in an unsupervised fashion. However, recent works have shown that learning disentangled representations is not identifiable without any inductive biases. Since then, there has been a shift towards weakly-supervised disentanglement learning. We investigate using regularization based on the Kullback-Leibler divergence to disentangle groups of variations. The goal is to have consistent and separated subspaces for different groups, e.g., for content-style learning. Our evaluation shows increased disentanglement abilities and competitive performance for image clustering and fair facial attribute classification with weak supervision, compared to supervised and semi-supervised approaches.
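
    The closed-form property mentioned above rests on the Gaussian product integral, int N(x; m1, C1) N(x; m2, C2) dx = N(m1; m2, C1 + C2). Below is a minimal numpy/scipy sketch of the Cauchy-Schwarz divergence, D_CS(p, q) = -log(int pq / sqrt(int p^2 int q^2)), between two Gaussian mixtures; it is an illustration under that identity, not the thesis's code, and the mixture parameters are made up.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mog_cross_term(w_a, mu_a, cov_a, w_b, mu_b, cov_b):
    """Closed-form integral of the product of two Gaussian mixtures:
    int p(x) q(x) dx = sum_ij w_a[i] w_b[j] N(mu_a[i]; mu_b[j], cov_a[i] + cov_b[j])."""
    total = 0.0
    for wi, mi, ci in zip(w_a, mu_a, cov_a):
        for wj, mj, cj in zip(w_b, mu_b, cov_b):
            total += wi * wj * multivariate_normal.pdf(mi, mean=mj, cov=ci + cj)
    return total

def cs_divergence(p, q):
    """D_CS(p, q) = -log( int pq / sqrt(int p^2 * int q^2) ); zero iff p == q."""
    pq = mog_cross_term(*p, *q)
    pp = mog_cross_term(*p, *p)
    qq = mog_cross_term(*q, *q)
    return -np.log(pq / np.sqrt(pp * qq))

# Two toy 2-component mixtures in 2D (hypothetical parameters).
eye = np.eye(2)
p = ([0.5, 0.5], [np.zeros(2), np.ones(2) * 2.0], [eye, eye])
q = ([0.4, 0.6], [np.ones(2) * 0.5, np.ones(2) * 2.5], [eye, eye])
print("D_CS(p, p):", cs_divergence(p, p))  # 0 by construction
print("D_CS(p, q):", cs_divergence(p, q))  # positive
```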

    Change detection in optical aerial images by a multilayer conditional mixed Markov model

    In this paper we propose a probabilistic model for detecting relevant changes in registered aerial image pairs taken with time differences of several years and under different seasonal conditions. The introduced approach, called the Conditional Mixed Markov model (CXM), is a combination of a mixed Markov model and a conditionally independent random field of signals. The model integrates global intensity statistics with local correlation and contrast features. A global energy optimization process simultaneously ensures optimal local feature selection and smooth, observation-consistent segmentation. Validation is given on real aerial image sets provided by the Hungarian Institute of Geodesy, Cartography and Remote Sensing and by Google Earth.
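
    The CXM model's energy optimization is more elaborate than what fits here, but the general pattern, a per-pixel data term plus a neighbourhood smoothness term minimized by local moves, can be illustrated with iterated conditional modes (a simpler optimizer, swapped in purely for illustration) on a synthetic difference image.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic "difference" image: a bright changed square on a noisy background.
diff = rng.normal(0.0, 0.3, size=(40, 40))
diff[10:25, 12:28] += 1.0

labels = (diff > 0.5).astype(int)          # initial noisy change mask

def local_energy(labels, diff, i, j, lab, beta=1.5):
    """Data term (squared distance to class mean) + Potts smoothness term."""
    mean = 1.0 if lab == 1 else 0.0
    data = (diff[i, j] - mean) ** 2
    # Count disagreeing 4-neighbours.
    smooth = sum(labels[i + di, j + dj] != lab
                 for di, dj in [(-1, 0), (1, 0), (0, -1), (0, 1)]
                 if 0 <= i + di < labels.shape[0] and 0 <= j + dj < labels.shape[1])
    return data + beta * smooth

# Iterated conditional modes: greedily move each pixel to its lower-energy label.
for _ in range(5):
    for i in range(labels.shape[0]):
        for j in range(labels.shape[1]):
            labels[i, j] = min((0, 1), key=lambda lab: local_energy(labels, diff, i, j, lab))

print("changed pixels:", labels.sum())     # ~ the 15 x 16 inserted square
```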