
    Max-Fusion U-Net for Multi-Modal Pathology Segmentation with Attention and Dynamic Resampling

    Automatic segmentation of multi-sequence (multi-modal) cardiac MR (CMR) images plays a significant role in the diagnosis and management of a variety of cardiac diseases. However, the performance of such algorithms depends heavily on how the multi-modal information is fused. Furthermore, particular diseases, such as myocardial infarction, display irregular shapes on images and occupy small regions at random locations. These facts make pathology segmentation of multi-modal CMR images a challenging task. In this paper, we present the Max-Fusion U-Net, which achieves improved pathology segmentation performance given aligned multi-modal images of LGE, T2-weighted, and bSSFP modalities. Specifically, modality-specific features are extracted by dedicated encoders and then fused with a pixel-wise maximum operator. Together with the corresponding encoding features, these representations are propagated to the decoding layers through U-Net skip-connections. Furthermore, a spatial-attention module is applied in the last decoding layer to encourage the network to focus on small, semantically meaningful pathological regions that trigger relatively high responses from the network's neurons. We also use a simple image-patch extraction strategy to dynamically resample training examples with varying spatial and batch sizes. With limited GPU memory, this strategy reduces class imbalance and forces the model to focus on regions around the pathology of interest, further improving segmentation accuracy and reducing the misclassification of pathology. We evaluate our method on the Myocardial Pathology Segmentation combining multi-sequence CMR (MyoPS) dataset, which involves three modalities. Extensive experiments demonstrate the effectiveness of the proposed model, which outperforms related baselines. Comment: 13 pages, 7 figures, conference paper
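    The core fusion step described above reduces to an element-wise maximum over per-modality feature maps. Below is a minimal PyTorch sketch (not the authors' implementation; the encoder design, channel sizes, and the three dummy inputs are illustrative assumptions) of modality-specific encoders combined by pixel-wise maximum fusion.

```python
# Minimal sketch of pixel-wise maximum fusion of modality-specific features.
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """A single conv-BN-ReLU block standing in for a modality-specific encoder."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class MaxFusion(nn.Module):
    """Fuse per-modality feature maps with an element-wise (pixel-wise) maximum."""
    def forward(self, features):  # features: list of tensors, each (B, C, H, W)
        fused = features[0]
        for f in features[1:]:
            fused = torch.maximum(fused, f)
        return fused

# Hypothetical usage with three modality-specific encoders (LGE, T2, bSSFP).
encoders = nn.ModuleList([ConvBlock(1, 32) for _ in range(3)])
fuse = MaxFusion()
lge, t2, bssfp = (torch.randn(2, 1, 128, 128) for _ in range(3))
fused = fuse([enc(x) for enc, x in zip(encoders, (lge, t2, bssfp))])
print(fused.shape)  # torch.Size([2, 32, 128, 128])
```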

    Joint and individual analysis of breast cancer histologic images and genomic covariates

    A key challenge in modern data analysis is understanding connections between complex and differing modalities of data. For example, two of the main approaches to the study of breast cancer are histopathology (analyzing visual characteristics of tumors) and genetics. While histopathology is the gold standard for diagnostics and there have been many recent breakthroughs in genetics, there is little overlap between these two fields. We aim to bridge this gap by developing methods based on Angle-based Joint and Individual Variation Explained (AJIVE) to directly explore similarities and differences between these two modalities. Our approach exploits Convolutional Neural Networks (CNNs) as a powerful, automatic method for image feature extraction to address some of the challenges presented by statistical analysis of histopathology image data. CNNs raise issues of interpretability that we address by developing novel methods to explore visual modes of variation captured by statistical algorithms (e.g. PCA or AJIVE) applied to CNN features. Our results provide many interpretable connections and contrasts between histopathology and genetics.
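    As a rough illustration of the pipeline described above, the sketch below extracts pooled CNN features from histopathology image patches with a pretrained backbone and then applies a linear method to expose modes of variation. This is a hedged approximation: the backbone choice, the file names, and the use of plain PCA in place of AJIVE are assumptions, not the authors' actual setup.

```python
# Sketch: CNN feature extraction followed by a simple linear decomposition (PCA).
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
from sklearn.decomposition import PCA

# Pretrained CNN used purely as a feature extractor (classifier head removed).
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # keep the 512-d pooled features
backbone.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_features(image_paths):
    """Return an (N, 512) array of pooled CNN features for the given images."""
    feats = []
    with torch.no_grad():
        for p in image_paths:
            x = preprocess(Image.open(p).convert("RGB")).unsqueeze(0)
            feats.append(backbone(x).squeeze(0).numpy())
    return np.stack(feats)

# Hypothetical usage with placeholder patch files:
# features = extract_features(["patch_001.png", "patch_002.png"])
# modes = PCA(n_components=10).fit_transform(features)  # visual modes of variation
```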

    Spatio-temporal classification for polyp diagnosis

    Colonoscopy remains the gold standard investigation for colorectal cancer screening, as it offers the opportunity to both detect and resect pre-cancerous polyps. Computer-aided polyp characterisation can determine which polyps need polypectomy, and recent deep learning-based approaches have shown promising results as clinical decision support tools. Yet polyp appearance can vary during a procedure, making automatic predictions unstable. In this paper, we investigate the use of spatio-temporal information to improve the performance of lesion classification as adenoma or non-adenoma. Two methods are implemented, showing increased performance and robustness in extensive experiments on both internal and openly available benchmark datasets.
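    The abstract does not detail the two methods, so the sketch below only illustrates the general idea of stabilising a per-frame classifier with temporal information: a frame-level CNN is applied to every frame of a short clip and the logits are averaged over time. The backbone, clip length, and class count are illustrative assumptions, not the paper's actual models.

```python
# Sketch: temporal averaging of per-frame logits for clip-level polyp classification.
import torch
import torch.nn as nn
import torchvision.models as models

class TemporalAveragingClassifier(nn.Module):
    def __init__(self, num_classes=2):  # e.g. adenoma vs. non-adenoma
        super().__init__()
        self.backbone = models.resnet18(weights=None)
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, num_classes)

    def forward(self, clip):                              # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        logits = self.backbone(clip.flatten(0, 1))        # (B*T, num_classes)
        return logits.view(b, t, -1).mean(dim=1)          # average over time

model = TemporalAveragingClassifier()
clip = torch.randn(1, 8, 3, 224, 224)  # one clip of 8 frames
print(model(clip).shape)  # torch.Size([1, 2])
```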

    Deep-Learning-Based Semantic Segmentation of Indoor RGBD Scenes for Use on a Mobile Robot

    Pixel-accurate semantic segmentation lays the foundation for comprehensive scene understanding. Semantic knowledge about the structure and layout of indoor scenes may support mobile robots in various tasks, such as localization, obstacle avoidance, targeted navigation to semantic entities, or human-machine interaction. Recently, good segmentation results have been achieved using efficient RGB methods alone; incorporating depth data typically improves segmentation performance further. In this master's thesis, methods for efficient semantic segmentation and for RGBD segmentation are therefore combined. Based on a broad literature review of both topics, a novel, efficient deep-learning-based RGBD segmentation approach is developed. Comprehensive experiments on various parts of the network architecture show how the segmentation performance can be improved step by step, while low inference time is kept in focus throughout. The best network developed in this thesis achieves a mean Intersection over Union (mIoU) of 47.62 on the relevant indoor RGBD dataset SUN RGB-D, comparable to the state of the art, while running at a significantly higher rate of 13.2 frames per second on an NVIDIA Jetson AGX Xavier, and is thus well suited for deployment on a mobile robot.
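    As a hedged sketch of the general fusion pattern discussed in the thesis, the toy network below uses separate RGB and depth encoders whose feature maps are merged by element-wise addition before a small decoder produces per-pixel class scores. It is not the thesis' architecture; the layer sizes, the fusion operator, and the 37-class setting (as in SUN RGB-D) are assumptions.

```python
# Sketch: two-branch RGBD semantic segmentation with simple additive fusion.
import torch
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch, stride=1):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class SimpleRGBDSegNet(nn.Module):
    def __init__(self, num_classes=37):  # 37 semantic classes as in SUN RGB-D
        super().__init__()
        self.rgb_enc = nn.Sequential(conv_bn_relu(3, 32, 2), conv_bn_relu(32, 64, 2))
        self.depth_enc = nn.Sequential(conv_bn_relu(1, 32, 2), conv_bn_relu(32, 64, 2))
        self.decoder = nn.Sequential(
            conv_bn_relu(64, 64),
            nn.Conv2d(64, num_classes, 1),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
        )

    def forward(self, rgb, depth):
        fused = self.rgb_enc(rgb) + self.depth_enc(depth)  # element-wise fusion
        return self.decoder(fused)

net = SimpleRGBDSegNet()
out = net(torch.randn(1, 3, 240, 320), torch.randn(1, 1, 240, 320))
print(out.shape)  # torch.Size([1, 37, 240, 320])
```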

    Multimodal and disentangled representation learning for medical image analysis

    Automated medical image analysis is a growing research field with various applications in modern healthcare. Furthermore, a multitude of imaging techniques (or modalities), such as Magnetic Resonance (MR) and Computed Tomography (CT), have been developed to attenuate different organ characteristics. Research on image analysis is predominantly driven by deep learning methods due to their demonstrated performance. In this thesis, we argue that their success and generalisation rely on learning good latent representations. We propose methods for learning spatial representations that are suitable for medical image data and can combine information coming from different modalities. Specifically, we aim to improve cardiac MR segmentation, a challenging task due to varied images and limited expert annotations, by considering complementary information present in (potentially unaligned) images of other modalities.

    In order to evaluate the benefit of multimodal learning, we initially consider a synthesis task on spatially aligned multimodal brain MR images. We propose a deep network of multiple encoders and decoders, which we demonstrate outperforms existing approaches. The encoders (one per input modality) map the multimodal images into modality-invariant spatial feature maps. Common and unique information is combined into a fused representation that is robust to missing modalities and can be decoded into synthetic images of the target modalities. Different experimental settings demonstrate the benefit of multimodal over unimodal synthesis, although input and output image pairs are required for training. The need for paired images can be overcome with the cycle-consistency principle, which we use in conjunction with adversarial training to transform images from one modality (e.g. MR) to images in another (e.g. CT). This is especially useful in cardiac datasets, where different spatial and temporal resolutions make image pairing difficult, if not impossible.

    Segmentation can also be considered a form of image synthesis, if one modality consists of semantic maps. We consider the task of extracting segmentation masks for cardiac MR images and aim to overcome the challenge of limited annotations by taking into account unannotated images, which are commonly ignored. We achieve this by defining suitable latent spaces that represent the underlying anatomy (spatial latent variable) as well as the imaging characteristics (non-spatial latent variable). Anatomical information is required for tasks such as segmentation and regression, whereas imaging information can capture variability in intensity characteristics, for example due to different scanners. We propose two models that disentangle cardiac images at different levels: the first extracts the myocardium from the surrounding information, whereas the second fully separates the anatomical from the imaging characteristics. Experimental analysis confirms the utility of disentangled representations in semi-supervised segmentation and in regression of cardiac indices, while maintaining robustness to intensity variations such as those induced by different modalities.

    Finally, our prior research is aggregated into one framework that encodes multimodal images into disentangled anatomical and imaging factors. Several challenges of multimodal cardiac imaging, such as input misalignments and the lack of expert annotations, are successfully handled in the shared anatomy space. Furthermore, we demonstrate that this approach can be used to combine complementary anatomical information for the purpose of multimodal segmentation, even when no annotations are provided for one of the modalities. This thesis creates new avenues for further research in the area of multimodal and disentangled learning with spatial representations, which we believe are key to more generalised deep learning solutions in healthcare.
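    A minimal, hypothetical sketch of the disentanglement idea summarised above: one encoder yields a spatial anatomy factor, another a small non-spatial imaging vector, a decoder recombines them to reconstruct the input, and a segmentation head reads the anatomy factor alone. The channel sizes and layer choices are assumptions, and this is a simplification of the thesis' models, not their implementation.

```python
# Sketch: anatomy/imaging disentanglement with a reconstruction decoder and a
# segmentation head that uses only the spatial anatomy factor.
import torch
import torch.nn as nn

class DisentangledAutoencoder(nn.Module):
    def __init__(self, anat_ch=8, z_dim=8):
        super().__init__()
        # Spatial anatomy factor: per-pixel channels, intended to be intensity-agnostic.
        self.anatomy_enc = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, anat_ch, 3, padding=1), nn.Softmax(dim=1),
        )
        # Non-spatial imaging factor: a small vector capturing intensity/style.
        self.imaging_enc = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, z_dim),
        )
        # Decoder recombines both factors to reconstruct the input image.
        self.decoder = nn.Sequential(
            nn.Conv2d(anat_ch + z_dim, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )
        # Segmentation head sees the anatomy factor only (e.g. myocardium vs. background).
        self.segmenter = nn.Conv2d(anat_ch, 2, 1)

    def forward(self, x):
        s = self.anatomy_enc(x)                               # (B, anat_ch, H, W)
        z = self.imaging_enc(x)                               # (B, z_dim)
        z_map = z[:, :, None, None].expand(-1, -1, *s.shape[2:])
        recon = self.decoder(torch.cat([s, z_map], dim=1))    # reconstructed image
        seg = self.segmenter(s)                               # segmentation from anatomy only
        return recon, seg

model = DisentangledAutoencoder()
recon, seg = model(torch.randn(2, 1, 64, 64))
print(recon.shape, seg.shape)  # torch.Size([2, 1, 64, 64]) torch.Size([2, 2, 64, 64])
```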