Max-Fusion U-Net for Multi-Modal Pathology Segmentation with Attention and Dynamic Resampling
Automatic segmentation of multi-sequence (multi-modal) cardiac MR (CMR)
images plays a significant role in the diagnosis and management of a variety of
cardiac diseases. However, the performance of the relevant algorithms depends
heavily on how well the multi-modal information is fused.
Furthermore, particular diseases, such as myocardial infarction, display
irregular shapes on images and occupy small regions at random locations. These
facts make pathology segmentation of multi-modal CMR images a challenging task.
In this paper, we present the Max-Fusion U-Net that achieves improved pathology
segmentation performance given aligned multi-modal images of LGE, T2-weighted,
and bSSFP modalities. Specifically, modality-specific features are extracted by
dedicated encoders. Then they are fused with the pixel-wise maximum operator.
Together with the corresponding encoding features, these representations are
propagated to decoding layers with U-Net skip-connections. Furthermore, a
spatial-attention module is applied in the last decoding layer to encourage the
network to focus on those small semantically meaningful pathological regions
that trigger relatively high responses by the network neurons. We also use a
simple image-patch extraction strategy to dynamically resample training
examples with varying spatial and batch sizes. Under limited GPU memory, this
strategy reduces class imbalance and forces the model to focus on regions
around the pathology of interest, further improving segmentation accuracy and
reducing the misclassification of pathology. We evaluate our method on the
Myocardial Pathology Segmentation combining multi-sequence CMR (MyoPS)
dataset, which involves three modalities. Extensive
experiments demonstrate the effectiveness of the proposed model, which
outperforms the related baselines.
Comment: 13 pages, 7 figures, conference paper
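The core fusion step of the abstract above can be sketched in a few lines: modality-specific encoders produce feature maps of the same shape, and these are combined with a pixel-wise maximum. The function name and toy arrays below are illustrative, not taken from the paper.

```python
import numpy as np

def max_fuse(feature_maps):
    """Fuse modality-specific feature maps with a pixel-wise maximum.

    feature_maps: list of arrays of identical shape (C, H, W),
    one per input modality (e.g. LGE, T2-weighted, bSSFP).
    """
    stacked = np.stack(feature_maps, axis=0)  # (num_modalities, C, H, W)
    return stacked.max(axis=0)                # (C, H, W)

# Toy example: two single-channel 2x2 feature maps.
lge = np.array([[[0.2, 0.9], [0.1, 0.4]]])
t2 = np.array([[[0.5, 0.3], [0.8, 0.4]]])
fused = max_fuse([lge, t2])
print(fused)  # element-wise maximum of the two maps
```

In the actual network the fused representation is then passed, together with the encoder features, to the decoder through U-Net skip connections; the max operator simply keeps, at each spatial location, the strongest response across modalities.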
Joint and individual analysis of breast cancer histologic images and genomic covariates
A key challenge in modern data analysis is understanding connections between
complex and differing modalities of data. For example, two of the main
approaches to the study of breast cancer are histopathology (analyzing visual
characteristics of tumors) and genetics. While histopathology is the gold
standard for diagnostics and there have been many recent breakthroughs in
genetics, there is little overlap between these two fields. We aim to bridge
this gap by developing methods based on Angle-based Joint and Individual
Variation Explained (AJIVE) to directly explore similarities and differences
between these two modalities. Our approach exploits Convolutional Neural
Networks (CNNs) as a powerful, automatic method for image feature extraction to
address some of the challenges presented by statistical analysis of
histopathology image data. CNNs raise issues of interpretability that we
address by developing novel methods to explore visual modes of variation
captured by statistical algorithms (e.g. PCA or AJIVE) applied to CNN features.
Our results provide many interpretable connections and contrasts between
histopathology and genetics.
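The abstract above applies PCA (among other methods) to CNN features to expose interpretable modes of variation. A minimal NumPy sketch of that step, under the assumption that features have already been extracted into a samples-by-features matrix (the function and variable names are illustrative):

```python
import numpy as np

def pca_modes(features, n_modes=2):
    """Top principal modes of variation of a feature matrix.

    features: (n_samples, n_features) array, e.g. CNN activations
    per image. Returns the mean, the principal directions, and the
    per-sample scores along those directions.
    """
    mean = features.mean(axis=0)
    centered = features - mean
    # SVD of the centered data: rows of Vt are the principal directions.
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    components = Vt[:n_modes]          # (n_modes, n_features)
    scores = centered @ components.T   # (n_samples, n_modes)
    return mean, components, scores

rng = np.random.default_rng(0)
feats = rng.normal(size=(20, 5))       # stand-in for CNN features
mean, comps, scores = pca_modes(feats, n_modes=2)
print(comps.shape, scores.shape)       # (2, 5) (20, 2)
```

Visualising images whose scores lie at the extremes of each mode is one simple way to interpret what a mode of variation captures, which is the kind of inspection the paper's interpretability methods build on.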
Spatio-temporal classification for polyp diagnosis
Colonoscopy remains the gold standard investigation for colorectal cancer screening, as it offers the opportunity to both detect and resect pre-cancerous polyps. Computer-aided polyp characterisation can determine which polyps need polypectomy, and recent deep-learning-based approaches have shown promising results as clinical decision support tools. Yet polyp appearance during a procedure can vary, making automatic predictions unstable. In this paper, we investigate the use of spatio-temporal information to improve the performance of lesion classification as adenoma or non-adenoma. Two methods are implemented, showing an increase in performance and robustness during extensive experiments on both internal and openly available benchmark datasets.
Deep-Learning-basierte semantische Segmentierung von Indoor-RGBD-Szenen für den Einsatz auf einem mobilen Roboter
Pixel-accurate semantic segmentation lays the foundation for comprehensive scene understanding. Semantic knowledge about the structure and setup of indoor scenes can support mobile robots in various tasks, such as localization, obstacle avoidance, targeted navigation to semantic entities, or human-machine interaction. Recently, good segmentation results have been achieved using efficient RGB methods alone; incorporating depth data as well can usually improve segmentation performance further. Therefore, in this master's thesis, methods for efficient semantic segmentation and for RGBD segmentation are combined. Based on a broad literature review of both topics, a novel, efficient deep-learning-based RGBD segmentation approach is developed. Through comprehensive experiments on various parts of the network architecture, it is shown how the segmentation performance can be improved step by step, while keeping inference time low throughout, which is of great importance for mobile applications. The best network developed in this thesis achieves a mean Intersection over Union (mIoU) of 47.62 on the relevant indoor RGBD dataset SUN RGB-D, comparable to the state of the art, while enabling a significantly higher processing rate of 13.2 frames per second on an NVIDIA Jetson AGX Xavier, and is thus well suited for use on a mobile robot.
Multimodal and disentangled representation learning for medical image analysis
Automated medical image analysis is a growing research field with various applications in
modern healthcare. Furthermore, a multitude of imaging techniques (or modalities) have been
developed, such as Magnetic Resonance (MR) and Computed Tomography (CT), to accentuate
different organ characteristics. Research on image analysis is predominantly driven by deep
learning methods due to their demonstrated performance. In this thesis, we argue that their success and generalisation rely on learning good latent representations. We propose methods for
learning spatial representations that are suitable for medical image data, and can combine information coming from different modalities. Specifically, we aim to improve cardiac MR segmentation, a challenging task due to varied images and limited expert annotations, by considering
complementary information present in (potentially unaligned) images of other modalities.
In order to evaluate the benefit of multimodal learning, we initially consider a synthesis task
on spatially aligned multimodal brain MR images. We propose a deep network of multiple
encoders and decoders, which we demonstrate outperforms existing approaches. The encoders
(one per input modality) map the multimodal images into modality invariant spatial feature
maps. Common and unique information is combined into a fused representation, that is robust
to missing modalities, and can be decoded into synthetic images of the target modalities. Different experimental settings demonstrate the benefit of multimodal over unimodal synthesis,
although input and output image pairs are required for training. The need for paired images can
be overcome with the cycle consistency principle, which we use in conjunction with adversarial
training to transform images from one modality (e.g. MR) to images in another (e.g. CT). This
is useful especially in cardiac datasets, where different spatial and temporal resolutions make
image pairing difficult, if not impossible.
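The cycle consistency principle mentioned above can be stated compactly: translating an image to the other modality and back should recover the original. A minimal sketch of the corresponding L1 loss, with toy stand-in "translators" in place of the trained generator networks (all names here are illustrative):

```python
import numpy as np

def cycle_loss(x, forward, backward):
    """L1 cycle-consistency loss: x -> forward -> backward should recover x."""
    return np.abs(backward(forward(x)) - x).mean()

# Toy "translators"; in practice these are adversarially trained generators
# mapping between modalities (e.g. MR -> CT and CT -> MR).
mr_to_ct = lambda x: 2.0 * x + 1.0
ct_to_mr = lambda x: (x - 1.0) / 2.0  # exact inverse of mr_to_ct

x = np.linspace(0.0, 1.0, 5)
loss = cycle_loss(x, mr_to_ct, ct_to_mr)
print(loss)  # ~0 when backward perfectly inverts forward
```

During training this loss is combined with an adversarial term, so paired images are never needed: each generator is constrained by having to undo the other.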
Segmentation can also be considered as a form of image synthesis, if one modality consists of
semantic maps. We consider the task of extracting segmentation masks for cardiac MR images,
and aim to overcome the challenge of limited annotations by taking into account unannotated images, which are commonly ignored. We achieve this by defining suitable latent spaces,
which represent the underlying anatomies (spatial latent variable), as well as the imaging characteristics (non-spatial latent variable). Anatomical information is required for tasks such as
segmentation and regression, whereas imaging information can capture variability in intensity
characteristics for example due to different scanners. We propose two models that disentangle
cardiac images at different levels: the first extracts the myocardium from the surrounding information, whereas the second fully separates the anatomical from the imaging characteristics.
Experimental analysis confirms the utility of disentangled representations in semi-supervised
segmentation, and in regression of cardiac indices, while maintaining robustness to intensity
variations such as the ones induced by different modalities.
Finally, our prior research is aggregated into one framework that encodes multimodal images
into disentangled anatomical and imaging factors. Several challenges of multimodal cardiac
imaging, such as input misalignments and the lack of expert annotations, are successfully handled in the shared anatomy space. Furthermore, we demonstrate that this approach can be used
to combine complementary anatomical information for the purpose of multimodal segmentation. This can be achieved even when no annotations are provided for one of the modalities.
This thesis creates new avenues for further research in the area of multimodal and disentangled learning with spatial representations, which we believe are key to more generalised deep
learning solutions in healthcare.