54 research outputs found
LESS: Label-efficient Multi-scale Learning for Cytological Whole Slide Image Screening
In computational pathology, multiple instance learning (MIL) is widely used
to circumvent the computational impasse in giga-pixel whole slide image (WSI)
analysis. It usually consists of two stages: patch-level feature extraction and
slide-level aggregation. Recently, pretrained models or self-supervised
learning have been used to extract patch features, but they suffer from low
effectiveness or inefficiency due to overlooking the task-specific supervision
provided by slide labels. Here we propose a weakly-supervised Label-Efficient
WSI Screening method, dubbed LESS, for cytological WSI analysis with only
slide-level labels, which can be effectively applied to small datasets. First,
we suggest using variational positive-unlabeled (VPU) learning to uncover
hidden labels of both benign and malignant patches. We provide appropriate
supervision by using slide-level labels to improve the learning of patch-level
features. Next, we take into account the sparse and random arrangement of cells
in cytological WSIs. To address this, we propose a strategy to crop patches at
multiple scales and utilize a cross-attention vision transformer (CrossViT) to
combine information from different scales for WSI classification. The
combination of our two steps achieves task-alignment, improving effectiveness
and efficiency. We validate the proposed label-efficient method on a urine
cytology WSI dataset encompassing 130 samples (13,000 patches) and FNAC 2019
dataset with 212 samples (21,200 patches). The experiment shows that the
proposed LESS reaches 84.79%, 85.43%, 91.79% and 78.30% on a urine cytology WSI
dataset, and 96.88%, 96.86%, 98.95%, 97.06% on FNAC 2019 dataset in terms of
accuracy, AUC, sensitivity and specificity. It outperforms state-of-the-art MIL
methods on pathology WSIs and realizes automatic cytological WSI cancer
screening.Comment: This paper was submitted to Medical Image Analysis. It is under
revie
Deep learning for unsupervised domain adaptation in medical imaging: Recent advancements and future perspectives
Deep learning has demonstrated remarkable performance across various tasks in
medical imaging. However, these approaches primarily focus on supervised
learning, assuming that the training and testing data are drawn from the same
distribution. Unfortunately, this assumption may not always hold true in
practice. To address these issues, unsupervised domain adaptation (UDA)
techniques have been developed to transfer knowledge from a labeled domain to a
related but unlabeled domain. In recent years, significant advancements have
been made in UDA, resulting in a wide range of methodologies, including feature
alignment, image translation, self-supervision, and disentangled representation
methods, among others. In this paper, we provide a comprehensive literature
review of recent deep UDA approaches in medical imaging from a technical
perspective. Specifically, we categorize current UDA research in medical
imaging into six groups and further divide them into finer subcategories based
on the different tasks they perform. We also discuss the respective datasets
used in the studies to assess the divergence between the different domains.
Finally, we discuss emerging areas and provide insights and discussions on
future research directions to conclude this survey.Comment: Under Revie
Multimodal Data Fusion and Quantitative Analysis for Medical Applications
Medical big data is not only enormous in its size, but also heterogeneous and complex in its data structure, which makes conventional systems or algorithms difficult to process. These heterogeneous medical data include imaging data (e.g., Positron Emission Tomography (PET), Computerized Tomography (CT), Magnetic Resonance Imaging (MRI)), and non-imaging data (e.g., laboratory biomarkers, electronic medical records, and hand-written doctor notes). Multimodal data fusion is an emerging vital field to address this urgent challenge, aiming to process and analyze the complex, diverse and heterogeneous multimodal data. The fusion algorithms bring great potential in medical data analysis, by 1) taking advantage of complementary information from different sources (such as functional-structural complementarity of PET/CT images) and 2) exploiting consensus information that reflects the intrinsic essence (such as the genetic essence underlying medical imaging and clinical symptoms). Thus, multimodal data fusion benefits a wide range of quantitative medical applications, including personalized patient care, more optimal medical operation plan, and preventive public health.
Though there has been extensive research on computational approaches for multimodal fusion, there are three major challenges of multimodal data fusion in quantitative medical applications, which are summarized as feature-level fusion, information-level fusion and knowledge-level fusion:
• Feature-level fusion. The first challenge is to mine multimodal biomarkers from high-dimensional small-sample multimodal medical datasets, which hinders the effective discovery of informative multimodal biomarkers. Specifically, efficient dimension reduction algorithms are required to alleviate "curse of dimensionality" problem and address the criteria for discovering interpretable, relevant, non-redundant and generalizable multimodal biomarkers.
• Information-level fusion. The second challenge is to exploit and interpret inter-modal and intra-modal information for precise clinical decisions. Although radiomics and multi-branch deep learning have been used for implicit information fusion guided with supervision of the labels, there is a lack of methods to explicitly explore inter-modal relationships in medical applications. Unsupervised multimodal learning is able to mine inter-modal relationship as well as reduce the usage of labor-intensive data and explore potential undiscovered biomarkers; however, mining discriminative information without label supervision is an upcoming challenge. Furthermore, the interpretation of complex non-linear cross-modal associations, especially in deep multimodal learning, is another critical challenge in information-level fusion, which hinders the exploration of multimodal interaction in disease mechanism.
• Knowledge-level fusion. The third challenge is quantitative knowledge distillation from multi-focus regions on medical imaging. Although characterizing imaging features from single lesions using either feature engineering or deep learning methods have been investigated in recent years, both methods neglect the importance of inter-region spatial relationships. Thus, a topological profiling tool for multi-focus regions is in high demand, which is yet missing in current feature engineering and deep learning methods. Furthermore, incorporating domain knowledge with distilled knowledge from multi-focus regions is another challenge in knowledge-level fusion.
To address the three challenges in multimodal data fusion, this thesis provides a multi-level fusion framework for multimodal biomarker mining, multimodal deep learning, and knowledge distillation from multi-focus regions. Specifically, our major contributions in this thesis include:
• To address the challenges in feature-level fusion, we propose an Integrative Multimodal Biomarker Mining framework to select interpretable, relevant, non-redundant and generalizable multimodal biomarkers from high-dimensional small-sample imaging and non-imaging data for diagnostic and prognostic applications. The feature selection criteria including representativeness, robustness, discriminability, and non-redundancy are exploited by consensus clustering, Wilcoxon filter, sequential forward selection, and correlation analysis, respectively. SHapley Additive exPlanations (SHAP) method and nomogram are employed to further enhance feature interpretability in machine learning models.
• To address the challenges in information-level fusion, we propose an Interpretable Deep Correlational Fusion framework, based on canonical correlation analysis (CCA) for 1) cohesive multimodal fusion of medical imaging and non-imaging data, and 2) interpretation of complex non-linear cross-modal associations. Specifically, two novel loss functions are proposed to optimize the discovery of informative multimodal representations in both supervised and unsupervised deep learning, by jointly learning inter-modal consensus and intra-modal discriminative information. An interpretation module is proposed to decipher the complex non-linear cross-modal association by leveraging interpretation methods in both deep learning and multimodal consensus learning.
• To address the challenges in knowledge-level fusion, we proposed a Dynamic Topological Analysis framework, based on persistent homology, for knowledge distillation from inter-connected multi-focus regions in medical imaging and incorporation of domain knowledge. Different from conventional feature engineering and deep learning, our DTA framework is able to explicitly quantify inter-region topological relationships, including global-level geometric structure and community-level clusters. K-simplex Community Graph is proposed to construct the dynamic community graph for representing community-level multi-scale graph structure. The constructed dynamic graph is subsequently tracked with a novel Decomposed Persistence algorithm. Domain knowledge is incorporated into the Adaptive Community Profile, summarizing the tracked multi-scale community topology with additional customizable clinically important factors
Producing Decisions and Explanations: A Joint Approach Towards Explainable CNNs
Deep Learning models, in particular Convolutional Neural Networks, have become the state-of-the-art in different domains, such as image classification, object detection and other computer vision tasks. However, despite their overwhelming predictive performance, they are still, for the most part, considered black-boxes, making it difficult to understand the reasoning behind their outputted decisions. As such, and with the growing interest in deploying such models into real world scenarios, the need for explainable systems has arisen. Therefore, this dissertation tries to mitigate this growing need, by proposing a novel CNN architecture, composed of an explainer and a classifier. The network, trained end-to-end, constitutes an in-model explainability method, that not only outputs decisions as well as visual explanations of what the network is focusing on to produce such decisions
Medical Image Segmentation Review: The success of U-Net
Automatic medical image segmentation is a crucial topic in the medical domain
and successively a critical counterpart in the computer-aided diagnosis
paradigm. U-Net is the most widespread image segmentation architecture due to
its flexibility, optimized modular design, and success in all medical image
modalities. Over the years, the U-Net model achieved tremendous attention from
academic and industrial researchers. Several extensions of this network have
been proposed to address the scale and complexity created by medical tasks.
Addressing the deficiency of the naive U-Net model is the foremost step for
vendors to utilize the proper U-Net variant model for their business. Having a
compendium of different variants in one place makes it easier for builders to
identify the relevant research. Also, for ML researchers it will help them
understand the challenges of the biological tasks that challenge the model. To
address this, we discuss the practical aspects of the U-Net model and suggest a
taxonomy to categorize each network variant. Moreover, to measure the
performance of these strategies in a clinical application, we propose fair
evaluations of some unique and famous designs on well-known datasets. We
provide a comprehensive implementation library with trained models for future
research. In addition, for ease of future studies, we created an online list of
U-Net papers with their possible official implementation. All information is
gathered in https://github.com/NITR098/Awesome-U-Net repository.Comment: Submitted to the IEEE Transactions on Pattern Analysis and Machine
Intelligence Journa
AVATAR - Machine Learning Pipeline Evaluation Using Surrogate Model
© 2020, The Author(s). The evaluation of machine learning (ML) pipelines is essential during automatic ML pipeline composition and optimisation. The previous methods such as Bayesian-based and genetic-based optimisation, which are implemented in Auto-Weka, Auto-sklearn and TPOT, evaluate pipelines by executing them. Therefore, the pipeline composition and optimisation of these methods requires a tremendous amount of time that prevents them from exploring complex pipelines to find better predictive models. To further explore this research challenge, we have conducted experiments showing that many of the generated pipelines are invalid, and it is unnecessary to execute them to find out whether they are good pipelines. To address this issue, we propose a novel method to evaluate the validity of ML pipelines using a surrogate model (AVATAR). The AVATAR enables to accelerate automatic ML pipeline composition and optimisation by quickly ignoring invalid pipelines. Our experiments show that the AVATAR is more efficient in evaluating complex pipelines in comparison with the traditional evaluation approaches requiring their execution
Imaging Sensors and Applications
In past decades, various sensor technologies have been used in all areas of our lives, thus improving our quality of life. In particular, imaging sensors have been widely applied in the development of various imaging approaches such as optical imaging, ultrasound imaging, X-ray imaging, and nuclear imaging, and contributed to achieve high sensitivity, miniaturization, and real-time imaging. These advanced image sensing technologies play an important role not only in the medical field but also in the industrial field. This Special Issue covers broad topics on imaging sensors and applications. The scope range of imaging sensors can be extended to novel imaging sensors and diverse imaging systems, including hardware and software advancements. Additionally, biomedical and nondestructive sensing applications are welcome
Addressing subjectivity in the classification of palaeoenvironmental remains with supervised deep learning convolutional neural networks
Archaeological object identifications have been traditionally undertaken through a comparative methodology where each artefact is identified through a subjective, interpretative act by a professional. Regarding palaeoenvironmental remains, this comparative methodology is given boundaries by using reference materials and codified sets of rules, but subjectivity is nevertheless present. The problem with this traditional archaeological methodology is that higher level of subjectivity in the identification of artefacts leads to inaccuracies, which then increases the potential for Type I and Type II errors in the testing of hypotheses. Reducing the subjectivity of archaeological identifications would improve the statistical power of archaeological analyses, which would subsequently lead to more impactful research. In this thesis, it is shown that the level of subjectivity in palaeoenvironmental research can be reduced by applying deep learning convolutional neural networks within an image recognition framework. The primary aim of the presented research is therefore to further the on-going paradigm shift in archaeology towards model-based object identifications, particularly within the realm of palaeoenvironmental remains. Although this thesis focuses on the identification of pollen grains and animal bones, with the latter being restricted to the astragalus of sheep and goats, there are wider implications for archaeology as these methods can easily be extended beyond pollen and animal remains. The previously published POLEN23E dataset is used as the pilot study of applying deep learning in pollen grain classification. In contrast, an image dataset of modern bones was compiled for the classification of sheep and goat astragali due to a complete lack of available bone image datasets and a double blind study with inexperienced and experienced zooarchaeologists was performed to have a benchmark to which image recognition models can be compared. In both classification tasks, the presented models outperform all previous formal modelling methods and only the best human analysts match the performance of the deep learning model in the sheep and goat astragalus separation task. Throughout the thesis, there is a specific focus on increasing trust in the models through the visualization of the models’ decision making and avenues of improvements to Grad-CAM are explored. This thesis makes an explicit case for the phasing out of the comparative methods in favour of a formal modelling framework within archaeology, especially in palaeoenvironmental object identification
- …