Explainable artificial intelligence (XAI) in deep learning-based medical image analysis
With an increase in deep learning-based methods, the call for explainability
of such methods grows, especially in high-stakes decision making areas such as
medical image analysis. This survey presents an overview of eXplainable
Artificial Intelligence (XAI) used in deep learning-based medical image
analysis. A framework of XAI criteria is introduced to classify deep
learning-based medical image analysis methods. Papers on XAI techniques in
medical image analysis are then surveyed and categorized according to the
framework and according to anatomical location. The paper concludes with an
outlook of future opportunities for XAI in medical image analysis. Comment: Submitted for publication. Comments welcome by email to first author
Deep Learning in Cardiology
The medical field is creating large amounts of data that physicians are unable
to decipher and use efficiently. Moreover, rule-based expert systems are
inefficient in solving complicated medical tasks or for creating insights using
big data. Deep learning has emerged as a more accurate and effective technology
in a wide range of medical problems such as diagnosis, prediction and
intervention. Deep learning is a representation learning method that consists
of layers that transform the data non-linearly, thus revealing hierarchical
relationships and structures. In this review we survey deep learning
application papers that use structured data, signal and imaging modalities from
cardiology. We discuss the advantages and limitations of applying deep learning
in cardiology that also apply in medicine in general, while proposing certain
directions as the most viable for clinical use. Comment: 27 pages, 2 figures, 10 tables
MultiModN- Multimodal, Multi-Task, Interpretable Modular Networks
Predicting multiple real-world tasks in a single model often requires a
particularly diverse feature space. Multimodal (MM) models aim to extract the
synergistic predictive potential of multiple data types to create a shared
feature space with aligned semantic meaning across inputs of drastically
varying sizes (e.g., images, text, sound). Most current MM architectures fuse
these representations in parallel, which not only limits their interpretability
but also creates a dependency on modality availability. We present MultiModN, a
multimodal, modular network that fuses latent representations in a sequence of
any number, combination, or type of modality while providing granular real-time
predictive feedback on any number or combination of predictive tasks.
MultiModN's composable pipeline is interpretable-by-design, as well as innately
multi-task and robust to the fundamental issue of biased missingness. We
perform four experiments on several benchmark MM datasets across 10 real-world
tasks (predicting medical diagnoses, academic performance, and weather), and
show that MultiModN's sequential MM fusion does not compromise performance
compared with a baseline of parallel fusion. By simulating the challenging bias
of missing not-at-random (MNAR), this work shows that, unlike MultiModN,
parallel fusion baselines erroneously learn MNAR and suffer catastrophic
failure when faced with different patterns of MNAR at inference. To the best of
our knowledge, this is the first inherently MNAR-resistant approach to MM
modeling. In conclusion, MultiModN provides granular insights, robustness, and
flexibility without compromising performance. Comment: Accepted as a full paper at NeurIPS 2023 in New Orleans, US
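As a rough illustration of the sequential fusion idea described in the MultiModN abstract, the following Python sketch passes a shared state through one encoder per modality, skips any modality that is missing, and records a prediction after every step. It is not the authors' implementation: the names `make_encoder` and `sequential_fusion`, the tanh update, the mean-of-state "prediction", and the toy feature vectors are all assumptions made for illustration.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

State = List[float]
Encoder = Callable[[State, Sequence[float]], State]


def make_encoder(weight: float) -> Encoder:
    """Build a toy encoder (hypothetical) that folds one modality into the state."""
    def encode(state: State, features: Sequence[float]) -> State:
        summary = sum(features) / len(features)
        return [math.tanh(s + weight * summary) for s in state]
    return encode


def sequential_fusion(
    encoders: Dict[str, Encoder],
    inputs: Dict[str, Optional[Sequence[float]]],
    state_size: int = 4,
) -> Tuple[State, List[Tuple[str, float]]]:
    """Update a shared state one modality at a time, skipping missing ones."""
    state: State = [0.0] * state_size
    trace: List[Tuple[str, float]] = []
    for name, encode in encoders.items():
        features = inputs.get(name)
        if not features:
            continue  # missing modality: the state passes through unchanged
        state = encode(state, features)
        # Toy per-step "prediction": the mean of the current state.
        trace.append((name, sum(state) / len(state)))
    return state, trace


# Hypothetical modalities and feature values, chosen only for the demo.
encoders = {
    "image": make_encoder(0.5),
    "text": make_encoder(-0.3),
    "sound": make_encoder(0.2),
}
full = {"image": [1.0, 2.0, 3.0], "text": [0.5, 0.5], "sound": [4.0]}
partial = {"image": [1.0, 2.0, 3.0], "text": None, "sound": [4.0]}

full_state, full_trace = sequential_fusion(encoders, full)
partial_state, partial_trace = sequential_fusion(encoders, partial)
print([name for name, _ in full_trace])
print([name for name, _ in partial_trace])
```

Because each modality only updates the running state, a missing input simply leaves the state untouched, and the per-step trace shows how each available modality shifted the prediction.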
Transformer-based interpretable multi-modal data fusion for skin lesion classification
Much current deep learning (DL) research focuses mainly on
improving quantitative metrics, regardless of other factors. In human-centered
applications, like skin lesion classification in dermatology,
DL-driven clinical decision support systems are still in their infancy due to
the limited transparency of their decision-making process. Moreover, the lack
of procedures that can explain the behavior of trained DL algorithms leads to
almost no trust from the clinical physicians. To diagnose skin lesions,
dermatologists rely on both visual assessment of the disease and the data
gathered from the anamnesis of the patient. Data-driven algorithms dealing with
multi-modal data are limited by the separation of feature-level and
decision-level fusion procedures required by convolutional architectures. To
address this issue, we enable single-stage multi-modal data fusion via the
attention mechanism of transformer-based architectures to aid in the diagnosis
of skin diseases. Our method beats other state-of-the-art single- and
multi-modal DL architectures in both image rich and patient-data rich
environments. Additionally, the choice of the architecture enables native
interpretability support for the classification task both in image and metadata
domain with no additional modifications necessary. Comment: Submitted to IEEE TMI in March 202
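To make the idea of single-stage fusion through attention more concrete, here is a minimal Python sketch in which image tokens and metadata tokens are placed in one sequence and a single query attends over all of them. The attention weights then give a rough indication of how much each token contributed. This is an assumption-laden toy, not the architecture from the paper: the token values, the `class_query` vector, and the absence of learned projections are all illustrative choices.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, tokens: List[Vector]) -> Tuple[Vector, List[float]]:
    """Scaled dot-product attention of one query over all tokens."""
    scale = math.sqrt(len(query))
    scores = [sum(q * t for q, t in zip(query, tok)) / scale for tok in tokens]
    weights = softmax(scores)
    fused = [
        sum(w * tok[i] for w, tok in zip(weights, tokens))
        for i in range(len(query))
    ]
    return fused, weights


# Hypothetical embeddings: two image patches and one patient-metadata token.
image_tokens = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]]
metadata_tokens = [[0.0, 0.3, 0.9]]
tokens = image_tokens + metadata_tokens  # one sequence, fused in a single stage

class_query = [0.5, 0.5, 0.5]  # stand-in for a learned classification query
fused, weights = attend(class_query, tokens)
print(weights)  # one weight per token, image and metadata alike
```

Since both modalities share one attention step, the same weight vector can be inspected for image patches and for metadata entries, which is the kind of built-in interpretability the abstract refers to.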
Explainable, Domain-Adaptive, and Federated Artificial Intelligence in Medicine
Artificial intelligence (AI) continues to transform data analysis in many
domains. Progress in each domain is driven by a growing body of annotated data,
increased computational resources, and technological innovations. In medicine,
the sensitivity of the data, the complexity of the tasks, the potentially high
stakes, and a requirement of accountability give rise to a particular set of
challenges. In this review, we focus on three key methodological approaches
that address some of the particular challenges in AI-driven medical decision
making. (1) Explainable AI aims to produce a human-interpretable justification
for each output. Such models increase confidence if the results appear
plausible and match the clinicians' expectations. However, the absence of a
plausible explanation does not imply an inaccurate model. Especially in highly
non-linear, complex models that are tuned to maximize accuracy, such
interpretable representations only reflect a small portion of the
justification. (2) Domain adaptation and transfer learning enable AI models to
be trained and applied across multiple domains. One example is a classification
task based on images acquired on different acquisition hardware. (3) Federated
learning enables learning large-scale models without exposing sensitive
personal health information. Unlike centralized AI learning, where the
centralized learning machine has access to the entire training data, the
federated learning process iteratively updates models across multiple sites by
exchanging only parameter updates, not personal health data. This narrative
review covers the basic concepts, highlights relevant cornerstone and
state-of-the-art research in the field, and discusses perspectives. Comment: This paper is accepted in IEEE/CAA Journal of Automatica Sinica, Nov. 10 202
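The federated learning process described in point (3) of the abstract above can be sketched in a few lines of Python. In this toy version each site fits a one-parameter linear model on its own private samples and returns only a parameter delta, which a coordinator averages, weighted by sample count. The function names `local_update` and `federated_round`, the least-squares objective, the learning rate, and the data are all assumptions for illustration and are not taken from the review.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (x, y) pair held privately by one site


def local_update(w: float, data: Sequence[Sample], lr: float) -> float:
    """Return only the parameter delta from one gradient step on local data."""
    grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
    return -lr * grad


def federated_round(w: float, sites: Sequence[Sequence[Sample]], lr: float) -> float:
    """Average the sites' deltas, weighted by their sample counts."""
    total = sum(len(data) for data in sites)
    delta = sum(len(data) * local_update(w, data, lr) for data in sites) / total
    return w + delta


# Hypothetical private datasets; both follow y = 2x.
site_a = [(1.0, 2.0), (2.0, 4.0)]
site_b = [(3.0, 6.0)]

w = 0.0
for _ in range(50):
    w = federated_round(w, [site_a, site_b], lr=0.05)
print(w)  # approaches the shared slope of 2 without pooling raw samples
```

The coordinator never sees an `(x, y)` pair, only the scalar deltas, which mirrors the abstract's point that parameter updates, not personal health data, are exchanged.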
Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Multimodal machine learning is a vibrant multi-disciplinary research field
that aims to design computer agents with intelligent capabilities such as
understanding, reasoning, and learning through integrating multiple
communicative modalities, including linguistic, acoustic, visual, tactile, and
physiological messages. With the recent interest in video understanding,
embodied autonomous agents, text-to-image generation, and multisensor fusion in
application domains such as healthcare and robotics, multimodal machine
learning has brought unique computational and theoretical challenges to the
machine learning community given the heterogeneity of data sources and the
interconnections often found between modalities. However, the breadth of
progress in multimodal research has made it difficult to identify the common
themes and open questions in the field. By synthesizing a broad range of
application domains and theoretical frameworks from both historical and recent
perspectives, this paper is designed to provide an overview of the
computational and theoretical foundations of multimodal machine learning. We
start by defining two key principles of modality heterogeneity and
interconnections that have driven subsequent innovations, and propose a
taxonomy of 6 core technical challenges: representation, alignment, reasoning,
generation, transference, and quantification covering historical and recent
trends. Recent technical achievements will be presented through the lens of
this taxonomy, allowing researchers to understand the similarities and
differences across new approaches. We end by motivating several open problems
for future research as identified by our taxonomy.
Machine Learning for Multiclass Classification and Prediction of Alzheimer's Disease
Alzheimer's disease (AD) is an irreversible neurodegenerative disorder and a common form of dementia. This research aims to develop machine learning algorithms that diagnose and predict the progression of AD from multimodal heterogeneous biomarkers, with a focus on early diagnosis. To meet this goal, several machine learning-based methods with their unique characteristics for feature extraction and automated classification, prediction, and visualization have been developed to discern subtle progression trends and predict the trajectory of disease progression.
The methodology envisioned aims to enhance both the multiclass classification accuracy and prediction outcomes by effectively modeling the interplay between the multimodal biomarkers, handling the missing data challenge, and adequately extracting all the relevant features that will be fed into the machine learning framework, all in order to understand the subtle changes that happen in the different stages of the disease. This research will also investigate the notion of multitasking to discover how the two processes of multiclass classification and prediction relate to one another in terms of the features they share and whether they could learn from one another for optimizing multiclass classification and prediction accuracy.
This research work also delves into predicting cognitive scores of specific tests over time, using multimodal longitudinal data. The intent is to augment our prospects for analyzing the interplay between the different multimodal features used in the input space and the predicted cognitive scores. Moreover, the power of modality fusion, kernelization, and tensorization has also been investigated to efficiently extract important features hidden in the lower-dimensional feature space without being distracted by those deemed irrelevant.
With the adage that a picture is worth a thousand words, this dissertation introduces a unique color-coded visualization system with a fully integrated machine learning model for the enhanced diagnosis and prognosis of Alzheimer's disease. The incentive here is to show that through visualization, the challenges imposed by both the variability and interrelatedness of the multimodal features could be overcome. Ultimately, this form of visualization via machine learning informs on the challenges faced with multiclass classification and adds insight into the decision-making process for a diagnosis and prognosis.