On the Composition and Limitations of Publicly Available COVID-19 X-Ray Imaging Datasets
Machine learning based methods for diagnosis and progression prediction of
COVID-19 from imaging data have gained significant attention in the last
months, in particular by the use of deep learning models. In this context,
hundreds of models were proposed, with the majority of them trained on public
datasets. Data scarcity, mismatch between training and target population, group
imbalance, and lack of documentation are important sources of bias, hindering
the applicability of these models to real-world clinical practice. Considering
that datasets are an essential part of model building and evaluation, a deeper
understanding of the current landscape is needed. This paper presents an
overview of the currently publicly available COVID-19 chest X-ray datasets. Each
dataset is briefly described and potential strengths, limitations and
interactions between datasets are identified. In particular, some key
properties of current datasets that could be potential sources of bias,
impairing models trained on them, are pointed out. These descriptions are useful
for model building on those datasets, to choose the best dataset according to the
model goal, to take into account the specific limitations to avoid reporting
overconfident benchmark results, and to discuss their impact on the
generalisation capabilities in a specific clinical setting.
Comment: 12 pages, 3 figures
Machine learning models for diagnosis and prognosis of Parkinson's disease using brain imaging: general overview, main challenges, and future directions
Parkinson’s disease (PD) is a progressive and complex neurodegenerative disorder
associated with age that affects motor and cognitive functions. As there is currently
no cure, early diagnosis and accurate prognosis are essential to increase the
effectiveness of treatment and control its symptoms. Medical imaging, specifically
magnetic resonance imaging (MRI), has emerged as a valuable tool for developing
support systems to assist in diagnosis and prognosis. The current literature aims
to improve understanding of the disease’s structural and functional manifestations
in the brain. By applying artificial intelligence to neuroimaging, such as deep
learning (DL) and other machine learning (ML) techniques, previously unknown
relationships and patterns can be revealed in this high-dimensional data. However,
several issues must be addressed before these solutions can be safely integrated
into clinical practice. This review provides a comprehensive overview of recent
ML techniques analyzed for the automatic diagnosis and prognosis of PD in brain
MRI. The main challenges in applying ML to medical diagnosis and its implications
for PD are also addressed, including current limitations for safe translation into
hospitals. These challenges are analyzed at three levels: disease-specific, task-
specific, and technology-specific. Finally, potential future directions for each
challenge and future perspectives are discussed.
The effect of dataset confounding on predictions of deep neural networks for medical imaging
The use of Convolutional Neural Networks (CNN) in medical imaging has often outperformed previous solutions and even specialists, becoming a promising technology for Computer-aided-Diagnosis (CAD) systems. However, recent works suggested that CNN may have poor generalisation on new data, for instance, generated in different hospitals. Uncontrolled confounders have been proposed as a common reason. In this paper, we experimentally demonstrate the impact of confounding data in unknown scenarios. We assessed the effect of four confounding configurations: total, strong, light and balanced. We found the confounding effect is especially prominent in total confounder scenarios, while the effect on light and strong confounding scenarios may depend on the dataset robustness. Our findings indicate that the confounding effect is independent of the architecture employed.
These findings might explain why models can report good metrics during the development stage but fail to translate to real-world settings. We highlight the need for thorough consideration of these commonly overlooked aspects, to develop safer CNN-based CAD systems.
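The four confounding configurations named above can be pictured with a small sketch. It assumes a binary class label and a binary confounder (for example, the hospital of origin), and it assumes each configuration is defined by the fraction of samples whose confounder value matches the label. The ratio values, the dictionary `CONFOUNDING_RATIOS`, and both function names are hypothetical illustrations, not the paper's experimental setup.

```python
import random

# Hypothetical ratios: fraction of samples whose confounder equals the label.
# The abstract names the four configurations; these numbers are assumptions.
CONFOUNDING_RATIOS = {"total": 1.0, "strong": 0.9, "light": 0.7, "balanced": 0.5}


def assign_confounder(labels, config, seed=0):
    """Return a binary confounder per sample, matching the label with the
    probability given by the chosen configuration."""
    rng = random.Random(seed)
    ratio = CONFOUNDING_RATIOS[config]
    return [y if rng.random() < ratio else 1 - y for y in labels]


def agreement(labels, confounder):
    """Fraction of samples in which the confounder equals the label."""
    return sum(int(y == c) for y, c in zip(labels, confounder)) / len(labels)


if __name__ == "__main__":
    labels = [0, 1] * 500
    for name in CONFOUNDING_RATIOS:
        conf = assign_confounder(labels, name)
        print(name, round(agreement(labels, conf), 3))
```

Under these assumptions, a model trained on the "total" configuration could separate the classes using the confounder alone, which is one way good development metrics might fail to carry over to data from a different source.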
Towards Generalizable Machine Learning for Chest X-ray Diagnosis with Multi-task learning
Clinicians use chest radiography (CXR) to diagnose common pathologies. Automated classification of these diseases can expedite analysis workflow, scale to growing numbers of patients and reduce healthcare costs. While research has produced classification models that perform well on a given dataset, the same models lack generalization on different datasets. This reduces confidence that these models can be reliably deployed across various clinical settings.
We propose an approach based on multitask learning to improve model generalization. We demonstrate that learning a (main) pathology together with an auxiliary pathology can significantly impact generalization performance (between -10% and +15% AUC-ROC). A careful choice of auxiliary pathology even yields competitive performance with state-of-the-art models that rely on fine-tuning or ensemble learning, using between 6% and 34% of the training data that these models required. We further provide a method to determine the best auxiliary task to choose without access to the target dataset.
Ultimately, our work takes a big step towards the creation of CXR diagnosis models applicable in the real world, through the evidence that multitask learning can drastically improve generalization.
The Virtual Metabolic Human database: integrating human and gut microbiome metabolism with nutrition and disease
A multitude of factors contribute to complex diseases and can be measured with ‘omics’ methods. Databases facilitate data interpretation for underlying mechanisms. Here, we describe the Virtual Metabolic Human (VMH, www.vmh.life) database encapsulating current knowledge of human metabolism within five interlinked resources ‘Human metabolism’, ‘Gut microbiome’, ‘Disease’, ‘Nutrition’, and ‘ReconMaps’. The VMH captures 5180 unique metabolites, 17 730 unique reactions, 3695 human genes, 255 Mendelian diseases, 818 microbes, 632 685 microbial genes and 8790 food items. The VMH’s unique features are (i) the hosting of the metabolic reconstructions of human and gut microbes amenable for metabolic modeling; (ii) seven human metabolic maps for data visualization; (iii) a nutrition designer; (iv) a user-friendly webpage and application-programming interface to access its content; (v) user feedback option for community engagement and (vi) the connection of its entities to 57 other web resources. The VMH represents a novel, interdisciplinary database for data interpretation and hypothesis generation to the biomedical community
Worldwide trends in underweight and obesity from 1990 to 2022: a pooled analysis of 3663 population-representative studies with 222 million children, adolescents, and adults
Background Underweight and obesity are associated with adverse health outcomes throughout the life course. We
estimated the individual and combined prevalence of underweight or thinness and obesity, and their changes, from
1990 to 2022 for adults and school-aged children and adolescents in 200 countries and territories.
Methods We used data from 3663 population-based studies with 222 million participants that measured height and
weight in representative samples of the general population. We used a Bayesian hierarchical model to estimate
trends in the prevalence of different BMI categories, separately for adults (age ≥20 years) and school-aged children
and adolescents (age 5–19 years), from 1990 to 2022 for 200 countries and territories. For adults, we report the
individual and combined prevalence of underweight (BMI <18·5 kg/m²) and obesity (BMI ≥30 kg/m²). For school-aged children and adolescents, we report thinness (BMI <2 SD below the median of the WHO growth reference)
and obesity (BMI >2 SD above the median).
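The adult thresholds quoted in the Methods can be illustrated with a short sketch. The function name and the category labels are illustrative assumptions rather than part of the study. The categories for children and adolescents depend on the WHO growth reference tables and are not reproduced here.

```python
def adult_bmi_category(weight_kg: float, height_m: float) -> str:
    """Classify an adult (age >= 20 years) with the thresholds from the abstract:
    underweight if BMI < 18.5 kg/m^2, obesity if BMI >= 30 kg/m^2."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    bmi = weight_kg / height_m ** 2
    if bmi < 18.5:
        return "underweight"
    if bmi >= 30.0:
        return "obesity"
    # Everything between the two thresholds falls outside the double burden.
    return "neither"


if __name__ == "__main__":
    print(adult_bmi_category(50.0, 1.80))
    print(adult_bmi_category(100.0, 1.70))
```

The combined prevalence reported in the study would then correspond to the share of a population falling into either of the first two categories.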
Findings From 1990 to 2022, the combined prevalence of underweight and obesity in adults decreased in
11 countries (6%) for women and 17 (9%) for men with a posterior probability of at least 0·80 that the observed
changes were true decreases. The combined prevalence increased in 162 countries (81%) for women and
140 countries (70%) for men with a posterior probability of at least 0·80. In 2022, the combined prevalence of
underweight and obesity was highest in island nations in the Caribbean and Polynesia and Micronesia, and
countries in the Middle East and north Africa. Obesity prevalence was higher than underweight with posterior
probability of at least 0·80 in 177 countries (89%) for women and 145 (73%) for men in 2022, whereas the converse
was true in 16 countries (8%) for women, and 39 (20%) for men. From 1990 to 2022, the combined prevalence of
thinness and obesity decreased among girls in five countries (3%) and among boys in 15 countries (8%) with a
posterior probability of at least 0·80, and increased among girls in 140 countries (70%) and boys in 137 countries (69%)
with a posterior probability of at least 0·80. The countries with highest combined prevalence of thinness and
obesity in school-aged children and adolescents in 2022 were in Polynesia and Micronesia and the Caribbean for
both sexes, and Chile and Qatar for boys. Combined prevalence was also high in some countries in south Asia, such
as India and Pakistan, where thinness remained prevalent despite having declined. In 2022, obesity in school-aged
children and adolescents was more prevalent than thinness with a posterior probability of at least 0·80 among girls
in 133 countries (67%) and boys in 125 countries (63%), whereas the converse was true in 35 countries (18%) and
42 countries (21%), respectively. In almost all countries for both adults and school-aged children and adolescents,
the increases in double burden were driven by increases in obesity, and decreases in double burden by declining
underweight or thinness.
Interpretation The combined burden of underweight and obesity has increased in most countries, driven by an
increase in obesity, while underweight and thinness remain prevalent in south Asia and parts of Africa. A healthy
nutrition transition that enhances access to nutritious foods is needed to address the remaining burden of
underweight while curbing and reversing the increase in obesity.
Biomedical Image Analysis using Deep Learning: Towards Clinically Applicable Models.
Applied artificial intelligence has a huge potential to transform the way biomedical research and healthcare practice are conducted. Computer vision analysis powered by Deep Learning (DL) has already proven promising for understanding biomedical images. From microscopy to brain imaging, current DL-based solutions are more accurate, faster and easier to develop and deploy than traditional solutions. However, its application to the clinic is not ready yet. Many models present poor generalisation, lack explainability, or suffer from algorithmic biases, which may lead to overconfident results that could lead to life-threatening consequences.
In this context, the present thesis explores the current translation of Machine Learning (ML) solutions to real-world settings in biomedical research and clinical practice with a special emphasis on DL solutions for biomedical imaging. The aim of this thesis is threefold. First, to study the state-of-the-art of computer vision solutions for biomedical tasks together with their performance with noisy labelled data and capabilities to identify biomarkers of Parkinson’s Disease. Second, to investigate the factors that impact the models’ accuracy and their phenomenological fidelity. This includes understanding the effects of confounded datasets in medical imaging tasks and the overall suitability of datasets in ML tasks. Finally, to provide good practices and tools for robust DL solutions in biomedical applications.
In conclusion, this thesis offers insights to develop more robust ML solutions for biomedical and healthcare settings through a multidisciplinary approach that combines cutting-edge technologies with scientific methodologies. We hope the present thesis brings new knowledge to the area and provides new opportunities for future researchers.
The need of standardised metadata to encode causal relationships: Towards safer data-driven machine learning biological solutions
In this paper, we discuss the importance of considering causal relations in the development of machine learning solutions to prevent factors hampering the robustness and generalisation capacity of the models, such as induced biases. This issue often arises when the algorithm decision is affected by confounding factors. In this work, we argue that the integration of causal relationships can identify potential confounders. We call for standardised meta-information practices as a crucial step for proper machine learning solutions development, validation, and data sharing. Such practices include detailing the dataset generation process, aiming for automatic integration of causal relationships.