    On the Composition and Limitations of Publicly Available COVID-19 X-Ray Imaging Datasets

    Machine learning-based methods for the diagnosis and progression prediction of COVID-19 from imaging data have gained significant attention in recent months, in particular through the use of deep learning models. In this context, hundreds of models were proposed, the majority of them trained on public datasets. Data scarcity, mismatch between training and target populations, group imbalance, and lack of documentation are important sources of bias, hindering the applicability of these models to real-world clinical practice. Considering that datasets are an essential part of model building and evaluation, a deeper understanding of the current landscape is needed. This paper presents an overview of the currently publicly available COVID-19 chest X-ray datasets. Each dataset is briefly described, and its potential strengths, limitations and interactions with other datasets are identified. In particular, we point out key properties of current datasets that could be sources of bias and impair models trained on them. These descriptions are useful for building models on these datasets: to choose the best dataset according to the model's goal, to take the specific limitations into account and avoid reporting overconfident benchmark results, and to discuss their impact on generalisation capabilities in a specific clinical setting. Comment: 12 pages, 3 figures

    Machine learning models for diagnosis and prognosis of Parkinson's disease using brain imaging: general overview, main challenges, and future directions

    Parkinson’s disease (PD) is a progressive and complex age-associated neurodegenerative disorder that affects motor and cognitive functions. As there is currently no cure, early diagnosis and accurate prognosis are essential to increase the effectiveness of treatment and to control its symptoms. Medical imaging, specifically magnetic resonance imaging (MRI), has emerged as a valuable tool for developing support systems to assist in diagnosis and prognosis. The current literature aims to improve understanding of the disease’s structural and functional manifestations in the brain. By applying artificial intelligence techniques, such as deep learning (DL) and other machine learning (ML) methods, to neuroimaging, previously unknown relationships and patterns can be revealed in this high-dimensional data. However, several issues must be addressed before these solutions can be safely integrated into clinical practice. This review provides a comprehensive overview of recent ML techniques for the automatic diagnosis and prognosis of PD from brain MRI. The main challenges in applying ML to medical diagnosis and their implications for PD are also addressed, including current limitations for safe translation into hospitals. These challenges are analyzed at three levels: disease-specific, task-specific, and technology-specific. Finally, potential future directions for each challenge and future perspectives are discussed.

    The effect of dataset confounding on predictions of deep neural networks for medical imaging

    The use of convolutional neural networks (CNNs) in medical imaging has often outperformed previous solutions and even specialists, making them a promising technology for computer-aided diagnosis (CAD) systems. However, recent works suggest that CNNs may generalise poorly to new data, for instance data generated in different hospitals. Uncontrolled confounders have been proposed as a common reason. In this paper, we experimentally demonstrate the impact of confounding data in unknown scenarios. We assessed the effect of four confounding configurations: total, strong, light and balanced. We found that the confounding effect is especially prominent in total confounder scenarios, while its effect in light and strong confounding scenarios may depend on the robustness of the dataset. Our findings indicate that the confounding effect is independent of the architecture employed. These findings might explain why models can report good metrics during the development stage but fail to translate to real-world settings. We highlight the need for thorough consideration of these commonly overlooked aspects to develop safer CNN-based CAD systems.
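    The four configurations can be pictured as different degrees of agreement between the diagnostic label and a nuisance variable. The sketch below is a minimal illustration, not the paper's protocol. It assumes binary labels and a binary confounder standing in for the acquiring hospital ("site"), and the numeric agreement levels are hypothetical: the abstract names the configurations but gives no values. Here total confounding means the site always matches the label and balanced means it matches half the time; `assign_sites` and `site_label_agreement` are illustrative helpers.

```python
import random

# Hypothetical site/label agreement levels; only the names come from the abstract.
CONFOUNDING_DEGREES = {"total": 1.0, "strong": 0.8, "light": 0.6, "balanced": 0.5}


def assign_sites(labels, degree, seed=0):
    """Assign a binary acquisition site to each sample so that exactly
    round(degree * n) samples have site == label (hypothetical helper)."""
    rng = random.Random(seed)
    order = list(range(len(labels)))
    rng.shuffle(order)  # pick at random which samples agree with their label
    n_agree = round(degree * len(labels))
    sites = [0] * len(labels)
    for rank, i in enumerate(order):
        # The first n_agree shuffled samples share their label; the rest get the other site.
        sites[i] = labels[i] if rank < n_agree else 1 - labels[i]
    return sites


def site_label_agreement(labels, sites):
    """Fraction of samples whose site equals their label."""
    return sum(y == s for y, s in zip(labels, sites)) / len(labels)


if __name__ == "__main__":
    labels = [0, 1] * 50  # 100 samples, balanced classes
    for name, degree in CONFOUNDING_DEGREES.items():
        sites = assign_sites(labels, degree)
        print(name, site_label_agreement(labels, sites))
```

    Under total confounding, a model that learns only to recognise the site would score perfectly on such data yet fail wherever site and diagnosis are independent, which is one way to read the gap between development metrics and real-world performance described above.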

    Towards Generalizable Machine Learning for Chest X-ray Diagnosis with Multi-task learning

    Clinicians use chest radiography (CXR) to diagnose common pathologies. Automated classification of these diseases can expedite analysis workflows, scale to growing numbers of patients and reduce healthcare costs. While research has produced classification models that perform well on a given dataset, the same models generalize poorly to other datasets. This reduces confidence that these models can be reliably deployed across various clinical settings. We propose an approach based on multitask learning to improve model generalization. We demonstrate that learning a (main) pathology together with an auxiliary pathology can significantly impact generalization performance (between -10% and +15% AUC-ROC). A careful choice of auxiliary pathology even yields performance competitive with state-of-the-art models that rely on fine-tuning or ensemble learning, while using between 6% and 34% of the training data those models required. We further provide a method to determine the best auxiliary task to choose without access to the target dataset. Ultimately, our work takes a significant step towards CXR diagnosis models applicable in the real world, providing evidence that multitask learning can drastically improve generalization.
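    A common way to set up this kind of multitask learning is a shared image encoder with one output head per pathology, trained on a weighted sum of per-task losses. The abstract does not state the paper's loss or weighting, so the sketch below is an assumption: it combines binary cross-entropy on the main pathology with a down-weighted binary cross-entropy on the auxiliary pathology, where `aux_weight` is a hypothetical hyperparameter. It operates on predicted probabilities and leaves the network itself out.

```python
import math


def binary_cross_entropy(prob, target, eps=1e-7):
    """BCE for one predicted probability; clipping avoids log(0)."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))


def mean_bce(probs, targets):
    """Average BCE over a batch."""
    return sum(binary_cross_entropy(p, t) for p, t in zip(probs, targets)) / len(probs)


def multitask_loss(main_probs, main_targets, aux_probs, aux_targets, aux_weight=0.5):
    """Weighted sum of main-task and auxiliary-task losses (hypothetical weighting).

    In a real model both probability lists would come from separate heads on a
    shared encoder, so gradients from the auxiliary task also shape the encoder.
    """
    return mean_bce(main_probs, main_targets) + aux_weight * mean_bce(aux_probs, aux_targets)


if __name__ == "__main__":
    # Toy batch: main and auxiliary pathology predictions and labels for three images.
    loss = multitask_loss([0.9, 0.2, 0.7], [1, 0, 1], [0.6, 0.1, 0.8], [1, 0, 1])
    print(f"{loss:.4f}")
```

    Setting `aux_weight` to 0 recovers single-task training, so the weight offers a simple knob for checking whether a given auxiliary pathology helps or hurts; the abstract reports both outcomes.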

    The Virtual Metabolic Human database: integrating human and gut microbiome metabolism with nutrition and disease

    A multitude of factors contribute to complex diseases and can be measured with ‘omics’ methods. Databases facilitate data interpretation for underlying mechanisms. Here, we describe the Virtual Metabolic Human (VMH, www.vmh.life) database, which encapsulates current knowledge of human metabolism within five interlinked resources: ‘Human metabolism’, ‘Gut microbiome’, ‘Disease’, ‘Nutrition’, and ‘ReconMaps’. The VMH captures 5180 unique metabolites, 17 730 unique reactions, 3695 human genes, 255 Mendelian diseases, 818 microbes, 632 685 microbial genes and 8790 food items. The VMH’s unique features are (i) the hosting of metabolic reconstructions of human and gut microbes amenable to metabolic modeling; (ii) seven human metabolic maps for data visualization; (iii) a nutrition designer; (iv) a user-friendly webpage and application programming interface to access its content; (v) a user feedback option for community engagement; and (vi) the connection of its entities to 57 other web resources. The VMH represents a novel, interdisciplinary database for data interpretation and hypothesis generation for the biomedical community.

    Worldwide trends in underweight and obesity from 1990 to 2022: a pooled analysis of 3663 population-representative studies with 222 million children, adolescents, and adults

    Background: Underweight and obesity are associated with adverse health outcomes throughout the life course. We estimated the individual and combined prevalence of underweight or thinness and obesity, and their changes, from 1990 to 2022 for adults and school-aged children and adolescents in 200 countries and territories.
    Methods: We used data from 3663 population-based studies with 222 million participants that measured height and weight in representative samples of the general population. We used a Bayesian hierarchical model to estimate trends in the prevalence of different BMI categories, separately for adults (age ≥20 years) and school-aged children and adolescents (age 5–19 years), from 1990 to 2022 for 200 countries and territories. For adults, we report the individual and combined prevalence of underweight (BMI <18·5 kg/m²) and obesity (BMI ≥30 kg/m²). For school-aged children and adolescents, we report thinness (BMI <2 SD below the median of the WHO growth reference) and obesity (BMI >2 SD above the median).
    Findings: From 1990 to 2022, the combined prevalence of underweight and obesity in adults decreased in 11 countries (6%) for women and 17 (9%) for men, with a posterior probability of at least 0·80 that the observed changes were true decreases. The combined prevalence increased in 162 countries (81%) for women and 140 countries (70%) for men with a posterior probability of at least 0·80. In 2022, the combined prevalence of underweight and obesity was highest in island nations in the Caribbean and Polynesia and Micronesia, and in countries in the Middle East and north Africa. Obesity prevalence was higher than underweight prevalence with a posterior probability of at least 0·80 in 177 countries (89%) for women and 145 (73%) for men in 2022, whereas the converse was true in 16 countries (8%) for women and 39 (20%) for men.
    From 1990 to 2022, the combined prevalence of thinness and obesity decreased among girls in five countries (3%) and among boys in 15 countries (8%) with a posterior probability of at least 0·80, and increased among girls in 140 countries (70%) and boys in 137 countries (69%) with a posterior probability of at least 0·80. The countries with the highest combined prevalence of thinness and obesity in school-aged children and adolescents in 2022 were in Polynesia and Micronesia and the Caribbean for both sexes, and Chile and Qatar for boys. Combined prevalence was also high in some countries in south Asia, such as India and Pakistan, where thinness remained prevalent despite having declined. In 2022, obesity in school-aged children and adolescents was more prevalent than thinness with a posterior probability of at least 0·80 among girls in 133 countries (67%) and boys in 125 countries (63%), whereas the converse was true in 35 countries (18%) and 42 countries (21%), respectively. In almost all countries, for both adults and school-aged children and adolescents, increases in the double burden were driven by increases in obesity, and decreases in the double burden by declining underweight or thinness.
    Interpretation: The combined burden of underweight and obesity has increased in most countries, driven by an increase in obesity, while underweight and thinness remain prevalent in south Asia and parts of Africa. A healthy nutrition transition that enhances access to nutritious foods is needed to address the remaining burden of underweight while curbing and reversing the increase in obesity.
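    The adult categories in this study follow from the BMI formula, weight in kilograms divided by the square of height in metres. The sketch below applies the stated adult cut-offs (underweight below 18·5 kg/m², obesity at or above 30 kg/m²) and computes a crude combined prevalence, the "double burden", as a simple sample proportion. It only illustrates the definitions: the study's estimates come from a Bayesian hierarchical model over population-representative data, which this does not reproduce, and the child and adolescent categories require the WHO growth reference, which is not included. The function names are hypothetical.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def adult_bmi_category(bmi_value):
    """Adult cut-offs used in the study: underweight (<18.5) and obesity (>=30)."""
    if bmi_value < 18.5:
        return "underweight"
    if bmi_value >= 30.0:
        return "obesity"
    return "neither"


def crude_combined_prevalence(bmi_values):
    """Unweighted share of people who are either underweight or obese (double burden)."""
    affected = sum(adult_bmi_category(b) in ("underweight", "obesity") for b in bmi_values)
    return affected / len(bmi_values)


if __name__ == "__main__":
    # Four illustrative adults (weight in kg, height in m); not data from the study.
    sample = [bmi(50, 1.70), bmi(70, 1.75), bmi(95, 1.70), bmi(80, 1.80)]
    print([adult_bmi_category(b) for b in sample])
    print(crude_combined_prevalence(sample))
```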

    Biomedical Image Analysis using Deep Learning: Towards Clinically Applicable Models.

    Applied artificial intelligence has huge potential to transform the way biomedical research and healthcare practice are conducted. Computer vision analysis powered by deep learning (DL) has already proven promising for understanding biomedical images. From microscopy to brain imaging, current DL-based solutions are more accurate, faster and easier to develop and deploy than traditional solutions. However, they are not yet ready for clinical application. Many models generalise poorly, lack explainability, or suffer from algorithmic biases, which may produce overconfident results with potentially life-threatening consequences. In this context, the present thesis explores the current translation of machine learning (ML) solutions to real-world settings in biomedical research and clinical practice, with a special emphasis on DL solutions for biomedical imaging. The aim of this thesis is threefold. First, to study the state of the art of computer vision solutions for biomedical tasks, together with their performance on noisily labelled data and their capability to identify biomarkers of Parkinson’s disease. Second, to investigate the factors that impact the models’ accuracy and their phenomenological fidelity. This includes understanding the effects of confounded datasets in medical imaging tasks and the overall suitability of datasets for ML tasks. Finally, to provide good practices and tools for robust DL solutions in biomedical applications. In conclusion, this thesis offers insights for developing more robust ML solutions for biomedical and healthcare settings through a multidisciplinary approach that combines cutting-edge technologies with scientific methodologies. We hope the present thesis brings new knowledge to the area and provides new opportunities for future researchers. Sustainable Development Goal: 3. Good health and well-being

    The need of standardised metadata to encode causal relationships: Towards safer data-driven machine learning biological solutions

    In this paper, we discuss the importance of considering causal relations in the development of machine learning solutions to prevent factors that hamper the robustness and generalisation capacity of the models, such as induced biases. This issue often arises when the algorithm's decisions are affected by confounding factors. In this work, we argue that integrating causal relationships can help identify potential confounders. We call for standardised meta-information practices as a crucial step for the proper development, validation and data sharing of machine learning solutions. Such practices include detailing the dataset generation process, with the aim of integrating causal relationships automatically.