
    Using Explainable Artificial Intelligence to Discover Interactions in an Ecological Model for Obesity

    Ecological theories suggest that environmental, social, and individual factors interact to cause obesity. Yet many analytic techniques, such as multilevel modeling, require manual specification of interacting factors, which limits their ability to search for interactions. This paper provides evidence that an explainable artificial intelligence approach, commonly employed in genomics research, can address this problem. The method uses random intersection trees to decode interactions learned by random forest models. Here, this approach is used to extract interactions between features of a multi-level environment from random forest models of waist-to-height ratios, using 11,112 participants from the Adolescent Brain Cognitive Development study. This study shows that methods used to discover interactions between genes can also discover interacting features of the environment that affect obesity. This new approach to modeling ecosystems may help highlight combinations of environmental features that are important to obesity, as well as to other health outcomes.
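
    A minimal sketch of the general idea, assuming scikit-learn: it fits a random forest and counts how often pairs of features are used as split variables within the same tree, as a rough proxy for candidate interactions. This is a simplified stand-in for random intersection trees, not the paper's method, and the dataset and parameters are placeholders.

```python
# Simplified interaction screening from a random forest: count feature pairs
# that co-occur as split variables within individual trees. Illustrative only.
from itertools import combinations
from collections import Counter

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=8, random_state=0)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

pair_counts = Counter()
for tree in forest.estimators_:
    t = tree.tree_
    # Features used by the internal (non-leaf) nodes of this tree.
    used = {int(t.feature[i]) for i in range(t.node_count) if t.children_left[i] != -1}
    for pair in combinations(sorted(used), 2):
        pair_counts[pair] += 1

# Feature pairs that appear together in many trees are candidate interactions.
for (i, j), count in pair_counts.most_common(5):
    print(f"feature {i} x feature {j}: co-occur in {count} trees")
```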

    Explainable Machine-Learning Models for COVID-19 Prognosis Prediction Using Clinical, Laboratory and Radiomic Features

    The SARS-CoV-2 pandemic has had devastating effects on many aspects of life: clinical cases range from mild to severe and can lead to lung failure and death. Given the high incidence, data-driven models can support physicians in patient management. Explainability and interpretability of machine-learning models are mandatory in clinical scenarios. In this work, clinical, laboratory, and radiomic features were used to train machine-learning models for COVID-19 prognosis prediction. Using explainable AI algorithms, a multi-level explainability method was proposed that takes into account the perspectives of the developer and the involved stakeholders (physicians and patients). A total of 1023 radiomic features were extracted from 1589 chest X-ray (CXR) images and combined with 38 clinical/laboratory features. After the pre-processing and selection phases, 40 CXR radiomic features and 23 clinical/laboratory features were used to train Support Vector Machine and Random Forest classifiers, exploring three feature selection strategies. Combining radiomic and clinical/laboratory features yielded higher-performing models. The intelligibility of the features used allowed us to validate the models' clinical findings. Consistent with the medical literature, LDH, PaO2, and CRP were the most predictive laboratory features, while ZoneEntropy and HighGrayLevelZoneEmphasis, indicative of the heterogeneity/uniformity of lung texture, were the most discriminating radiomic features. Our best predictive model, exploiting the Random Forest classifier and a signature composed of clinical, laboratory, and radiomic features, achieved AUC=0.819, accuracy=0.733, specificity=0.705, and sensitivity=0.761 on the test set. The model, together with its multi-level explainability, supports strong clinical conclusions that are confirmed by the literature.
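
    A minimal sketch of the kind of pipeline described above, assuming scikit-learn: concatenated radiomic and clinical/laboratory features, univariate feature selection, and a Random Forest classifier evaluated by AUC. The data, feature counts, and hyperparameters are placeholders, not the authors' actual pipeline.

```python
# Illustrative radiomic + clinical/laboratory classification pipeline.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_radiomic = rng.normal(size=(400, 1023))   # placeholder CXR texture features
X_clinical = rng.normal(size=(400, 38))     # placeholder labs, e.g. LDH, PaO2, CRP
X = np.hstack([X_radiomic, X_clinical])
y = rng.integers(0, 2, size=400)            # placeholder prognosis label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=63)),   # keep a small feature signature
    ("clf", RandomForestClassifier(n_estimators=200, random_state=0)),
])
model.fit(X_tr, y_tr)
print("test AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```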

    Interpretable Models Capable of Handling Systematic Missingness in Imbalanced Classes and Heterogeneous Datasets

    Applying interpretable machine learning techniques to medical datasets facilitates early and fast diagnoses and provides deeper insight into the data. Furthermore, the transparency of these models increases trust among application domain experts. Medical datasets face common issues such as heterogeneous measurements, imbalanced classes with limited sample sizes, and missing data, which hinder the straightforward application of machine learning techniques. In this paper, we present a family of prototype-based (PB) interpretable models that are capable of handling these issues. The models introduced in this contribution show comparable or superior performance to alternative techniques applicable in such situations. However, unlike ensemble-based models, which have to compromise on ease of interpretation, the PB models do not. Moreover, we propose a strategy for harnessing the power of ensembles while maintaining the intrinsic interpretability of the PB models, by averaging the model parameter manifolds. All the models were evaluated on a synthetic, publicly available dataset, in addition to detailed analyses of two real-world medical datasets (one publicly available). The results indicate that the models and strategies we introduce address the challenges of real-world medical data while remaining computationally inexpensive and transparent, and that they perform similarly to or better than their alternatives.
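
    A minimal sketch of a prototype-based classifier applied to data with missing values, assuming scikit-learn: a nearest-centroid model (one prototype per class) preceded by simple imputation. The paper's PB models handle missingness and metric learning directly, so this is only an illustrative stand-in with synthetic data.

```python
# Prototype-based classification on data with missing values (illustrative).
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.neighbors import NearestCentroid
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # placeholder binary label
X[rng.random(X.shape) < 0.2] = np.nan       # introduce 20% missing entries

model = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("proto", NearestCentroid()),
])
model.fit(X, y)

# The learned class prototypes are directly inspectable, which is the source
# of the interpretability discussed above.
print(model.named_steps["proto"].centroids_)
```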

    Challenges and opportunities beyond structured data in analysis of electronic health records

    Electronic health records (EHRs) contain a wealth of valuable information about individual patients and the population as a whole. Besides structured data, unstructured data in EHRs can provide additional valuable information, but the analytic processes are complex, time-consuming, and often require excessive manual effort. Among unstructured data, clinical text and images are the two most common and important sources of information. Advanced statistical algorithms in natural language processing, machine learning, deep learning, and radiomics have increasingly been used to analyze clinical text and images. Although many challenges that can hinder the use of unstructured data remain unaddressed, there are clear opportunities for well-designed diagnosis and decision support tools that efficiently incorporate both structured and unstructured data to extract useful information and provide better outcomes. However, access to clinical data is still very restricted due to data sensitivity and ethical issues. Data quality is another important challenge, requiring methods for improving data completeness, conformity, and plausibility. Further, generalizing and explaining the results of machine learning models remain open challenges for healthcare. A possible way to improve the quality and accessibility of unstructured data is to develop machine learning methods that can generate clinically relevant synthetic data, and to accelerate research on privacy-preserving techniques such as de-identification and pseudonymization of clinical text.
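
    A minimal sketch of the kind of combined structured/unstructured analysis described above, assuming scikit-learn and SciPy: TF-IDF features from clinical notes are concatenated with a structured variable before classification. The notes, lab values, labels, and model choice are synthetic placeholders, not drawn from any study listed here.

```python
# Combining unstructured clinical text with a structured variable (illustrative).
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

notes = [
    "patient reports chest pain and shortness of breath",
    "routine follow-up, no acute complaints",
    "persistent cough with elevated temperature",
    "annual wellness visit, labs within normal limits",
]
labs = np.array([[180.0], [95.0], [150.0], [100.0]])   # placeholder lab value per visit
labels = np.array([1, 0, 1, 0])                        # placeholder outcome

text_features = TfidfVectorizer().fit_transform(notes)
X = hstack([text_features, csr_matrix(labs)])          # unstructured + structured

clf = LogisticRegression().fit(X, labels)
print(clf.predict(X))
```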