466 research outputs found

    Detecting transthyretin amyloid cardiomyopathy (ATTR-CM) using machine learning: an evaluation of the performance of an algorithm in a UK setting

    Objective: The aim of this study was to evaluate the potential real-world application of a machine learning (ML) algorithm, developed and trained on heart failure (HF) cohorts in the USA, to detect patients with undiagnosed wild-type cardiac amyloidosis (ATTRwt) in the UK. Design: In this retrospective observational study, anonymised, linked primary and secondary care data (Clinical Practice Research Datalink GOLD and Hospital Episode Statistics, respectively) were used to identify patients diagnosed with HF between 2009 and 2018 in the UK. International Classification of Diseases (ICD)-10 clinical modification codes were matched to the equivalent Read (primary care) and ICD-10 WHO (secondary care) diagnosis codes used in the UK. In the absence of specific Read or ICD-10 WHO codes for ATTRwt, two proxy case definitions (definitive and possible cases) were created, based on the degree of confidence that the contributing codes defined true ATTRwt cases. Primary outcome measure: Algorithm performance was evaluated primarily using the area under the receiver operating characteristic curve (AUROC), comparing actual versus algorithm-predicted case definitions at varying sensitivities and specificities. Results: The algorithm demonstrated the strongest predictive ability when a combination of primary and secondary care data was used (AUROC: 0.84 in the definitive cohort and 0.86 in the possible cohort). With primary care or secondary care data alone, performance ranged from 0.68 to 0.78. Conclusion: The ML algorithm, despite being developed in a US population, was effective at identifying patients who may have ATTRwt in a UK setting. Its potential use in research and clinical care to aid identification of patients with undiagnosed ATTRwt, possibly enabling earlier diagnosis in the disease pathway, should be investigated.
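    As a minimal, hedged sketch of the evaluation described above (not the study's actual code), the snippet below scores synthetic risk predictions against binary proxy case labels with scikit-learn, reporting the AUROC and the sensitivity/specificity trade-off across thresholds; the names proxy_label and risk_score are illustrative assumptions.

```python
# Sketch only: synthetic stand-ins for the algorithm's risk scores and the
# proxy ATTRwt case labels; variable names are assumptions, not study code.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)
proxy_label = rng.integers(0, 2, size=1000)        # 1 = proxy ATTRwt case
risk_score = rng.random(1000) + 0.3 * proxy_label  # predicted risk, shifted for cases

auroc = roc_auc_score(proxy_label, risk_score)
fpr, tpr, thresholds = roc_curve(proxy_label, risk_score)

# Inspect the trade-off at a few thresholds, mirroring the paper's evaluation
# "at varying sensitivities and specificities".
for f, t, th in zip(fpr[::200], tpr[::200], thresholds[::200]):
    print(f"threshold={th:.2f}  sensitivity={t:.2f}  specificity={1 - f:.2f}")
print(f"AUROC = {auroc:.2f}")
```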

    An Evaluation of the Use of a Clinical Research Data Warehouse and I2b2 Infrastructure to Facilitate Replication of Research

    Replication of clinical research is a prerequisite for forming effective clinical decisions and guidelines. While rerunning a clinical trial may be unethical and prohibitively expensive, the adoption of electronic health records (EHRs) and the infrastructure for distributed research networks provide access to clinical data for observational and retrospective studies. Herein I demonstrate a means of using these tools to validate existing results and extend the findings to novel populations. I describe the process of evaluating published risk models, as well as local data and infrastructure, to assess the replicability of a study. I present one example of a risk model that could not be replicated and one study of in-hospital mortality risk that I replicated using UNMC’s clinical research data warehouse. In these examples and other studies we have participated in, some elements are commonly missing or underdeveloped. One such missing element is a consistent and computable phenotype for pregnancy status based on data recorded in the EHR. I survey local clinical data, identify a number of variables correlated with pregnancy, and demonstrate the data required to identify the temporal bounds of a pregnancy episode. Another common obstacle to replicating risk models is the need to link to alternative data sources while maintaining data in a de-identified database. I demonstrate a pipeline for linking clinical data to socioeconomic variables and indices obtained from the American Community Survey (ACS). While these data are location-based, I provide a method for storing them in a HIPAA-compliant fashion so as not to identify a patient’s location. While full and efficient replication of all clinical studies remains a future goal, the replication demonstrated here, together with the initial development of a computable phenotype for pregnancy and the incorporation of location-based data into a de-identified data warehouse, shows how EHR data and a research infrastructure may be used to facilitate this effort.
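    The ACS linkage pattern described above can be summarised in a few lines: join patient records to tract-level socioeconomic indices, then drop the geographic key so no location is retained. The following is a hedged sketch under assumed column names (census_tract, acs_median_income, acs_pct_uninsured), not the thesis's actual pipeline.

```python
# Sketch of the linkage pattern: attach tract-level ACS indices, then drop
# the geographic key. All table and column names here are illustrative.
import pandas as pd

patients = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "census_tract": ["31055000100", "31055000200", "31055000100"],
})
acs = pd.DataFrame({
    "census_tract": ["31055000100", "31055000200"],
    "acs_median_income": [54000, 71000],
    "acs_pct_uninsured": [0.12, 0.07],
})

linked = patients.merge(acs, on="census_tract", how="left")
deidentified = linked.drop(columns=["census_tract"])  # no geographic identifier retained
print(deidentified)
```
    Dropping the tract identifier after the join is the design choice that keeps the warehouse de-identified: only the socioeconomic values, not the location they came from, are stored with the patient record.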

    Prediction of disease comorbidity using explainable artificial intelligence and machine learning techniques: A systematic review

    OBJECTIVE: Disease comorbidity is a major challenge in healthcare, affecting patients' quality of life and driving up costs. AI-based prediction of comorbidities can help address this issue by improving precision medicine and supporting holistic care. The objective of this systematic literature review was to identify and summarise existing machine learning (ML) methods for comorbidity prediction and to evaluate the interpretability and explainability of the models. MATERIALS AND METHODS: The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework was used to identify articles in three databases: Ovid Medline, Web of Science and PubMed. The literature search covered a broad range of terms for the prediction of disease comorbidity and ML, including traditional predictive modelling. RESULTS: Of 829 unique articles, 58 full-text papers were assessed for eligibility. A final set of 22 articles with 61 ML models was included in this review. Of the identified ML models, 33 achieved relatively high accuracy (80-95%) and AUC (0.80-0.89). Overall, 72% of studies had high or unclear concerns regarding the risk of bias. DISCUSSION: This systematic review is the first to examine the use of ML and explainable artificial intelligence (XAI) methods for comorbidity prediction. The included studies focused on a limited scope of comorbidities, ranging from 1 to 34 (mean = 6), and no novel comorbidities were found owing to limited phenotypic and genetic data. The lack of standard evaluation for XAI hinders fair comparisons. CONCLUSION: A broad range of ML methods has been used to predict the comorbidities of various disorders. With further development of explainable ML capacity in the field of comorbidity prediction, there is a significant possibility of identifying unmet health needs by highlighting comorbidities in patient groups not previously recognised to be at risk for particular comorbidities.
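    To make the reviewed task concrete, here is a minimal, hedged sketch (synthetic data, not drawn from any reviewed study) framing comorbidity prediction as multi-label classification with an inherently interpretable linear model, whose coefficients provide a basic form of explanation.

```python
# Illustrative only: multi-label comorbidity prediction on synthetic features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
X = rng.random((500, 8))  # stand-ins for codes, labs, demographics
scores = X @ rng.random((8, 3)) + 0.3 * rng.standard_normal((500, 3))
Y = (scores > np.median(scores, axis=0)).astype(int)  # 3 comorbidity labels

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)
model = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X_tr, Y_tr)

# Per-comorbidity AUC plus the most influential feature; linear coefficients
# give a simple, inherently interpretable explanation of each prediction.
for k, est in enumerate(model.estimators_):
    proba = est.predict_proba(X_te)[:, 1]
    top = int(np.argmax(np.abs(est.coef_)))
    print(f"comorbidity {k}: AUC={roc_auc_score(Y_te[:, k], proba):.2f}, top feature index={top}")
```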

    Variational Bayes latent class approach for EHR-based phenotyping with large real-world data

    Bayesian approaches to clinical analyses for the purposes of patient phenotyping have been limited by the computational challenges associated with applying the Markov chain Monte Carlo (MCMC) approach to large real-world data. Approximate Bayesian inference via optimization of the variational evidence lower bound, often called Variational Bayes (VB), has been successfully demonstrated for other applications. We investigate the performance and characteristics of currently available R and Python VB software for variational Bayesian latent class analysis (LCA) of realistically large real-world observational data. We used a real-world data set, Optum™ electronic health records (EHR), containing pediatric patients with risk indicators for type 2 diabetes mellitus, a disease that is rare in pediatric patients. The aim of this work is to validate a Bayesian patient phenotyping model for generality and extensibility and, crucially, to show that it can be applied to a realistically large real-world clinical data set. We find that currently available automatic VB methods are very sensitive to initial starting conditions, model definition, algorithm hyperparameters and choice of gradient optimiser. The Bayesian LCA model was challenging to implement using VB, but we achieved reasonable results with very good computational performance compared to MCMC.
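    For readers unfamiliar with the model family, below is a compact, hand-rolled sketch of mean-field variational Bayes (coordinate-ascent updates) for latent class analysis with binary indicators; it is illustrative only and is not one of the R/Python packages the paper benchmarks. Consistent with the paper's findings, results from such a scheme depend on the random initialisation of the responsibilities.

```python
# Mean-field VB (CAVI) for LCA with binary items: pi ~ Dirichlet, theta_kj ~ Beta,
# z_i ~ Categorical(pi), x_ij | z_i=k ~ Bernoulli(theta_kj). Illustrative sketch.
import numpy as np
from scipy.special import digamma

def vb_lca(X, K, alpha0=1.0, a0=1.0, b0=1.0, iters=100, seed=0):
    """X: (N, J) binary array; K: number of latent classes."""
    rng = np.random.default_rng(seed)
    N, J = X.shape
    r = rng.dirichlet(np.ones(K), size=N)  # q(z_i): responsibilities (sensitive to init)
    for _ in range(iters):
        # M-step analogue: update q(pi) = Dirichlet(alpha), q(theta_kj) = Beta(a, b)
        alpha = alpha0 + r.sum(axis=0)
        a = a0 + r.T @ X            # (K, J) pseudo-counts of positive items
        b = b0 + r.T @ (1 - X)
        # E-step analogue: expected log-probabilities under q(pi), q(theta)
        Elog_pi = digamma(alpha) - digamma(alpha.sum())
        Elog_t = digamma(a) - digamma(a + b)
        Elog_1mt = digamma(b) - digamma(a + b)
        log_rho = Elog_pi + X @ Elog_t.T + (1 - X) @ Elog_1mt.T  # (N, K)
        log_rho -= log_rho.max(axis=1, keepdims=True)            # numerical stability
        r = np.exp(log_rho)
        r /= r.sum(axis=1, keepdims=True)
    return r, alpha, a, b

# Synthetic two-class demo; real use would pass an (N, J) matrix of binary
# EHR risk indicators, as in the pediatric T2DM phenotyping task.
rng = np.random.default_rng(42)
z = rng.integers(0, 2, size=2000)
theta = np.array([[0.9, 0.8, 0.1, 0.2], [0.2, 0.1, 0.85, 0.9]])
X = (rng.random((2000, 4)) < theta[z]).astype(float)
r, alpha, a, b = vb_lca(X, K=2)
print("class weights:", alpha / alpha.sum())
```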

    American Family Cohort, a data resource description

    This manuscript is a research resource description and presents a large and novel electronic health records (EHR) data resource, the American Family Cohort (AFC). The AFC data are derived from the Centers for Medicare and Medicaid Services (CMS)-certified American Board of Family Medicine (ABFM) PRIME registry, the largest national Qualified Clinical Data Registry (QCDR) for primary care. The data are converted to a popular common data model, the Observational Health Data Sciences and Informatics (OHDSI) Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). The resource contains approximately 90 million encounters for 7.5 million patients. All patients have age, gender, and address information, and 73% report race. Nearly 93% of patients have lab data coded in LOINC, 86% have medication data in RxNorm, 93% have diagnoses in SNOMED and ICD, 81% have procedures in HCPCS or CPT, and 61% have insurance information. The richness, breadth, and diversity of this research-accessible and research-ready data are expected to accelerate observational studies in many diverse areas. We expect this resource to facilitate research for many years to come.
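    Coverage figures like "93% of patients have lab data in LOINC" are the kind of statistic computed directly against OMOP CDM domain tables. Below is a hedged sketch using tiny in-memory stand-ins for the real person and measurement tables; the data values are invented for illustration and this is not the authors' pipeline.

```python
# Sketch: share of patients with at least one lab (measurement) record,
# using minimal stand-ins for the OMOP CDM person and measurement tables.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY)")
cur.execute("CREATE TABLE measurement (measurement_id INTEGER, person_id INTEGER)")
cur.executemany("INSERT INTO person VALUES (?)", [(i,) for i in range(1, 11)])
cur.executemany("INSERT INTO measurement VALUES (?, ?)",
                [(1, 1), (2, 1), (3, 2), (4, 3), (5, 5), (6, 8)])

# Percentage of patients with at least one measurement row.
cur.execute("""
    SELECT 100.0 * COUNT(DISTINCT m.person_id) / (SELECT COUNT(*) FROM person)
    FROM measurement m
""")
print(f"{cur.fetchone()[0]:.0f}% of patients have lab data")
```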