Representation of EHR data for predictive modeling: a comparison between UMLS and other terminologies.
OBJECTIVE: Predictive disease modeling using electronic health record data is a growing field. Although clinical data in their raw form can be used directly for predictive modeling, it is a common practice to map data to standard terminologies to facilitate data aggregation and reuse. There is, however, a lack of systematic investigation of how different representations could affect the performance of predictive models, especially in the context of machine learning and deep learning.
MATERIALS AND METHODS: We projected the input diagnoses data in the Cerner HealthFacts database to Unified Medical Language System (UMLS) and 5 other terminologies, including CCS, CCSR, ICD-9, ICD-10, and PheWAS, and evaluated the prediction performances of these terminologies on 2 different tasks: the risk prediction of heart failure in diabetes patients and the risk prediction of pancreatic cancer. Two popular models were evaluated: logistic regression and a recurrent neural network.
RESULTS: For logistic regression, UMLS delivered the best area under the receiver operating characteristic curve (AUROC) in both the heart failure in diabetes patients (81.15%) and pancreatic cancer (80.53%) tasks. For the recurrent neural network, UMLS worked best for pancreatic cancer prediction (AUROC 82.24%) and was second (AUROC 85.55%) only to PheWAS (AUROC 85.87%) for prediction of heart failure in diabetes patients.
DISCUSSION/CONCLUSION: In our experiments, terminologies with larger vocabularies and finer-grained representations were associated with better prediction performance. In particular, UMLS was consistently one of the best-performing terminologies. We believe that our work may help to inform better designs of predictive models, although further investigation is warranted.
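The AUROC figures reported above can be read as the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case. The following is a minimal sketch of that pairwise definition, not the authors' evaluation code; the function name `auroc` and the toy labels and scores are hypothetical and chosen only for illustration.

```python
def auroc(labels, scores):
    """Pairwise AUROC: fraction of (positive, negative) pairs in which
    the positive case scores higher; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = developed the outcome, 0 = did not.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
result = auroc(labels, scores)
print(f"AUROC = {result:.2f}")
```

The double loop is quadratic in cohort size, so it suits small illustrations only; on large cohorts a rank-based computation gives the same value more efficiently.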
Prediction of Brain Metastases Development in Patients With Lung Cancer by Explainable Artificial Intelligence From Electronic Health Records
PURPOSE: Early detection of brain metastases (BMs) is critical for prompt treatment and optimal control of the disease. In this study, we seek to predict the risk of developing BM among patients diagnosed with lung cancer on the basis of electronic health record (EHR) data and, through explainable artificial intelligence approaches, to understand which factors are important for the model to predict BM development accurately.
MATERIALS AND METHODS: We trained a recurrent neural network model, REverse Time AttentIoN (RETAIN), to predict the risk of developing BM using structured EHR data. To interpret the model's decision process, we analyzed the attention weights in the RETAIN model and the SHAP values from a feature attribution method, Kernel SHAP, to identify the factors contributing to BM prediction.
RESULTS: We developed a high-quality cohort of 4,466 patients with BM from the Cerner Health Facts database, which contains over 70 million patients from more than 600 hospitals. On this data set, RETAIN achieved the best area under the receiver operating characteristic curve (0.825), a significant improvement over the baseline model. We also extended a feature attribution method, Kernel SHAP, to structured EHR data for model interpretation. Both RETAIN and Kernel SHAP can identify important features related to BM prediction.
CONCLUSION: To the best of our knowledge, this is the first study to predict BM using structured EHR data. We achieved decent prediction performance for BM prediction and identified factors highly relevant to BM development. The sensitivity analysis demonstrated that both RETAIN and Kernel SHAP could discriminate unrelated features and put more weight on the features important to BM. Our study explored the potential of applying explainable artificial intelligence for future clinical applications.
Genetic characterization and linkage disequilibrium mapping of resistance to gray leaf spot in maize (Zea mays L.)
Gray leaf spot (GLS), caused by Cercospora zeae-maydis, is an important foliar disease of maize (Zea mays L.) worldwide, resistance to which is controlled by multiple quantitative trait loci (QTL). To gain insights into the genetic architecture underlying the resistance to this disease, an association mapping population consisting of 161 inbred lines was evaluated for resistance to GLS in a plant pathology nursery at Shenyang in 2010 and 2011. Subsequently, a genome-wide association study, using 41,101 single-nucleotide polymorphisms (SNPs), identified 51 SNPs significantly (P < 0.001) associated with GLS resistance, which could be converted into 31 QTL. In addition, three candidate genes related to plant defense were identified, including nucleotide-binding-site/leucine-rich repeat, receptor-like kinase genes similar to those involved in basal defense. Two genic SNPs, PZE-103142893 and PZE-109119001, associated with GLS resistance in chromosome bins 3.07 and 9.07, can be used for marker-assisted selection (MAS) of GLS resistance. These results provide an important resource for developing molecular markers closely linked with the target trait, enhancing breeding efficiency.
Dynamic Prognosis Prediction for Patients on DAPT After Drug-Eluting Stent Implantation: Model Development and Validation
BACKGROUND: The rapid evolution of artificial intelligence (AI) in conjunction with recent updates in dual antiplatelet therapy (DAPT) management guidelines emphasizes the necessity for innovative models to predict ischemic or bleeding events after drug-eluting stent implantation. Leveraging AI for dynamic prediction has the potential to revolutionize risk stratification and provide personalized decision support for DAPT management.
METHODS AND RESULTS: We developed and validated a new AI-based pipeline using retrospective data of drug-eluting stent-treated patients, sourced from the Cerner Health Facts data set (n=98 236) and Optum's de-identified Clinformatics Data Mart Database (n=9978). The 36 months following drug-eluting stent implantation were designated as our primary forecasting interval, further segmented into 6 sequential prediction windows. We evaluated 5 distinct AI algorithms for their precision in predicting ischemic and bleeding risks. Model discriminative accuracy was assessed using the area under the receiver operating characteristic curve, among other metrics. The weighted light gradient boosting machine stood out as the preeminent model, thus earning its place as our AI-DAPT model. The AI-DAPT demonstrated peak accuracy in the 30 to 36 months window, charting an area under the receiver operating characteristic curve of 90% [95% CI, 88%-92%] for ischemia and 84% [95% CI, 82%-87%] for bleeding predictions.
CONCLUSIONS: Our AI-DAPT excels in formulating iterative, refined dynamic predictions by assimilating ongoing updates from patients' clinical profiles, holding value as a novel smart clinical tool to facilitate optimal DAPT duration management with high accuracy and adaptability.
Dissecting the Genetic Basis Underlying Combining Ability of Plant Height Related Traits in Maize
Maize plant height related traits including plant height, ear height, and internode number are tightly linked with biomass, planting density, and grain yield in the field. Previous studies have focused on understanding the genetic basis of plant architecture traits per se, but the genetic basis of combining ability remains poorly understood. In this study, 328 recombinant inbred lines (RILs) were inter-group crossed with two testers to produce 656 hybrids using the North Carolina II mating design. Both the parental lines and the hybrids were evaluated in two summer maize-growing regions of China in 2015 and 2016. QTL mapping highlighted that 7 out of 16 QTL detected for RILs per se could be simultaneously detected for general combining ability (GCA) effects, suggesting that GCA effects and the traits were genetically controlled by different sets of loci. Among the 35 QTL identified for hybrid performance, 57.1% and 28.5% overlapped with additive/GCA and non-additive/specific combining ability (SCA) effects, respectively, suggesting that only a small percentage of hybrid variance was due to SCA effects in our design. Two QTL hotspots, located on chromosomes 5 and 10 and including the qPH5-1 and qPH10 loci, were validated for plant height related traits by Ye478 derivatives. Notably, the qPH5-1 locus could simultaneously affect the RILs per se and GCA effects, while qPH10, a major QTL (PVE > 10%) with pleiotropic effects, only affected the GCA effects. These results provide evidence that more attention should be focused on loci that influence combining ability directly in maize hybrid breeding.
Epigenome-wide association study (EWAS) of BMI, BMI change and waist circumference in African American adults identifies multiple replicated loci
Obesity is an important component of the pathophysiology of chronic diseases. Identifying epigenetic modifications associated with elevated adiposity, including DNA methylation variation, may point to genomic pathways that are dysregulated in numerous conditions. The Illumina 450K Bead Chip array was used to assay DNA methylation in leukocyte DNA obtained from 2097 African American adults in the Atherosclerosis Risk in Communities (ARIC) study. Mixed-effects regression models were used to test the association of methylation beta value with concurrent body mass index (BMI) and waist circumference (WC), and BMI change, adjusting for batch effects and potential confounders. Replication using whole-blood DNA from 2377 White adults in the Framingham Heart Study and CD4+ T cell DNA from 991 Whites in the Genetics of Lipid Lowering Drugs and Diet Network Study was followed by testing using adipose tissue DNA from 648 women in the Multiple Tissue Human Expression Resource cohort. Seventy-six BMI-related probes, 164 WC-related probes and 8 BMI change-related probes passed the threshold for significance in ARIC (P < 1 × 10^−7; Bonferroni), including probes in the recently reported HIF3A, CPT1A and ABCG1 regions. Replication using blood DNA was achieved for 37 BMI probes and 1 additional WC probe. Sixteen of these also replicated in adipose tissue, including 15 novel methylation findings near genes involved in lipid metabolism, immune response/cytokine signaling and other diverse pathways, including LGALS3BP, KDM2B, PBX1 and BBS2, among others. Adiposity traits are associated with DNA methylation at numerous CpG sites that replicate across studies despite variation in tissue type, ethnicity and analytic approaches.
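The Bonferroni threshold quoted above is the family-wise significance level divided by the number of tests. The abstract states neither value, so the sketch below assumes a family-wise level of 0.05 and roughly 485,000 probes tested (about the size of the 450K array); both numbers are assumptions for illustration and may differ from what the authors used after quality control.

```python
# Assumed inputs (not stated in the abstract): family-wise alpha and
# the approximate number of methylation probes tested.
ALPHA = 0.05
N_PROBES = 485_000

# Bonferroni correction: per-test threshold = alpha / number of tests.
threshold = ALPHA / N_PROBES
print(f"Per-probe significance threshold: {threshold:.2e}")


def passes_bonferroni(p_value, alpha=ALPHA, n_tests=N_PROBES):
    """Return True if a single probe's p-value survives the correction."""
    return p_value < alpha / n_tests
```

Under these assumptions the per-probe threshold comes out close to the 1 × 10^−7 cutoff reported in the abstract.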
- …