A genomic data resource for predicting antimicrobial resistance from laboratory-derived antimicrobial susceptibility phenotypes
Antimicrobial resistance (AMR) is a major global health threat that affects millions of people each year. Funding agencies worldwide and the global research community have expended considerable capital and effort tracking the evolution and spread of AMR by isolating and sequencing bacterial strains and performing antimicrobial susceptibility testing (AST). For the last several years, we have been capturing these efforts by curating data from the literature and from data resources, and by building a set of assembled bacterial genome sequences paired with laboratory-derived AST data. This collection currently contains AST data for over 67 000 genomes encompassing approximately 40 genera and over 100 species. In this paper, we describe the characteristics of this collection, highlighting where sampling is comparatively deep or shallow and showing where attention from the research community is needed to improve sampling and tracking efforts. Beyond supporting the tracking of AMR evolution and spread, the collection also serves as a useful starting point for building machine learning models that predict AMR phenotypes. We demonstrate this by describing two machine learning models built from the entire dataset, which show where predictive power is comparatively high or low. This AMR metadata collection is freely available and maintained on the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) FTP site at ftp://ftp.bvbrc.org/RELEASE_NOTES/PATRIC_genomes_AMR.txt
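To make "deep or shallow sampling" concrete, the following minimal sketch counts distinct genomes per (antibiotic, phenotype) pair in a tab-separated AMR metadata table and flags pairs below a threshold. It is an illustration only, not the authors' analysis: the column names `genome_id`, `antibiotic` and `resistant_phenotype`, the function name `sampling_depth`, the `min_genomes` threshold and the toy rows are all hypothetical assumptions, so check the actual header of the release file before adapting it.

```python
import csv
import io
from collections import defaultdict


def sampling_depth(handle, min_genomes=100):
    """Count distinct genomes per (antibiotic, phenotype) pair.

    ASSUMPTION: the table is tab-separated with hypothetical columns
    'genome_id', 'antibiotic' and 'resistant_phenotype'.
    Returns (counts, shallow), where 'shallow' lists the pairs that have
    fewer than min_genomes distinct genomes, sorted alphabetically.
    """
    genomes = defaultdict(set)  # (antibiotic, phenotype) -> set of genome IDs
    for row in csv.DictReader(handle, delimiter="\t"):
        key = (row["antibiotic"].strip().lower(), row["resistant_phenotype"].strip())
        genomes[key].add(row["genome_id"])  # a set ignores duplicate rows per genome
    counts = {key: len(ids) for key, ids in genomes.items()}
    shallow = sorted(key for key, n in counts.items() if n < min_genomes)
    return counts, shallow


# Hypothetical toy table; not taken from the BV-BRC release.
example = io.StringIO(
    "genome_id\tantibiotic\tresistant_phenotype\n"
    "562.1\tampicillin\tResistant\n"
    "562.2\tampicillin\tSusceptible\n"
    "562.1\tampicillin\tResistant\n"  # duplicate row for the same genome
)
counts, shallow = sampling_depth(example, min_genomes=2)
print(counts)
print(shallow)
```

Counting distinct genome IDs rather than rows keeps repeated AST records for the same genome from inflating the apparent sampling depth.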
Can machine learning models predict asparaginase-associated pancreatitis in childhood acute lymphoblastic leukemia?
Abstract
Asparaginase-associated pancreatitis (AAP) frequently affects children treated for acute lymphoblastic leukemia (ALL), causing severe acute and persisting complications. Known risk factors such as asparaginase dosing, older age and single nucleotide polymorphisms (SNPs) have insufficient odds ratios to allow personalized asparaginase therapy. In this study, we explored machine learning strategies for predicting individual AAP risk. We integrated information on age, sex, and SNPs from Illumina Omni2.5exome-8 arrays of patients with childhood ALL (N = 1564, of whom 244 had AAP; ages 1.0 to 17.9 years) from 10 international ALL consortia into machine learning models, including regression, random forest, AdaBoost and artificial neural networks. A model with only age and sex had an area under the receiver operating characteristic curve (ROC-AUC) of 0.62. Including 6 pancreatitis candidate gene SNPs or 4 validated pancreatitis SNPs raised the ROC-AUC modestly (0.67), whereas 30 SNPs identified through our AAP genome-wide association study cohort raised it substantially (0.80). The most predictive features included rs10273639 (PRSS1-PRSS2), rs10436957 (CTRC), rs13228878 (PRSS1/PRSS2), rs1505495 (GALNTL6), rs4655107 (EPHB2) and age (1 to 7 years). A second AAP episode following asparaginase re-exposure was predicted with a ROC-AUC of 0.65. The machine learning models assist individual-level risk assessment of AAP for future prevention trials, and may legitimize asparaginase re-exposure when AAP risk is predicted to be low.
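The ROC-AUC values quoted above (0.62, 0.67, 0.80 and 0.65) can be read as the probability that a randomly chosen patient who developed AAP receives a higher predicted risk than a randomly chosen patient who did not, with ties counted as one half; 0.5 corresponds to chance. The sketch below computes the metric directly from that definition (the Mann-Whitney formulation). The function name `roc_auc` and the toy labels and scores are hypothetical illustrations, not data or code from the study.

```python
from itertools import product


def roc_auc(labels, scores):
    """ROC-AUC via the pairwise (Mann-Whitney) definition.

    labels: 1 for a case (e.g. AAP), 0 for a control.
    scores: predicted risk for each patient, in the same order.
    Returns the fraction of (case, control) pairs in which the case
    scores higher, counting tied scores as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p, n in product(pos, neg):  # every (case, control) pair
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 2 cases, 2 controls.
# Pairs (0.9, 0.5), (0.9, 0.1) and (0.4, 0.1) are ranked correctly; (0.4, 0.5) is not.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 3 of 4 pairs -> 0.75
```

The pairwise loop is quadratic in cohort size, which is fine for illustration; for a cohort of this size, a rank-based implementation such as scikit-learn's `roc_auc_score` gives the same value more efficiently.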