11 research outputs found

    Can Machine Learning Models Predict Asparaginase-associated Pancreatitis in Childhood Acute Lymphoblastic Leukemia

    Publisher Copyright: © 2021 Lippincott Williams and Wilkins. All rights reserved. Asparaginase-associated pancreatitis (AAP) frequently affects children treated for acute lymphoblastic leukemia (ALL) causing severe acute and persisting complications. Known risk factors such as asparaginase dosing, older age and single nucleotide polymorphisms (SNPs) have insufficient odds ratios to allow personalized asparaginase therapy. In this study, we explored machine learning strategies for prediction of individual AAP risk. We integrated information on age, sex, and SNPs based on Illumina Omni2.5exome-8 arrays of patients with childhood ALL (N=1564, 244 with AAP aged 1.0 to 17.9 y) from 10 international ALL consortia into machine learning models including regression, random forest, AdaBoost and artificial neural networks. A model with only age and sex had area under the receiver operating characteristic curve (ROC-AUC) of 0.62. Inclusion of 6 pancreatitis candidate gene SNPs or 4 validated pancreatitis SNPs boosted ROC-AUC somewhat (0.67) while 30 SNPs, identified through our AAP genome-wide association study cohort, boosted performance (0.80). Most predictive features included rs10273639 (PRSS1-PRSS2), rs10436957 (CTRC), rs13228878 (PRSS1/PRSS2), rs1505495 (GALNTL6), rs4655107 (EPHB2) and age (1 to 7 y). Second AAP following asparaginase re-exposure was predicted with ROC-AUC: 0.65. The machine learning models assist individual-level risk assessment of AAP for future prevention trials, and may legitimize asparaginase re-exposure when AAP risk is predicted to be low. Peer reviewed.
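    The ROC-AUC values quoted in this abstract can be read as the probability that a randomly chosen patient with AAP receives a higher predicted risk than a randomly chosen patient without it. The sketch below illustrates that rank-based (Mann-Whitney) reading of the metric. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not the study's code or data.

```python
def roc_auc(labels, scores):
    """Rank-based ROC-AUC: P(score of a positive > score of a negative).

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = event observed, 0 = no event.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
print(roc_auc(toy_labels, toy_scores))  # 0.75
```

    A value of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of cases, so it is meant for illustration only.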

    ResFinder - an open online resource for identification of antimicrobial resistance genes in next-generation sequencing data and prediction of phenotypes from genotypes

    Antimicrobial resistance (AMR) is one of the most important health threats globally. The ability to accurately identify resistant bacterial isolates and the individual antimicrobial resistance genes (ARGs) is essential for understanding the evolution and emergence of AMR and to provide appropriate treatment. The rapid developments in next-generation sequencing technologies have made this technology available to researchers and microbiologists at routine laboratories around the world. However, tools available for those with limited experience with bioinformatics are lacking, especially to enable researchers and microbiologists in low- and middle-income countries (LMICs) to perform their own studies. The CGE-tools (Center for Genomic Epidemiology), including ResFinder (https://cge.cbs.dtu.dk/services/ResFinder/), were developed to provide freely available, easy-to-use online bioinformatic tools allowing inexperienced researchers and microbiologists to perform simple bioinformatic analyses. The main purpose was and is to provide these solutions for people involved in frontline diagnosis, especially in LMICs. Since its original publication in 2012, ResFinder has undergone a number of improvements including improvement of the code and databases, inclusion of point mutations for selected bacterial species and predictions of phenotypes, also for selected species. As of 28 September 2021, 820 803 analyses have been performed using ResFinder from 61 776 IP-addresses in 171 countries. ResFinder clearly fulfills a need for several people around the globe and we hope to be able to continue to provide this service free of charge in the future. We also hope and expect to provide further improvements including phenotypic predictions for additional bacterial species.

    Predicting Antimicrobial Resistance Using Partial Genome Alignments

    Antimicrobial resistance (AMR) is an important global health threat that impacts millions of people worldwide each year. Developing methods that can detect and predict AMR phenotypes can help to mitigate the spread of AMR by informing clinical decision making and appropriate mitigation strategies. Many bioinformatic methods have been developed for predicting AMR phenotypes from whole-genome sequences and AMR genes, but recent studies have indicated that predictions can be made from incomplete genome sequence data. In order to more systematically understand this, we built random forest-based machine learning classifiers for predicting susceptible and resistant phenotypes for Klebsiella pneumoniae (1,640 strains), Mycobacterium tuberculosis (2,497 strains), and Salmonella enterica (1,981 strains). We started by building models from alignments that were based on a reference chromosome for each species. We then subsampled each chromosomal alignment and built models for the resulting subalignments, finding that very small regions, representing approximately 0.1 to 0.2% of the chromosome, are predictive. In K. pneumoniae, M. tuberculosis, and S. enterica, the subalignments are able to predict multiple AMR phenotypes with at least 70% accuracy, even though most do not encode an AMR-related function. We used these models to identify regions of the chromosome with high and low predictive signals. Finally, subalignments that retain high accuracy across larger phylogenetic distances were examined in greater detail, revealing genes and intergenic regions with potential links to AMR, virulence, transport, and survival under stress conditions. IMPORTANCE Antimicrobial resistance causes thousands of deaths annually worldwide. Understanding the regions of the genome that are involved in antimicrobial resistance is important for developing mitigation strategies and preventing transmission. Machine learning models are capable of predicting antimicrobial resistance phenotypes from bacterial genome sequence data by identifying resistance genes, mutations, and other correlated features. They are also capable of implicating regions of the genome that have not been previously characterized as being involved in resistance. In this study, we generated global chromosomal alignments for Klebsiella pneumoniae, Mycobacterium tuberculosis, and Salmonella enterica and systematically searched them for small conserved regions of the genome that enable the prediction of antimicrobial resistance phenotypes. In addition to known antimicrobial resistance genes, this analysis identified genes involved in virulence and transport functions, as well as many genes with no previous implication in antimicrobial resistance.

    PlasmidHostFinder: Prediction of Plasmid Hosts Using Random Forest

    Plasmids play a major role facilitating the spread of antimicrobial resistance between bacteria. Understanding the host range and dissemination trajectories of plasmids is critical for surveillance and prevention of antimicrobial resistance. Identification of plasmid host ranges could be improved using automated pattern detection methods compared to homology-based methods due to the diversity and genetic plasticity of plasmids. In this study, we developed a method for predicting the host range of plasmids using machine learning, specifically random forests. We trained the models with 8,519 plasmids from 359 different bacterial species per taxonomic level; the models achieved Matthews correlation coefficients of 0.662 and 0.867 at the species and order levels, respectively. Our results suggest that despite the diverse nature and genetic plasticity of plasmids, our random forest model can accurately distinguish between plasmid hosts. This tool is available online through the Center for Genomic Epidemiology (https://cge.cbs.dtu.dk/services/PlasmidHostFinder/). IMPORTANCE Antimicrobial resistance is a global health threat to humans and animals, causing high mortality and morbidity while effectively ending decades of success in fighting against bacterial infections. Plasmids confer extra genetic capabilities to the host organisms through accessory genes that can encode antimicrobial resistance and virulence. In addition to lateral inheritance, plasmids can be transferred horizontally between bacterial taxa. Therefore, detection of the host range of plasmids is crucial for understanding and predicting the dissemination trajectories of extrachromosomal genes and bacterial evolution as well as taking effective countermeasures against antimicrobial resistance.
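    As background for the Matthews correlation coefficients reported in this abstract, the sketch below computes the metric for the simple binary case from confusion-matrix counts. It is an illustration only: the function name `binary_mcc` and the counts are assumptions, and the abstract does not say whether the binary form or a multi-class generalization was used.

```python
import math


def binary_mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (chance level)
    to +1 (perfect prediction).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when a row or column of the matrix is empty.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts, not taken from the paper.
print(round(binary_mcc(tp=90, tn=80, fp=20, fn=10), 3))  # 0.704
```

    Unlike plain accuracy, this coefficient uses all four cells of the confusion matrix, which makes it less sensitive to class imbalance.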

    Understanding and predicting ciprofloxacin minimum inhibitory concentration in Escherichia coli with machine learning

    It is important that antibiotic prescriptions are based on antimicrobial susceptibility data to ensure effective treatment outcomes. With the increasing availability of next-generation sequencing, bacterial whole genome sequencing (WGS) can facilitate a more reliable and faster alternative to traditional phenotyping for the detection and surveillance of AMR. This work proposes a machine learning approach that can predict the minimum inhibitory concentration (MIC) for a given antibiotic, here ciprofloxacin, on the basis of both genome-wide mutation profiles and profiles of acquired antimicrobial resistance genes. We analysed 704 Escherichia coli genomes combined with their respective MIC measurements for ciprofloxacin originating from different countries. The four most important predictors found by the model, mutations in gyrA residues Ser83 and Asp87, a mutation in parC residue Ser80 and presence of the qnrS1 gene, have been experimentally validated before. Using only these four predictors in a linear regression model, 65% and 93% of the test samples’ MIC were correctly predicted within a two- and a four-fold dilution range, respectively. The presented work does not treat machine learning merely as a black box model concept, but also identifies the genomic features that determine susceptibility. The recent progress in WGS technology in combination with machine learning analysis approaches indicates that in the near future WGS of bacteria might become cheaper and faster than a MIC measurement.
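    The "within a two- and a four-fold dilution range" figures in this abstract can be illustrated with a short sketch. It assumes, since the abstract does not define the term, that a prediction counts as correct when it lies within one log2 step (two-fold) or two log2 steps (four-fold) of the measured MIC. The function name `fraction_within_dilutions` and the MIC values are hypothetical.

```python
import math


def fraction_within_dilutions(true_mic, pred_mic, max_log2_steps):
    """Share of predictions whose MIC is within `max_log2_steps`
    doubling dilutions of the measured value."""
    if len(true_mic) != len(pred_mic) or not true_mic:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        1
        for t, p in zip(true_mic, pred_mic)
        if abs(math.log2(p) - math.log2(t)) <= max_log2_steps
    )
    return hits / len(true_mic)


# Hypothetical MIC values in mg/L, not taken from the paper.
measured = [0.125, 0.25, 4.0, 32.0]
predicted = [0.25, 0.25, 1.0, 4.0]

print(fraction_within_dilutions(measured, predicted, 1))  # 0.5
print(fraction_within_dilutions(measured, predicted, 2))  # 0.75
```

    Working on the log2 scale matches how MICs are measured in doubling dilution series, so an error of one step means the same thing at 0.125 mg/L as at 32 mg/L.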

    A genomic data resource for predicting antimicrobial resistance from laboratory-derived antimicrobial susceptibility phenotypes

    Antimicrobial resistance (AMR) is a major global health threat that affects millions of people each year. Funding agencies worldwide and the global research community have expended considerable capital and effort tracking the evolution and spread of AMR by isolating and sequencing bacterial strains and performing antimicrobial susceptibility testing (AST). For the last several years, we have been capturing these efforts by curating data from the literature and data resources and building a set of assembled bacterial genome sequences that are paired with laboratory-derived AST data. This collection currently contains AST data for over 67 000 genomes encompassing approximately 40 genera and over 100 species. In this paper, we describe the characteristics of this collection, highlighting areas where sampling is comparatively deep or shallow, and showing areas where attention is needed from the research community to improve sampling and tracking efforts. Beyond tracking the evolution and spread of AMR, the data also serve as a useful starting point for building machine learning models for predicting AMR phenotypes. We demonstrate this by describing two machine learning models that are built from the entire dataset to show where the predictive power is comparatively high or low. This AMR metadata collection is freely available and maintained on the Bacterial and Viral Bioinformatics Center (BV-BRC) FTP site: ftp://ftp.bvbrc.org/RELEASE_NOTES/PATRIC_genomes_AMR.txt

    A roadmap for the generation of benchmarking resources for antimicrobial resistance detection using next generation sequencing [version 1; peer review: 2 approved with reservations]

    Next Generation Sequencing technologies significantly impact the field of Antimicrobial Resistance (AMR) detection and monitoring, with immediate uses in diagnosis and risk assessment. For this application and in general, considerable challenges remain in demonstrating sufficient trust to act upon the meaningful information produced from raw data, partly because of the reliance on bioinformatics pipelines, which can produce different results and therefore lead to different interpretations. With the constant evolution of the field, it is difficult to identify, harmonise and recommend specific methods for large-scale implementations over time. In this article, we propose to address this challenge through establishing a transparent, performance-based, evaluation approach to provide flexibility in the bioinformatics tools of choice, while demonstrating proficiency in meeting common performance standards. The approach is two-fold: first, a community-driven effort to establish and maintain “live” (dynamic) benchmarking platforms to provide relevant performance metrics, based on different use-cases, that would evolve together with the AMR field; second, agreed and defined datasets to allow the pipelines’ implementation, validation, and quality-control over time. Following previous discussions on the main challenges linked to this approach, we provide concrete recommendations and future steps, related to different aspects of the design of benchmarks, such as the selection and the characteristics of the datasets (quality, choice of pathogens and resistances, etc.), the evaluation criteria of the pipelines, and the way these resources should be deployed in the community.