Prediction of Antimicrobial Resistance and Antimicrobial Exposure via Machine Learning

Abstract

The increasing prevalence of antimicrobial-resistant bacteria drives the need for advanced methods to identify antimicrobial-resistance (AMR) genes in bacterial pathogens. With the availability of whole genome sequences, best-hit methods can be used to identify AMR genes by comparing unknown sequences against known AMR sequences in existing online repositories. However, these methods may perform poorly when a resistance gene has low sequence identity to known sequences. In this dissertation we present two machine learning approaches that use protein sequences, with sequence identity ranging between 10% and 90%, as an alternative to conventional DNA sequence alignment-based approaches for identifying putative AMR genes in Gram-negative bacteria. We applied both graph theory and game theory to choose which protein characteristics to use in our machine learning model and were able to predict AMR protein sequences for Gram-negative bacteria with an accuracy ranging from 88% to 100%. To obtain similar classification results with BLASTp, identity thresholds as low as 53% were required. We also extended our game theory-based study for AMR prediction to Gram-positive bacteria and achieved accuracies between 87% and 90%.

Analysis and tracking of antimicrobial utilization is crucial to antimicrobial stewardship efforts aimed at finding effective interventions for controlling antimicrobial resistance. Standard risk adjustment models are therefore needed for benchmarking appropriate antimicrobial utilization and for fair inter-facility comparison. We used a machine learning approach to identify patient- and facility-level predictors for hospitalized patients, which can inform a risk adjustment model to facilitate assessment of antimicrobial utilization. Candidate features (predictors) were generated from patient admission records. The number of features was then reduced using a statistical approach, and missing values in the reduced feature set were imputed using bootstrapping and an expectation-maximization algorithm. Finally, support vector regression (SVR) and cubist regression (CB) models were fitted, and their root mean square error (RMSE) values were used to evaluate the selected feature set. The SVR and CB models achieved better RMSE values than both linear and negative binomial null models, thereby demonstrating the effectiveness of our selected feature set.
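The evaluation idea in the last step, comparing a fitted model's RMSE against that of a null model, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy data are invented, a one-feature least-squares line stands in for the SVR and cubist models, and the null model is taken to be an intercept-only (mean) predictor rather than the linear and negative binomial null models used in the dissertation.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def fit_null(y_train):
    """Intercept-only null model: always predicts the training mean."""
    mean = sum(y_train) / len(y_train)
    return lambda x: mean


def fit_line(x_train, y_train):
    """One-feature least-squares line (a stand-in, not SVR or cubist)."""
    n = len(x_train)
    mean_x = sum(x_train) / n
    mean_y = sum(y_train) / n
    sxx = sum((x - mean_x) ** 2 for x in x_train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_train, y_train))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


# Hypothetical toy data: one predictor and one utilization-like outcome.
x_train = [1, 2, 3, 4, 5, 6]
y_train = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]
x_test = [7, 8]
y_test = [7.0, 8.1]

null_model = fit_null(y_train)
line_model = fit_line(x_train, y_train)

rmse_null = rmse(y_test, [null_model(x) for x in x_test])
rmse_model = rmse(y_test, [line_model(x) for x in x_test])

# A feature set is judged informative when the fitted model's held-out RMSE
# is lower than the null model's.
print(f"null RMSE:  {rmse_null:.3f}")
print(f"model RMSE: {rmse_model:.3f}")
```

On this toy data the predictor carries real signal, so the fitted line has the lower held-out RMSE; with an uninformative predictor the two values would be close.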
