24 research outputs found

    DeepAMR for predicting co-occurrent resistance of Mycobacterium tuberculosis.

    MOTIVATION: Resistance co-occurrence within first-line anti-tuberculosis (TB) drugs is a common phenomenon. Existing methods based on genetic data analysis of Mycobacterium tuberculosis (MTB) have been able to predict resistance of MTB to individual drugs, but have not considered the resistance co-occurrence and cannot capture latent structure of genomic data that corresponds to lineages. RESULTS: We used a large cohort of TB patients from 16 countries across six continents where whole-genome sequences for each isolate and associated phenotype to anti-TB drugs were obtained using drug susceptibility testing recommended by the World Health Organization. We then proposed an end-to-end multi-task model with deep denoising auto-encoder (DeepAMR) for multiple drug classification and developed DeepAMR_cluster, a clustering variant based on DeepAMR, for learning clusters in latent space of the data. The results showed that DeepAMR outperformed the baseline model and four machine learning models with mean AUROC from 94.4% to 98.7% for predicting resistance to four first-line drugs [i.e. isoniazid (INH), ethambutol (EMB), rifampicin (RIF), pyrazinamide (PZA)], multi-drug resistant TB (MDR-TB) and pan-susceptible TB (PANS-TB: MTB that is susceptible to all four first-line anti-TB drugs). In the case of INH, EMB, PZA and MDR-TB, DeepAMR achieved its best mean sensitivity of 94.3%, 91.5%, 87.3% and 96.3%, respectively. In the case of RIF and PANS-TB, it generated 94.2% and 92.2% sensitivity, which were lower than the baseline model by 0.7% and 1.9%, respectively. t-SNE visualization shows that DeepAMR_cluster captures lineage-related clusters in the latent space. AVAILABILITY AND IMPLEMENTATION: The details of source code are provided at http://www.robots.ox.ac.uk/~davidc/code.php. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online

    Discordance between different bioinformatic methods for identifying resistance genes from short-read genomic data, with a focus on Escherichia coli

    Several bioinformatics genotyping algorithms are now commonly used to characterize antimicrobial resistance (AMR) gene profiles in whole-genome sequencing (WGS) data, with a view to understanding AMR epidemiology and developing resistance prediction workflows using WGS in clinical settings. Accurately evaluating AMR in Enterobacterales, particularly Escherichia coli, is of major importance, because this is a common pathogen. However, robust comparisons of different genotyping approaches on relevant simulated and large real-life WGS datasets are lacking. Here, we used both simulated datasets and a large set of real E. coli WGS data (n=1818 isolates) to systematically investigate genotyping methods in greater detail. Simulated constructs and real sequences were processed using four different bioinformatic programs (ABRicate, ARIBA, KmerResistance and SRST2, run with the ResFinder database) and their outputs compared. For simulation tests where 3079 AMR gene variants were inserted into random sequence constructs, KmerResistance was correct for 3076 (99.9 %) simulations, ABRicate for 3054 (99.2 %), ARIBA for 2783 (90.4 %) and SRST2 for 2108 (68.5 %). For simulation tests where two closely related gene variants were inserted into random sequence constructs, KmerResistance identified the correct alleles in 35 338/46 318 (76.3 %) simulations, ABRicate identified them in 11 842/46 318 (25.6 %) simulations, ARIBA identified them in 1679/46 318 (3.6 %) simulations and SRST2 identified them in 2000/46 318 (4.3 %) simulations. In real data, across all methods, 1392/1818 (76 %) isolates had discrepant allele calls for at least 1 gene. In addition to highlighting areas for improvement in challenging scenarios (e.g. identification of AMR genes at <10× coverage, identifying multiple closely related AMR genes present in the same sample), our evaluations identified some more systematic errors that could be readily soluble, such as repeated misclassification (i.e. naming) of genes as shorter variants of the same gene present within the reference resistance gene database. Such naming errors accounted for at least 2530/4321 (59 %) of the discrepancies seen in real data. Moreover, many of the remaining discrepancies were likely ‘artefactual’, with reporting of cut-off differences accounting for at least 1430/4321 (33 %) discrepants. Whilst we found that comparing outputs generated by running multiple algorithms on the same dataset could identify and resolve these algorithmic artefacts, the results of our evaluations emphasize the need for developing new and more robust genotyping algorithms to further improve accuracy and performance

    A crowd of BashTheBug volunteers reproducibly and accurately measure the minimum inhibitory concentrations of 13 antitubercular drugs from photographs of 96-well broth microdilution plates

    Tuberculosis is a respiratory disease that is treatable with antibiotics. An increasing prevalence of resistance means that to ensure a good treatment outcome it is desirable to test the susceptibility of each infection to different antibiotics. Conventionally, this is done by culturing a clinical sample and then exposing aliquots to a panel of antibiotics, each being present at a pre-determined concentration, thereby determining if the sample is resistant or susceptible to each antibiotic. The minimum inhibitory concentration (MIC) of a drug is the lowest concentration that inhibits growth and is a more useful quantity but requires each sample to be tested at a range of concentrations for each drug. Using 96-well broth microdilution plates with each well containing a lyophilised pre-determined amount of an antibiotic is a convenient and cost-effective way to measure the MICs of several drugs at once for a clinical sample. Although accurate, this is still an expensive and slow process that requires highly-skilled and experienced laboratory scientists. Here we show that, through the BashTheBug project hosted on the Zooniverse citizen science platform, a crowd of volunteers can reproducibly and accurately determine the MICs for 13 drugs and that simply taking the median or mode of 11–17 independent classifications is sufficient. There is therefore a potential role for crowds to support (but not supplant) the role of experts in antibiotic susceptibility testing
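
The consensus step described in this abstract (taking the median or mode of 11–17 independent classifications) can be sketched in a few lines of Python. This is only an illustration, not the BashTheBug pipeline: the function name `consensus_mic`, the encoding of each classification as an integer dilution index, the use of the lower median, and the lowest-value tie-break for the mode are all assumptions made here.

```python
from statistics import median_low, multimode

def consensus_mic(classifications):
    """Return (median, mode) of a list of integer dilution indices.

    Assumption: each volunteer classification is the index of the first
    well without growth. Ties in the mode are broken by taking the lowest
    value, which is an arbitrary choice for this sketch.
    """
    if not classifications:
        raise ValueError("need at least one classification")
    med = median_low(classifications)        # lower median, always an observed value
    mode = min(multimode(classifications))   # multimode lists every most-common value
    return med, mode

# Hypothetical votes from 11 volunteers for one drug on one plate
votes = [4, 4, 5, 4, 3, 4, 5, 4, 4, 6, 4]
print(consensus_mic(votes))  # (4, 4)
```

Using `median_low` rather than `median` keeps the result on the plate's discrete dilution scale when the number of classifications is even.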

    Detecting changes in population trends in infection surveillance using community SARS-CoV-2 prevalence as an exemplar

    Detecting and quantifying changes in growth rates of infectious diseases is vital to informing public health strategy and can inform policymakers’ rationale for implementing or continuing interventions aimed at reducing impact. Substantial changes in SARS-CoV-2 prevalence with emergence of variants provides opportunity to investigate different methods to do this. We included PCR results from all participants in the UK’s COVID-19 Infection Survey between August 2020-June 2022. Change-points for growth rates were identified using iterative sequential regression (ISR) and second derivatives of generalised additive models (GAMs). Consistency between methods and timeliness of detection were compared. Of 8,799,079 visits, 147,278 (1.7%) were PCR-positive. Change-points associated with emergence of major variants were estimated to occur a median 4 days earlier (IQR 0-8) in GAMs versus ISR. When estimating recent change-points using successive data periods, four change-points (4/96) identified by GAMs were not found when adding later data or by ISR. Change-points were detected 3-5 weeks after they occurred in both methods but could be detected earlier within specific subgroups. Change-points in growth rates of SARS-CoV-2 can be detected in near real-time using ISR and second derivatives of GAMs. To increase certainty about changes in epidemic trajectories both methods could be run in parallel

    Ct threshold values, a proxy for viral load in community SARS-CoV-2 cases, demonstrate wide variation across populations and over time.

    BACKGROUND: Information on SARS-CoV-2 in representative community surveillance is limited, particularly cycle threshold (Ct) values (a proxy for viral load). METHODS: We included all positive nose and throat swabs 26 April 2020 to 13 March 2021 from the UK's national COVID-19 Infection Survey, tested by RT-PCR for the N, S, and ORF1ab genes. We investigated predictors of median Ct value using quantile regression. RESULTS: Of 3,312,159 nose and throat swabs, 27,902 (0.83%) were RT-PCR-positive, 10,317 (37%), 11,012 (40%), and 6550 (23%) for 3, 2, or 1 of the N, S, and ORF1ab genes, respectively, with median Ct = 29.2 (~215 copies/ml; IQR Ct = 21.9-32.8, 14-56,400 copies/ml). Independent predictors of lower Cts (i.e. higher viral load) included self-reported symptoms and more genes detected, with at most small effects of sex, ethnicity, and age. Single-gene positives almost invariably had Ct > 30, but Cts varied widely in triple-gene positives, including without symptoms. Population-level Cts changed over time, with declining Ct preceding increasing SARS-CoV-2 positivity. Of 6189 participants with IgG S-antibody tests post-first RT-PCR-positive, 4808 (78%) were ever antibody-positive; Cts were significantly higher in those remaining antibody negative. CONCLUSIONS: Marked variation in community SARS-CoV-2 Ct values suggests that they could be a useful epidemiological early-warning indicator. FUNDING: Department of Health and Social Care, National Institutes of Health Research, Huo Family Foundation, Medical Research Council UK; Wellcome Trust

    Genomic Epidemiology of Complex, Multispecies, Plasmid-Borne bla KPC Carbapenemase in Enterobacterales in the United Kingdom from 2009 to 2014.

    Carbapenem resistance in Enterobacterales is a public health threat. Klebsiella pneumoniae carbapenemase (encoded by alleles of the bla KPC family) is one of the most common transmissible carbapenem resistance mechanisms worldwide. The dissemination of bla KPC historically has been associated with distinct K. pneumoniae lineages (clonal group 258 [CG258]), a particular plasmid family (pKpQIL), and a composite transposon (Tn4401). In the United Kingdom, bla KPC has represented a large-scale, persistent management challenge for some hospitals, particularly in North West England. The dissemination of bla KPC has evolved to be polyclonal and polyspecies, but the genetic mechanisms underpinning this evolution have not been elucidated in detail; this study used short-read whole-genome sequencing of 604 bla KPC-positive isolates (Illumina) and long-read assembly (PacBio)/polishing (Illumina) of 21 isolates for characterization. We observed the dissemination of bla KPC (predominantly bla KPC-2; 573/604 [95%] isolates) across eight species and more than 100 known sequence types. Although there was some variation at the transposon level (mostly Tn4401a, 584/604 [97%] isolates; predominantly with ATTGA-ATTGA target site duplications, 465/604 [77%] isolates), bla KPC spread appears to have been supported by highly fluid, modular exchange of larger genetic segments among plasmid populations dominated by IncFIB (580/604 isolates), IncFII (545/604 isolates), and IncR (252/604 isolates) replicons. The subset of reconstructed plasmid sequences (21 isolates, 77 plasmids) also highlighted modular exchange among non-bla KPC and bla KPC plasmids and the common presence of multiple replicons within bla KPC plasmid structures (>60%). The substantial genomic plasticity observed has important implications for our understanding of the epidemiology of transmissible carbapenem resistance in Enterobacterales for the implementation of adequate surveillance approaches and for control

    Persistence of SARS-CoV-2 antibodies over 18 months following infection: UK Biobank COVID-19 Serology Study

    Background Little is known about the persistence of antibodies after the first year following SARS-CoV-2 infection. We aimed to determine the proportion of individuals that maintain detectable levels of SARS-CoV-2 antibodies over an 18-month period following infection. Methods Population-based prospective study of 20 000 UK Biobank participants and their adult relatives recruited in May 2020. The proportion of SARS-CoV-2 cases testing positive for immunoglobulin G (IgG) antibodies against the spike protein (IgG-S), and the nucleocapsid protein (IgG-N), was calculated at varying intervals following infection. Results Overall, 20 195 participants were recruited. Their median age was 56 years (IQR 39–68), 56% were female and 88% were of white ethnicity. The proportion of SARS-CoV-2 cases with IgG-S antibodies following infection remained high (92%, 95% CI 90%–93%) at 6 months after infection. Levels of IgG-N antibodies following infection gradually decreased from 92% (95% CI 88%–95%) at 3 months to 72% (95% CI 70%–75%) at 18 months. There was no strong evidence of heterogeneity in antibody persistence by age, sex, ethnicity or socioeconomic deprivation. Conclusion This study adds to the limited evidence on the long-term persistence of antibodies following SARS-CoV-2 infection, with likely implications for waning immunity following infection and the use of IgG-N in population surveys

    Antibodies to SARS-CoV-2 are associated with protection against reinfection

    Background: The relationship between the presence of antibodies to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and the risk of subsequent reinfection remains unclear. Methods: We investigated the incidence of SARS-CoV-2 infection confirmed by polymerase chain reaction (PCR) in seropositive and seronegative health care workers attending testing of asymptomatic and symptomatic staff at Oxford University Hospitals in the United Kingdom. Baseline antibody status was determined by anti-spike (primary analysis) and anti-nucleocapsid IgG assays, and staff members were followed for up to 31 weeks. We estimated the relative incidence of PCR-positive test results and new symptomatic infection according to antibody status, adjusting for age, participant-reported gender, and changes in incidence over time. Results: A total of 12,541 health care workers participated and had anti-spike IgG measured; 11,364 were followed up after negative antibody results and 1265 after positive results, including 88 in whom seroconversion occurred during follow-up. A total of 223 anti-spike–seronegative health care workers had a positive PCR test (1.09 per 10,000 days at risk), 100 during screening while they were asymptomatic and 123 while symptomatic, whereas 2 anti-spike–seropositive health care workers had a positive PCR test (0.13 per 10,000 days at risk), and both workers were asymptomatic when tested (adjusted incidence rate ratio, 0.11; 95% confidence interval, 0.03 to 0.44; P=0.002). There were no symptomatic infections in workers with anti-spike antibodies. Rate ratios were similar when the anti-nucleocapsid IgG assay was used alone or in combination with the anti-spike IgG assay to determine baseline status. Conclusions: The presence of anti-spike or anti-nucleocapsid IgG antibodies was associated with a substantially reduced risk of SARS-CoV-2 reinfection in the ensuing 6 months. (Funded by the U.K. Government Department of Health and Social Care and others.)

    Rapid antibiotic-resistance predictions from genome sequence data for Staphylococcus aureus and Mycobacterium tuberculosis.

    The rise of antibiotic-resistant bacteria has led to an urgent need for rapid detection of drug resistance in clinical samples, and improvements in global surveillance. Here we show how de Bruijn graph representation of bacterial diversity can be used to identify species and resistance profiles of clinical isolates. We implement this method for Staphylococcus aureus and Mycobacterium tuberculosis in a software package ('Mykrobe predictor') that takes raw sequence data as input, and generates a clinician-friendly report within 3 minutes on a laptop. For S. aureus, the error rates of our method are comparable to gold-standard phenotypic methods, with sensitivity/specificity of 99.1%/99.6% across 12 antibiotics (using an independent validation set, n=470). For M. tuberculosis, our method predicts resistance with sensitivity/specificity of 82.6%/98.5% (independent validation set, n=1,609); sensitivity is lower here, probably because of limited understanding of the underlying genetic mechanisms. We give evidence that minor alleles improve detection of extremely drug-resistant strains, and demonstrate feasibility of the use of emerging single-molecule nanopore sequencing techniques for these purposes
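
To make the de Bruijn graph idea in this abstract concrete, the sketch below builds a toy graph from reads and measures how many k-mers of a candidate allele are present in the sample. It is a heavily simplified illustration and not the Mykrobe predictor algorithm: the function names (`kmers`, `de_bruijn_graph`, `allele_kmer_coverage`), the tiny value of k, and the example sequences are all hypothetical, and real tools also handle sequencing errors, reverse complements and read depth.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mers that follow it."""
    graph = {}
    for read in reads:
        for kmer in kmers(read, k):
            graph.setdefault(kmer[:-1], set()).add(kmer[1:])
    return graph

def allele_kmer_coverage(allele, reads, k):
    """Fraction of the allele's k-mers that occur in the reads."""
    wanted = kmers(allele, k)
    if not wanted:
        raise ValueError("allele is shorter than k")
    sample = set()
    for read in reads:
        sample |= kmers(read, k)
    return len(wanted & sample) / len(wanted)

# Hypothetical toy reads; a real analysis would use a much larger k
reads = ["ACGTAC", "GTACGG"]
print(allele_kmer_coverage("CGTACGG", reads, 3))  # 1.0
```

An allele whose k-mers are all found in the sample is a candidate for being present; a partial match, such as `"ACGTT"` against the same reads, gives a coverage below 1.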

    The 2021 WHO catalogue of Mycobacterium tuberculosis complex mutations associated with drug resistance: a genotypic analysis.

    Background: Molecular diagnostics are considered the most promising route to achievement of rapid, universal drug susceptibility testing for Mycobacterium tuberculosis complex (MTBC). We aimed to generate a WHO-endorsed catalogue of mutations to serve as a global standard for interpreting molecular information for drug resistance prediction. Methods: In this systematic analysis, we used a candidate gene approach to identify mutations associated with resistance or consistent with susceptibility for 13 WHO-endorsed antituberculosis drugs. We collected existing worldwide MTBC whole-genome sequencing data and phenotypic data from academic groups and consortia, reference laboratories, public health organisations, and published literature. We categorised phenotypes as follows: methods and critical concentrations currently endorsed by WHO (category 1); critical concentrations previously endorsed by WHO for those methods (category 2); methods or critical concentrations not currently endorsed by WHO (category 3). For each mutation, we used a contingency table of binary phenotypes and presence or absence of the mutation to compute positive predictive value, and we used Fisher's exact tests to generate odds ratios and Benjamini-Hochberg corrected p values. Mutations were graded as associated with resistance if present in at least five isolates, if the odds ratio was more than 1 with a statistically significant corrected p value, and if the lower bound of the 95% CI on the positive predictive value for phenotypic resistance was greater than 25%. A series of expert rules were applied for final confidence grading of each mutation. Findings: We analysed 41 137 MTBC isolates with phenotypic and whole-genome sequencing data from 45 countries. 38 215 MTBC isolates passed quality control steps and were included in the final analysis. 15 667 associations were computed for 13 211 unique mutations linked to one or more drugs. 1149 (7·3%) of 15 667 mutations were classified as associated with phenotypic resistance and 107 (0·7%) were deemed consistent with susceptibility. For rifampicin, isoniazid, ethambutol, fluoroquinolones, and streptomycin, the mutations' pooled sensitivity was more than 80%. Specificity was over 95% for all drugs except ethionamide (91·4%), moxifloxacin (91·6%) and ethambutol (93·3%). Only two resistance mutations were identified for bedaquiline, delamanid, clofazimine, and linezolid as prevalence of phenotypic resistance was low for these drugs. Interpretation: We present the first WHO-endorsed catalogue of molecular targets for MTBC drug susceptibility testing, which is intended to provide a global standard for resistance interpretation. The existence of this catalogue should encourage the implementation of molecular diagnostics by national tuberculosis programmes. Funding: Unitaid, Wellcome Trust, UK Medical Research Council, and Bill and Melinda Gates Foundation
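
The per-mutation statistics named in this abstract (positive predictive value from a 2×2 contingency table, Fisher's exact test, odds ratio, and Benjamini-Hochberg correction) can be sketched with the Python standard library alone. This is an illustrative reimplementation under stated assumptions, not the catalogue's actual code: the function names, the table layout (a = mutation present and resistant, b = present and susceptible, c = absent and resistant, d = absent and susceptible), the use of the sample odds ratio, and the example counts are all hypothetical. The confidence interval on the positive predictive value is omitted because the abstract does not say how it was computed.

```python
from math import comb

def ppv(a, b):
    """Positive predictive value: resistant isolates among those carrying the mutation."""
    return a / (a + b)

def odds_ratio(a, b, c, d):
    """Sample odds ratio of the 2x2 table; undefined when b * c == 0."""
    return (a * d) / (b * c)

def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value: total probability of all tables
    with the same margins that are no more likely than the observed one."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical counts for one mutation and one drug
a, b, c, d = 3, 1, 1, 3
print(ppv(a, b), odds_ratio(a, b, c, d), fisher_two_sided(a, b, c, d))
```

In a grading workflow of the kind the abstract describes, one p value would be computed per mutation-drug association and the full list passed to `benjamini_hochberg` before applying the significance threshold.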