Using bacterial DNA sequencing data to investigate the epidemiology of plasmid-mediated antibiotic resistance

Abstract

Bacterial plasmids are extrachromosomal genetic elements that can act as efficient vectors of antibiotic resistance. Epidemiological insight into plasmids may be gained by applying plasmid typing schemes, which exploit loci involved in replication and mobility functions (replicon and MOB typing, respectively). In Chapter 2, I compiled a curated dataset of complete NCBI plasmids to assess the performance of in silico replicon and MOB typing in terms of concordance and ‘typeability’ (proportion of plasmids typed). I found a degree of non-concordance between the schemes, which was attributed to either ambiguous boundaries between MOBP/MOBQ types or the mosaic nature of some plasmid genomes. Ultimately, I showed that the schemes fail to accommodate the diversity of plasmid genomes: of ~14,000 curated bacterial plasmids, only 42% and 55% could be assigned a replicon and MOB type, respectively. Given the limitations of plasmid typing, I subsequently focused on whole genome sequencing (WGS) analysis approaches that capitalise on the wider plasmid genome. High-throughput DNA sequencing has produced thousands of bacterial WGS datasets. However, such datasets commonly comprise short sequencing reads, which yield fragmented assemblies; this makes comparative analysis of plasmid genomes challenging. In Chapter 3, I developed two methods for comparative plasmid analysis, which cluster short-read sequenced samples according to either (1) plasmid replicon types or (2) sample-vs-reference plasmid distance score profiles. However, benchmarking suggested that neither method is completely reliable. The rise of long-read sequencing technology has increased the availability of complete plasmid assemblies, facilitating comparative plasmid genomic analyses. Nevertheless, available alignment-based comparative genomic tools have limitations: they often do not provide metrics on structural similarity and lack flexibility in terms of input/output options.
Therefore, in Chapter 4, I developed a novel alignment-based tool (‘ATCG’) for calculating pairwise average nucleotide identity (ANI), coverage breadth, and structural similarity, while addressing limitations of existing alignment-based tools. Benchmarking demonstrated favourable runtimes and supported the validity of calculated ANI scores. In Chapter 5, besides curating an updated plasmid dataset, I curated sample metadata (e.g. isolation source, geography). Using this metadata and plasmid biological features, I conducted multivariate statistical analyses to determine factors associated with plasmid resistance gene carriage, analysed across major resistance gene classes. The analysis showed, for example, that patterns of plasmid antibiotic resistance carriage in livestock and humans reflect known antibiotic usage.
