Efficient analysis of microbial whole-genome sequence data using de Bruijn graphs

Phelim Bradley

Authors: Phelim Bradley
Publication date: 1 January 2017

Abstract

Antimicrobial resistance (AMR) is a persistent and growing threat to global health. Whole-genome sequencing (WGS) has the potential to dramatically improve our ability to detect, understand, and monitor AMR. However, microbial diversity and complexity mean that analysing and interpreting microbial genomes is challenging. In this thesis, I explore applications of de Bruijn graphs (DBGs) to the analysis of these data. First, I present a tool, Mykrobe predictor, that uses DBGs to rapidly identify species and AMR from WGS data. I show that it is accurate, flexible, and efficient. Next, I explore an extension of Mykrobe predictor to long-read sequencing of direct clinical samples of M. tuberculosis. In doing so, I show that one could reduce the turn-around time for susceptibility testing of an M. tuberculosis isolate from 2 weeks to 12 hours. Finally, I explore the challenges of DNA search in very large collections (millions) of microbial data sets. In particular, I address the super-linear scaling of existing k-mer indexing tools and present a novel representation and implementation of a probabilistic coloured de Bruijn graph, the "Coloured Bloom Graph" (CBG). I demonstrate its scalability by building a CBG of all publicly accessible microbial WGS data (almost half a million samples) and use it to run millisecond searches in these data.
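To make the idea of probabilistic k-mer search across many samples concrete, the following is a minimal illustrative sketch in Python. It is not the thesis's CBG implementation: it simply assumes one small Bloom filter per sample, each holding that sample's k-mers, and reports the samples whose filter contains every k-mer of a query. All names (`BloomFilter`, `build_index`, `search`) and all parameters (filter size, number of hashes, k) are hypothetical choices made for illustration.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all k-mers (length-k substrings) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class BloomFilter:
    """A fixed-size Bloom filter over strings (toy parameters, assumed)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # May report false positives, never false negatives.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def build_index(samples, k):
    """Map each sample name to a Bloom filter of its k-mers."""
    index = {}
    for name, seq in samples.items():
        bf = BloomFilter()
        for kmer in kmers(seq, k):
            bf.add(kmer)
        index[name] = bf
    return index


def search(index, query, k):
    """Return sorted names of samples whose filter reports every query k-mer."""
    if len(query) < k:
        raise ValueError("query must be at least k bases long")
    query_kmers = kmers(query, k)
    return sorted(
        name for name, bf in index.items()
        if all(kmer in bf for kmer in query_kmers)
    )


# Toy data: two made-up sequences standing in for sequenced samples.
samples = {"sample_a": "ACGTACGTGACCTGA", "sample_b": "TTGACCGGTAACCGT"}
index = build_index(samples, k=5)
hits = search(index, "TACGTGAC", k=5)  # the query is a substring of sample_a
```

Because a Bloom filter can return false positives but never false negatives, `hits` is guaranteed to include `sample_a`; whether other samples appear depends on the filter size and hash count. In this naive layout the cost of a query grows with the number of samples, which is the kind of scaling concern the abstract raises for very large collections.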

Available Versions

Oxford University Research Archive (ORA)

oai:ora.ox.ac.uk:uuid:b4dea8ec...
