Clinical metagenomic sequencing, the analysis of the total genetic material from patient samples, is increasingly used for the diagnosis and surveillance of infectious diseases. Its ability to detect any microbe makes it suitable for detecting unexpected and novel pathogens, and it can also provide clinically relevant genome sequence data. Bioinformatics analysis of metagenomics data remains complex, with a wide variety of methods currently in clinical use and continuing challenges in distinguishing true pathogens from contaminants.
I used metagenomics to investigate the mechanism of the 2022 outbreak of hepatitis of unknown origin in children. Following the identification of adeno-associated virus 2 (AAV2) in samples from outbreak patients, I used long-read metagenomic sequencing to identify complex concatemeric structures in the AAV2 genome, possibly consistent with replication by both adenoviruses and herpesviruses. I then used similar methods to investigate hepatitis resulting from gene therapy with adeno-associated virus vectors, identifying elements of the gene therapy manufacturing plasmids within patient liver and complex structures in the vector genome.
Our investigations into AAV-related hepatitis suggested potential areas of development for our metagenomics protocols. We compared our existing method with protocols using Oxford Nanopore Technologies (ONT) sequencing, to decrease turnaround times and costs, and with hybridization-capture approaches, to improve sensitivity for viruses. I evaluated a range of tools for analysis of these datasets and developed an automated method that reduced false positive identifications across multiple tools. I then co-developed and tested a novel method of hybridization capture with ONT sequencing. This method allowed detection of high-copy-number viruses within an hour of sequencing and improved genome coverage compared to untargeted approaches.