    Performance of neural network basecalling tools for Oxford Nanopore sequencing.

    BACKGROUND: Basecalling, the computational process of translating raw electrical signal to nucleotide sequence, is of critical importance to the sequencing platforms produced by Oxford Nanopore Technologies (ONT). Here, we examine the performance of different basecalling tools, looking at accuracy at the level of bases within individual reads and at majority-rule consensus basecalls in an assembly. We also investigate some additional aspects of basecalling: training using a taxon-specific dataset, using a larger neural network model and improving consensus basecalls in an assembly by additional signal-level analysis with Nanopolish. RESULTS: Training basecallers on taxon-specific data results in a significant boost in consensus accuracy, mostly due to the reduction of errors in methylation motifs. A larger neural network is able to improve both read and consensus accuracy, but at a cost to speed. Improving consensus sequences ('polishing') with Nanopolish somewhat negates the accuracy differences in basecallers, but pre-polish accuracy does have an effect on post-polish accuracy. CONCLUSIONS: Basecalling accuracy has seen significant improvements over the last 2 years. The current version of ONT's Guppy basecaller performs well overall, with good accuracy and fast performance. If higher accuracy is required, users should consider producing a custom model using a larger neural network and/or training data from the same species.
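The abstract contrasts per-read accuracy with the accuracy of majority-rule consensus basecalls. As a rough illustration of what a majority-rule call means at each position, the Python sketch below assumes the reads have already been aligned into per-position columns and keeps the most frequent symbol in each column. The function name `majority_consensus` and the column input format are hypothetical; real assemblers and polishers such as Nanopolish are considerably more involved than this simple vote.

```python
from collections import Counter
from typing import List


def majority_consensus(columns: List[List[str]]) -> str:
    """Toy majority-rule consensus over pre-aligned read columns (hypothetical helper).

    Each inner list holds the symbols that the aligned reads show at one
    reference position: a base (A/C/G/T) or '-' for a deletion. The most
    frequent symbol wins; on a tie, Counter.most_common keeps the symbol
    encountered first. A column whose winner is '-' adds nothing to the output.
    """
    consensus: List[str] = []
    for column in columns:
        if not column:
            continue  # no read covers this position, so there is nothing to call
        symbol, _count = Counter(column).most_common(1)[0]
        if symbol != "-":
            consensus.append(symbol)
    return "".join(consensus)


if __name__ == "__main__":
    # Four hypothetical reads aligned over four positions (one string per column).
    pileup = [list("AAAC"), list("CCGC"), list("T-TT"), list("--G-")]
    print(majority_consensus(pileup))  # prints ACT
```

The vote shows why consensus accuracy can exceed read accuracy, and also where it cannot: random errors in individual reads are outvoted, but an error that most reads share at the same position (a systematic error) wins the vote and survives into the consensus.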

    Deepbinner: Demultiplexing barcoded Oxford Nanopore reads with deep convolutional neural networks.

    Multiplexing, the simultaneous sequencing of multiple barcoded DNA samples on a single flow cell, has made Oxford Nanopore sequencing cost-effective for small genomes. However, it depends on the ability to sort the resulting sequencing reads by barcode, and current demultiplexing tools fail to classify many reads. Here we present Deepbinner, a tool for Oxford Nanopore demultiplexing that uses a deep neural network to classify reads based on the raw electrical read signal. This 'signal-space' approach allows for greater accuracy than existing 'base-space' tools (Albacore and Porechop), for which signals must first be converted to DNA base calls, itself a complex problem that can introduce noise into the barcode sequence. To assess Deepbinner and existing tools, we performed multiplex sequencing on 12 amplicons chosen for their distinguishability. This allowed us to establish a ground truth classification for each read based on internal sequence alone. Deepbinner had the lowest rate of unclassified reads (7.8%) and the highest demultiplexing precision (98.5% of classified reads were correctly assigned). It can be used alone (to maximise the number of classified reads) or in conjunction with other demultiplexers (to maximise precision and minimise false positive classifications). We also found cross-sample chimeric reads (0.3%) and evidence of barcode switching (0.3%) in our dataset, which likely arise during library preparation and may be detrimental for quantitative studies that use multiplexing. Deepbinner is open source (GPLv3) and available at https://github.com/rrwick/Deepbinner
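Deepbinner is evaluated on two metrics: the fraction of reads left unclassified and the precision among classified reads. The Python sketch below computes both from per-read calls against a ground-truth assignment. It also shows one plausible way to combine two demultiplexers to favour precision: a barcode is kept only when both tools agree. That agreement rule, the function names and the read and barcode identifiers are assumptions for illustration, not Deepbinner's actual interface.

```python
from typing import Dict, Optional

# A demultiplexing call: a barcode name, or None if the read was left unclassified.
Call = Optional[str]


def demux_metrics(calls: Dict[str, Call], truth: Dict[str, str]) -> Dict[str, float]:
    """Unclassified rate over all reads; precision over classified reads only."""
    total = len(calls)
    classified = {read: bc for read, bc in calls.items() if bc is not None}
    correct = sum(1 for read, bc in classified.items() if truth.get(read) == bc)
    return {
        "unclassified_rate": (total - len(classified)) / total if total else 0.0,
        "precision": correct / len(classified) if classified else 0.0,
    }


def agreement_calls(first: Dict[str, Call], second: Dict[str, Call]) -> Dict[str, Call]:
    """Hypothetical combination rule: keep a barcode only when both tools assign the same one."""
    return {
        read: bc if bc is not None and second.get(read) == bc else None
        for read, bc in first.items()
    }


if __name__ == "__main__":
    # Hypothetical ground truth and calls from a signal-space and a base-space tool.
    truth = {"r1": "BC01", "r2": "BC02", "r3": "BC03", "r4": "BC01"}
    signal = {"r1": "BC01", "r2": "BC02", "r3": "BC01", "r4": None}
    base = {"r1": "BC01", "r2": None, "r3": "BC03", "r4": "BC01"}
    print(demux_metrics(signal, truth))  # unclassified_rate 0.25, precision 2/3
    print(demux_metrics(agreement_calls(signal, base), truth))  # 0.75 and 1.0
```

On this toy data, requiring agreement raises precision from 2/3 to 1.0 but also raises the unclassified rate from 0.25 to 0.75. This is the trade-off the abstract describes between maximising the number of classified reads and minimising false positive classifications.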

    The inflated mitochondrial genomes of siphonous green algae reflect processes driving expansion of noncoding DNA and proliferation of introns.

    Within the siphonous green algal order Bryopsidales, the size and gene arrangement of chloroplast genomes have been examined extensively, while mitochondrial genomes have been mostly overlooked. The recently published mitochondrial genome of Caulerpa lentillifera is large, with expanded noncoding DNA, but it remains unclear whether this is characteristic of the entire order. Our study aims to evaluate the evolutionary forces shaping organelle genome dynamics in the Bryopsidales based on the C. lentillifera and Ostreobium quekettii mitochondrial genomes. In this study, the mitochondrial genome of O. quekettii was characterised using a combination of long- and short-read sequencing, and bioinformatic tools for annotation and sequence analyses. We compared the mitochondrial and chloroplast genomes of O. quekettii and C. lentillifera to examine hypotheses related to genome evolution. The O. quekettii mitochondrial genome is the largest green algal mitochondrial genome sequenced (241,739 bp), considerably larger than its chloroplast genome. As with the mtDNA of C. lentillifera, most of this excess size is from the expansion of intergenic DNA and proliferation of introns. Inflated mitochondrial genomes in the Bryopsidales suggest that effective population size, recombination and/or mutation rate, influenced by nuclear-encoded proteins, differ between the genomes of mitochondria and chloroplasts, reducing the strength of selection to influence evolution of their mitochondrial genomes.

    Tracking key virulence loci encoding aerobactin and salmochelin siderophore synthesis in Klebsiella pneumoniae.

    BACKGROUND: Klebsiella pneumoniae is a recognised agent of multidrug-resistant (MDR) healthcare-associated infections; however, individual strains vary in their virulence potential due to the presence of mobile accessory genes. In particular, gene clusters encoding the biosynthesis of siderophores aerobactin (iuc) and salmochelin (iro) are associated with invasive disease and are common amongst hypervirulent K. pneumoniae clones that cause severe community-associated infections such as liver abscess and pneumonia. Concerningly, iuc has also been reported in MDR strains in the hospital setting, where it was associated with increased mortality, highlighting the need to understand, detect and track the mobility of these virulence loci in the K. pneumoniae population. METHODS: Here, we examined the genetic diversity, distribution and mobilisation of iuc and iro loci amongst 2503 K. pneumoniae genomes using comparative genomics approaches and developed tools for tracking them via genomic surveillance. RESULTS: Iro and iuc were detected at low prevalence (< 10%). Considerable genetic diversity was observed, resolving into five iro and six iuc lineages that show distinct patterns of mobilisation and dissemination in the K. pneumoniae population. The major burden of iuc and iro amongst the genomes analysed was due to two linked lineages (iuc1/iro1 74% and iuc2/iro2 14%), each carried by a distinct non-self-transmissible IncFIBK virulence plasmid type that we designate KpVP-1 and KpVP-2. These dominant types also carry hypermucoidy (rmpA) determinants and include all previously described virulence plasmids of K. pneumoniae. The other iuc and iro lineages were associated with diverse plasmids, including some carrying IncFII conjugative transfer regions and some imported from Escherichia coli; the exceptions were iro3 (mobilised by ICEKp1) and iuc4 (fixed in the chromosome of K. pneumoniae subspecies rhinoscleromatis). 
Iro/iuc mobile genetic elements (MGEs) appear to be stably maintained at high frequency within known hypervirulent strains (ST23, ST86, etc.) but were also detected at low prevalence in others, such as the MDR strain ST258. CONCLUSIONS: Iuc and iro are mobilised in K. pneumoniae via a limited number of MGEs. This study provides a framework for identifying and tracking these important virulence loci, which will be important for genomic surveillance efforts, including monitoring for the emergence of hypervirulent MDR K. pneumoniae strains.

    Completing bacterial genome assemblies with multiplex MinION sequencing

    Illumina sequencing platforms have enabled widespread bacterial whole genome sequencing. While Illumina data is appropriate for many analyses, its short read length limits its ability to resolve genomic structure. This has major implications for tracking the spread of mobile genetic elements, including those which carry antimicrobial resistance determinants. Fully resolving a bacterial genome requires long reads, such as those generated by Oxford Nanopore Technologies (ONT) platforms. Here we describe our use of the ONT MinION to sequence 12 isolates of Klebsiella pneumoniae on a single flow cell. We assembled each genome using a combination of ONT reads and previously available Illumina reads, and little to no manual intervention was needed to achieve fully resolved assemblies using the Unicycler hybrid assembler. Assembling only ONT reads with Canu was less effective, resulting in fewer resolved genomes and higher error rates even following error correction with Nanopolish. We demonstrate that multiplexed ONT sequencing is a valuable tool for high-throughput bacterial genome finishing. 
Specifically, we advocate the use of Illumina sequencing as a first analysis step, followed by ONT reads as needed to resolve genomic structure.
Data summary: Sequence read files for all 12 isolates have been deposited in SRA, accessible through these NCBI BioSample accession numbers: SAMEA3357010, SAMEA3357043, SAMN07211279, SAMN07211280, SAMEA3357223, SAMEA3357193, SAMEA3357346, SAMEA3357374, SAMEA3357320, SAMN07211281, SAMN07211282, SAMEA3357405. A full list of SRA run accession numbers (both Illumina reads and ONT reads) for these samples is available in Table S1. Assemblies and sequencing reads corresponding to each stage of processing and analysis are provided in a figshare project (https://figshare.com/projects/Completing_bacterial_genome_assemblies_with_multiplex_MinION_sequencing/23068). Source code is provided in public GitHub repositories (https://github.com/rrwick/Bacterial-genome-assemblies-with-multiplex-MinION-sequencing, https://github.com/rrwick/Porechop and https://github.com/rrwick/Fast5-to-Fastq).
Impact Statement: Like many research and public health laboratories, we frequently perform large-scale bacterial comparative genomics studies using Illumina sequencing, which assays gene content and provides the high-confidence variant calls needed for phylogenomics and transmission studies. However, problems often arise with resolving genome assemblies, particularly around the regions that matter most to our research, such as mobile genetic elements encoding antibiotic resistance or virulence genes. These complexities can often be resolved by long sequence reads generated with PacBio or Oxford Nanopore Technologies (ONT) platforms. While effective, this approach has proven difficult to scale, due to the relatively high cost of generating long reads and the manual intervention required for assembly. 
Here we demonstrate the use of barcoded ONT libraries sequenced in multiplex on a single ONT MinION flow cell, coupled with hybrid assembly using Unicycler, to resolve 12 large bacterial genomes. Minor manual intervention was required in five isolates to fully resolve small plasmids, which we found to be underrepresented in ONT data. The cost per sample of the ONT sequencing was equivalent to that of Illumina sequencing, and there is potential for significant savings by multiplexing more samples on the ONT run. This approach paves the way for high-throughput, cost-effective generation of completely resolved bacterial genomes to become widely accessible.

    Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads.

    The Illumina DNA sequencing platform generates accurate but short reads, which can be used to produce accurate but fragmented genome assemblies. Pacific Biosciences and Oxford Nanopore Technologies DNA sequencing platforms generate long reads that can produce complete genome assemblies, but the sequencing is more expensive and error-prone. There is significant interest in combining data from these complementary sequencing technologies to generate more accurate "hybrid" assemblies. However, few tools exist that truly leverage the benefits of both types of data, namely the accuracy of short reads and the structural resolving power of long reads. Here we present Unicycler, a new tool for assembling bacterial genomes from a combination of short and long reads, which produces assemblies that are accurate, complete and cost-effective. Unicycler builds an initial assembly graph from short reads using the de novo assembler SPAdes and then simplifies the graph using information from short and long reads. Unicycler uses a novel semi-global aligner to align long reads to the assembly graph. Tests on both synthetic and real reads show Unicycler can assemble larger contigs with fewer misassemblies than other hybrid assemblers, even when long-read depth and accuracy are low. Unicycler is open source (GPLv3) and available at github.com/rrwick/Unicycler
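Unicycler aligns long reads to its assembly graph with a semi-global aligner. The Python sketch below is not that aligner. It is a textbook semi-global (free end-gap) dynamic-programming score between two plain strings, included only to show why end gaps go unpenalised: a long read may begin or end partway along a contig, or overhang a contig's end, without that being a true alignment error. The function name and scoring parameters are arbitrary assumptions, and the sketch has no traceback and no graph traversal.

```python
def semiglobal_score(read: str, ref: str, match: int = 1,
                     mismatch: int = -1, gap: int = -2) -> int:
    """Best alignment score when leading and trailing gaps in either sequence are free.

    Linear gap penalty, score only (no traceback), two plain strings.
    """
    m = len(ref)
    # Row 0 is all zeros: skipping a prefix of `ref` is free.
    prev = [0] * (m + 1)
    best = 0
    for i in range(1, len(read) + 1):
        # Column 0 stays zero: skipping a prefix of `read` is free.
        curr = [0] * (m + 1)
        for j in range(1, m + 1):
            sub = match if read[i - 1] == ref[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + sub,  # align read[i-1] with ref[j-1]
                          prev[j] + gap,      # read base against a gap in ref
                          curr[j - 1] + gap)  # ref base against a gap in read
        # Last column: ref is exhausted, so any remaining read suffix is free.
        best = max(best, curr[m])
        prev = curr
    # Last row: read is exhausted, so any remaining ref suffix is free.
    return max(best, max(prev))


if __name__ == "__main__":
    print(semiglobal_score("ACGT", "TTACGTTT"))  # prints 4
```

In the usage line, the read aligns in full and the unmatched reference flanks cost nothing. A global alignment of the same pair would instead pay a gap penalty for each of the four flanking reference bases.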

    Helicobacter pylori Infection Promotes Methylation and Silencing of Trefoil Factor 2, Leading to Gastric Tumor Development in Mice and Humans

    Background & Aims Trefoil factors (TFFs) regulate mucosal repair and suppress tumor formation in the stomach. Tff1 deficiency results in gastric cancer, whereas Tff2 deficiency increases gastric inflammation. TFF2 expression is frequently lost in gastric neoplasms, but the nature of the silencing mechanism and its impact on tumorigenesis have not been determined. Methods We investigated the epigenetic silencing of TFF2 in gastric biopsy specimens from individuals with Helicobacter pylori-positive gastritis, intestinal metaplasia or gastric cancer, and from disease-free controls. TFF2 function and methylation were manipulated in gastric cancer cell lines. The effects of Tff2 deficiency on tumor growth were investigated in the gp130(F/F) mouse model of gastric cancer. Results In human tissue samples, DNA methylation at the TFF2 promoter began at the time of H pylori infection and increased throughout gastric tumor progression. TFF2 methylation levels were inversely correlated with TFF2 messenger RNA levels and could be used to discriminate between disease-free controls, H pylori-infected tissues and tumor tissues. Genome demethylation restored TFF2 expression in gastric cancer cell lines, indicating that TFF2 silencing requires methylation. In Tff2-deficient gp130(F/F)/Tff2(−/−) mice, proliferation of mucosal cells and release of T helper type 1 (Th1) cytokines increased, whereas expression of gastric tumor suppressor genes and Th2 cytokines was reduced, compared with gp130(F/F) controls. The fundus of gp130(F/F)/Tff2(−/−) mice displayed glandular atrophy and metaplasia, indicating accelerated preneoplasia. Experimental H pylori infection in wild-type mice reduced antral expression of Tff2 through increased promoter methylation. 
Conclusions TFF2 negatively regulates preneoplastic progression and subsequent tumor development in the stomach, a role that is subverted by promoter methylation during H pylori infection.
National Health and Medical Research Council (Australia).

    Genomic Diversity and Antimicrobial Resistance of Haemophilus Colonizing the Airways of Young Children with Cystic Fibrosis

    Respiratory infection during childhood is a key risk factor in early cystic fibrosis (CF) lung disease progression. Haemophilus influenzae and Haemophilus parainfluenzae are routinely isolated from the lungs of children with CF; however, little is known about the frequency and characteristics of Haemophilus colonization in this context. Here, we describe the detection, antimicrobial resistance (AMR), and genome sequencing of H. influenzae and H. parainfluenzae isolated from airway samples of 147 participants aged ≤12 years enrolled in the Australian Respiratory Early Surveillance Team for Cystic Fibrosis (AREST CF) program, Melbourne, Australia. The frequency of colonization per visit was 4.6% for H. influenzae and 32.1% for H. parainfluenzae; 80.3% of participants had H. influenzae and/or H. parainfluenzae detected on at least one visit; and, using genomic data, we estimate that 15.6% of participants had persistent colonization with the same strain for at least two consecutive visits. Isolates were genetically diverse and AMR was common, with 52% of H. influenzae and 82% of H. parainfluenzae displaying resistance to at least one drug. The genetic basis for AMR could be identified in most cases; putative novel determinants include a new plasmid encoding blaTEM-1 (ampicillin resistance), a new inhibitor-resistant blaTEM allele (augmentin resistance), and previously unreported mutations in chromosomally carried genes (pbp3, ampicillin resistance; folA/folP, cotrimoxazole resistance; rpoB, rifampicin resistance). Acquired AMR genes were more common in H. parainfluenzae than H. influenzae (51% versus 21%, P = 0.0107) and were mostly associated with the ICEHin mobile element carrying blaTEM-1, resulting in more ampicillin resistance in H. parainfluenzae (73% versus 30%, P = 0.0004). Genomic data identified six potential instances of Haemophilus transmission between participants, of which three involved participants who shared clinic visit days. 
IMPORTANCE Cystic fibrosis (CF) lung disease begins during infancy, and acute respiratory infections increase the risk of early disease development and progression. Microbes involved in advanced stages of CF are well characterized, but less is known about early respiratory colonizers. We report the population dynamics and genomic determinants of AMR in two early colonizer species, namely, Haemophilus influenzae and Haemophilus parainfluenzae, collected from a pediatric CF cohort. This investigation also reveals that H. parainfluenzae has a high frequency of AMR carried on mobile elements that may act as a reservoir for the emergence and spread of AMR to H. influenzae, which has greater clinical significance as a respiratory pathogen in children. This study provides insight into the evolution of AMR and the colonization of H. influenzae and H. parainfluenzae in a pediatric CF cohort, which will help inform future treatment.

    Emergence and rapid global dissemination of CTX-M-15-associated Klebsiella pneumoniae strain ST307

    Objectives: Recent reports indicate the emergence of a new carbapenemase-producing Klebsiella pneumoniae clone, ST307. We sought to better understand the global epidemiology and evolution of this clone and evaluate its association with antimicrobial resistance (AMR) genes. Methods: We collated information from the literature and public databases and performed a comparative analysis of 95 ST307 genomes (including 37 that were newly sequenced). Results: We show that ST307 emerged in the mid-1990s (nearly 20 years prior to its first report), is already globally distributed and is intimately associated with a conserved plasmid harbouring the blaCTX-M-15 ESBL gene and several other AMR determinants. Conclusions: Our findings support the need for enhanced surveillance of this widespread ESBL clone in which carbapenem resistance has occasionally emerged.
