283 research outputs found

    Benchmarking of long-read assemblers for prokaryote whole genome sequencing.

    Background: Data sets from long-read sequencing platforms (Oxford Nanopore Technologies and Pacific Biosciences) allow for most prokaryote genomes to be completely assembled - one contig per chromosome or plasmid. However, the high per-read error rate of long-read sequencing necessitates different approaches to assembly than those used for short-read sequencing. Multiple assembly tools (assemblers) exist, which use a variety of algorithms for long-read assembly. Methods: We used 500 simulated read sets and 120 real read sets to assess the performance of eight long-read assemblers (Canu, Flye, Miniasm/Minipolish, NECAT, NextDenovo/NextPolish, Raven, Redbean and Shasta) across a wide variety of genomes and read parameters. Assemblies were assessed on their structural accuracy/completeness, sequence identity, contig circularisation and computational resources used. Results: Canu v2.1 produced reliable assemblies and was good with plasmids, but it performed poorly with circularisation and had the longest runtimes of all assemblers tested. Flye v2.8 was also reliable and made the smallest sequence errors, though it used the most RAM. Miniasm/Minipolish v0.3/v0.1.3 was the most likely to produce clean contig circularisation. NECAT v20200803 was reliable and good at circularisation but tended to make larger sequence errors. NextDenovo/NextPolish v2.3.1/v1.3.1 was reliable with chromosome assembly but bad with plasmid assembly. Raven v1.3.0 was reliable for chromosome assembly, though it did not perform well on small plasmids and had circularisation issues. Redbean v2.5 and Shasta v0.7.0 were computationally efficient but more likely to produce incomplete assemblies. Conclusions: Of the assemblers tested, Flye, Miniasm/Minipolish, NextDenovo/NextPolish and Raven performed best overall. However, no single tool performed well on all metrics, highlighting the need for continued development on long-read assembly algorithms

    Polypolish: Short-read polishing of long-read bacterial genome assemblies

    Long-read-only bacterial genome assemblies usually contain residual errors, most commonly homopolymer-length errors. Short-read polishing tools can use short reads to fix these errors, but most rely on short-read alignment which is unreliable in repeat regions. Errors in such regions are therefore challenging to fix and often remain after short-read polishing. Here we introduce Polypolish, a new short-read polisher which uses all-per-read alignments to repair errors in repeat sequences that other polishers cannot. Polypolish performed well in benchmarking tests using both simulated and real reads, and it almost never introduced errors during polishing. The best results were achieved by using Polypolish in combination with other short-read polishers

    Identification of Klebsiella capsule synthesis loci from whole genome data.

    Klebsiella pneumoniae is a growing cause of healthcare-associated infections for which multi-drug resistance is a concern. Its polysaccharide capsule is a major virulence determinant and epidemiological marker. However, little is known about capsule epidemiology since serological typing is not widely accessible and many isolates are serologically non-typeable. Molecular typing techniques provide useful insights, but existing methods fail to take full advantage of the information in whole genome sequences. We investigated the diversity of the capsule synthesis loci (K-loci) among 2503 K. pneumoniae genomes. We incorporated analyses of full-length K-locus nucleotide sequences and also clustered protein-encoding sequences to identify, annotate and compare K-locus structures. We propose a standardized nomenclature for K-loci and present a curated reference database. A total of 134 distinct K-loci were identified, including 31 novel types. Comparative analyses indicated 508 unique protein-encoding gene clusters that appear to reassort via homologous recombination. Extensive intra- and inter-locus nucleotide diversity was detected among the wzi and wzc genes, indicating that current molecular typing schemes based on these genes are inadequate. As a solution, we introduce Kaptive, a novel software tool that automates the process of identifying K-loci based on full locus information extracted from whole genome sequences (https://github.com/katholt/Kaptive). This work highlights the extensive diversity of Klebsiella K-loci and the proteins that they encode. The nomenclature, reference database and novel typing method presented here will become essential resources for genomic surveillance and epidemiological investigations of this pathogen

    Deepbinner: Demultiplexing barcoded Oxford Nanopore reads with deep convolutional neural networks.

    Multiplexing, the simultaneous sequencing of multiple barcoded DNA samples on a single flow cell, has made Oxford Nanopore sequencing cost-effective for small genomes. However, it depends on the ability to sort the resulting sequencing reads by barcode, and current demultiplexing tools fail to classify many reads. Here we present Deepbinner, a tool for Oxford Nanopore demultiplexing that uses a deep neural network to classify reads based on the raw electrical read signal. This 'signal-space' approach allows for greater accuracy than existing 'base-space' tools (Albacore and Porechop) for which signals must first be converted to DNA base calls, itself a complex problem that can introduce noise into the barcode sequence. To assess Deepbinner and existing tools, we performed multiplex sequencing on 12 amplicons chosen for their distinguishability. This allowed us to establish a ground truth classification for each read based on internal sequence alone. Deepbinner had the lowest rate of unclassified reads (7.8%) and the highest demultiplexing precision (98.5% of classified reads were correctly assigned). It can be used alone (to maximise the number of classified reads) or in conjunction with other demultiplexers (to maximise precision and minimise false positive classifications). We also found cross-sample chimeric reads (0.3%) and evidence of barcode switching (0.3%) in our dataset, which likely arise during library preparation and may be detrimental for quantitative studies that use multiplexing. Deepbinner is open source (GPLv3) and available at https://github.com/rrwick/Deepbinner
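The two headline figures in this abstract (rate of unclassified reads, and precision among classified reads) can be illustrated with a minimal Python sketch. It assumes each read has a known ground-truth barcode and a called barcode, with `None` standing for "unclassified". The function name `demux_metrics` and the barcode labels are hypothetical and are not part of Deepbinner.

```python
from typing import Dict, Optional, Sequence


def demux_metrics(truth: Sequence[str],
                  called: Sequence[Optional[str]]) -> Dict[str, float]:
    """Summarise demultiplexing calls against a ground truth.

    truth[i]  is the true barcode of read i.
    called[i] is the barcode assigned by a demultiplexer, or None if the
              read was left unclassified (an assumption of this sketch).
    """
    if len(truth) != len(called):
        raise ValueError("truth and called must have the same length")

    total = len(truth)
    # Keep only reads that received a barcode call.
    classified = [(t, c) for t, c in zip(truth, called) if c is not None]
    correct = sum(1 for t, c in classified if t == c)

    return {
        # Fraction of all reads that received no barcode.
        "unclassified_rate": (total - len(classified)) / total if total else 0.0,
        # Fraction of classified reads assigned to the correct barcode.
        "precision": correct / len(classified) if classified else 0.0,
    }


if __name__ == "__main__":
    truth = ["BC01", "BC02", "BC03", "BC01"]
    called = ["BC01", None, "BC03", "BC02"]
    print(demux_metrics(truth, called))
```

Under this definition precision ignores unclassified reads entirely, which is why a tool can trade a higher unclassified rate for higher precision.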

    Performance of neural network basecalling tools for Oxford Nanopore sequencing.

    BACKGROUND: Basecalling, the computational process of translating raw electrical signal to nucleotide sequence, is of critical importance to the sequencing platforms produced by Oxford Nanopore Technologies (ONT). Here, we examine the performance of different basecalling tools, looking at accuracy at the level of bases within individual reads and at majority-rule consensus basecalls in an assembly. We also investigate some additional aspects of basecalling: training using a taxon-specific dataset, using a larger neural network model and improving consensus basecalls in an assembly by additional signal-level analysis with Nanopolish. RESULTS: Training basecallers on taxon-specific data results in a significant boost in consensus accuracy, mostly due to the reduction of errors in methylation motifs. A larger neural network is able to improve both read and consensus accuracy, but at a cost to speed. Improving consensus sequences ('polishing') with Nanopolish somewhat negates the accuracy differences in basecallers, but pre-polish accuracy does have an effect on post-polish accuracy. CONCLUSIONS: Basecalling accuracy has seen significant improvements over the last 2 years. The current version of ONT's Guppy basecaller performs well overall, with good accuracy and fast performance. If higher accuracy is required, users should consider producing a custom model using a larger neural network and/or training data from the same species
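As a rough illustration of what a majority-rule consensus means, the Python sketch below takes the most common symbol in each column of a set of reads. It assumes the reads are already aligned to equal length with `-` marking gaps, which is a simplification: the study derived its consensus from an assembly, not from this code. The function name `majority_consensus` is hypothetical.

```python
from collections import Counter
from typing import Sequence


def majority_consensus(aligned_reads: Sequence[str]) -> str:
    """Return the per-column majority base of pre-aligned reads.

    All reads must have the same length; '-' marks a gap. Columns whose
    majority symbol is a gap are dropped from the consensus. Ties are
    resolved in favour of the symbol seen first in the column.
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")

    consensus = []
    for column in zip(*aligned_reads):
        base, _count = Counter(column).most_common(1)[0]
        if base != "-":
            consensus.append(base)
    return "".join(consensus)


if __name__ == "__main__":
    reads = ["ACGT-A", "ACGTTA", "ACCT-A"]
    print(majority_consensus(reads))
```

Random per-read errors tend to be outvoted in such a consensus, whereas systematic errors (for example at methylation motifs) recur across reads and can survive it.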

    Demultiplexing barcoded Oxford Nanopore reads with deep convolutional neural networks

    Multiplexing, the simultaneous sequencing of multiple barcoded DNA samples on a single flow cell, has made Oxford Nanopore sequencing cost-effective for small genomes. However, it depends on the ability to sort the resulting sequencing reads by barcode, and current demultiplexing tools fail to classify many reads. Here we present Deepbinner, a tool for Oxford Nanopore demultiplexing that uses a deep neural network to classify reads based on the raw electrical read signal. This ‘signal-space’ approach allows for greater accuracy than existing ‘base-space’ tools (Albacore and Porechop) for which signals must first be converted to DNA base calls, itself a complex problem that can introduce noise into the barcode sequence. To assess Deepbinner and existing tools, we performed multiplex sequencing on 12 amplicons chosen for their distinguishability. This allowed us to establish a ground truth classification for each read based on internal sequence alone. Deepbinner had the lowest rate of unclassified reads (7.8%) and the highest demultiplexing precision (98.5% of classified reads were correctly assigned). It can be used alone (to maximise the number of classified reads) or in conjunction with other demultiplexers (to maximise precision and minimise false positive classifications). We also found cross-sample chimeric reads (0.3%) and evidence of barcode switching (0.3%) in our dataset, which likely arise during library preparation and may be detrimental for quantitative studies that use multiplexing. Deepbinner is open source (GPLv3) and available at https://github.com/rrwick/Deepbinner

    The inflated mitochondrial genomes of siphonous green algae reflect processes driving expansion of noncoding DNA and proliferation of introns.

    Within the siphonous green algal order Bryopsidales, the size and gene arrangement of chloroplast genomes has been examined extensively, while mitochondrial genomes have been mostly overlooked. The recently published mitochondrial genome of Caulerpa lentillifera is large with expanded noncoding DNA, but it remains unclear if this is characteristic of the entire order. Our study aims to evaluate the evolutionary forces shaping organelle genome dynamics in the Bryopsidales based on the C. lentillifera and Ostreobium quekettii mitochondrial genomes. In this study, the mitochondrial genome of O. quekettii was characterised using a combination of long and short read sequencing, and bioinformatic tools for annotation and sequence analyses. We compared the mitochondrial and chloroplast genomes of O. quekettii and C. lentillifera to examine hypotheses related to genome evolution. The O. quekettii mitochondrial genome is the largest green algal mitochondrial genome sequenced (241,739 bp), considerably larger than its chloroplast genome. As with the mtDNA of C. lentillifera, most of this excess size is from the expansion of intergenic DNA and proliferation of introns. Inflated mitochondrial genomes in the Bryopsidales suggest effective population size, recombination and/or mutation rate, influenced by nuclear-encoded proteins, differ between the genomes of mitochondria and chloroplasts, reducing the strength of selection to influence evolution of their mitochondrial genomes

    Tracking key virulence loci encoding aerobactin and salmochelin siderophore synthesis in Klebsiella pneumoniae.

    BACKGROUND: Klebsiella pneumoniae is a recognised agent of multidrug-resistant (MDR) healthcare-associated infections; however, individual strains vary in their virulence potential due to the presence of mobile accessory genes. In particular, gene clusters encoding the biosynthesis of siderophores aerobactin (iuc) and salmochelin (iro) are associated with invasive disease and are common amongst hypervirulent K. pneumoniae clones that cause severe community-associated infections such as liver abscess and pneumonia. Concerningly, iuc has also been reported in MDR strains in the hospital setting, where it was associated with increased mortality, highlighting the need to understand, detect and track the mobility of these virulence loci in the K. pneumoniae population. METHODS: Here, we examined the genetic diversity, distribution and mobilisation of iuc and iro loci amongst 2503 K. pneumoniae genomes using comparative genomics approaches and developed tools for tracking them via genomic surveillance. RESULTS: Iro and iuc were detected at low prevalence (<10%). Considerable genetic diversity was observed, resolving into five iro and six iuc lineages that show distinct patterns of mobilisation and dissemination in the K. pneumoniae population. The major burden of iuc and iro amongst the genomes analysed was due to two linked lineages (iuc1/iro1 74% and iuc2/iro2 14%), each carried by a distinct non-self-transmissible IncFIBK virulence plasmid type that we designate KpVP-1 and KpVP-2. These dominant types also carry hypermucoidy (rmpA) determinants and include all previously described virulence plasmids of K. pneumoniae. The other iuc and iro lineages were associated with diverse plasmids, including some carrying IncFII conjugative transfer regions and some imported from Escherichia coli; the exceptions were iro3 (mobilised by ICEKp1) and iuc4 (fixed in the chromosome of K. pneumoniae subspecies rhinoscleromatis). Iro/iuc mobile genetic elements (MGEs) appear to be stably maintained at high frequency within known hypervirulent strains (ST23, ST86, etc.) but were also detected at low prevalence in others such as MDR strain ST258. CONCLUSIONS: Iuc and iro are mobilised in K. pneumoniae via a limited number of MGEs. This study provides a framework for identifying and tracking these important virulence loci, which will be important for genomic surveillance efforts including monitoring for the emergence of hypervirulent MDR K. pneumoniae strains

    Kaptive Web: User-Friendly Capsule and Lipopolysaccharide Serotype Prediction for Klebsiella Genomes.

    As whole-genome sequencing becomes an established component of the microbiologist's toolbox, it is imperative that researchers, clinical microbiologists, and public health professionals have access to genomic analysis tools for the rapid extraction of epidemiologically and clinically relevant information. For the Gram-negative hospital pathogens such as Klebsiella pneumoniae, initial efforts have focused on the detection and surveillance of antimicrobial resistance genes and clones. However, with the resurgence of interest in alternative infection control strategies targeting Klebsiella surface polysaccharides, the ability to extract information about these antigens is increasingly important. Here we present Kaptive Web, an online tool for the rapid typing of Klebsiella K and O loci, which encode the polysaccharide capsule and lipopolysaccharide O antigen, respectively. Kaptive Web enables users to upload and analyze genome assemblies in a web browser. The results can be downloaded in tabular format or explored in detail via the graphical interface, making it accessible for users at all levels of computational expertise. We demonstrate Kaptive Web's utility by analyzing >500 K. pneumoniae genomes. We identify extensive K and O locus diversity among 201 genomes belonging to the carbapenemase-associated clonal group 258 (25 K and 6 O loci). The characterization of a further 309 genomes indicated that such diversity is common among the multidrug-resistant clones and that these loci represent useful epidemiological markers for strain subtyping. These findings reinforce the need for rapid, reliable, and accessible typing methods such as Kaptive Web. Kaptive Web is available for use at http://kaptive.holtlab.net/, and the source code is available at https://github.com/kelwyres/Kaptive-Web

    Bandage: interactive visualization of de novo genome assemblies.

    Although de novo assembly graphs contain assembled contigs (nodes), the connections between those contigs (edges) are difficult for users to access. Bandage (a Bioinformatics Application for Navigating De novo Assembly Graphs Easily) is a tool for visualizing assembly graphs with connections. Users can zoom in to specific areas of the graph and interact with it by moving nodes, adding labels, changing colors and extracting sequences. BLAST searches can be performed within the Bandage graphical user interface and the hits are displayed as highlights in the graph. By displaying connections between contigs, Bandage presents new possibilities for analyzing de novo assemblies that are not possible through investigation of contigs alone. AVAILABILITY AND IMPLEMENTATION: Source code and binaries are freely available at https://github.com/rrwick/Bandage. Bandage is implemented in C++ and supported on Linux, OS X and Windows. A full feature list and screenshots are available at http://rrwick.github.io/Bandage. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online
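The node and edge structure described in this abstract can be sketched in a few lines of Python. The sketch assumes the assembly graph is stored in GFA 1 format, where `S` lines hold contig sequences and `L` lines hold links between contig ends; the abstract itself does not name a file format. The function name `parse_gfa` is hypothetical and this is not Bandage's own parser.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str, str, str]  # (from_name, from_orient, to_name, to_orient)


def parse_gfa(text: str) -> Tuple[Dict[str, str], List[Link]]:
    """Extract contigs (nodes) and links (edges) from GFA 1 text.

    Only 'S' (segment) and 'L' (link) records are read; other record
    types, optional tags and overlap fields are ignored.
    """
    segments: Dict[str, str] = {}
    links: List[Link] = []
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            # S <name> <sequence>
            segments[fields[1]] = fields[2]
        elif fields[0] == "L" and len(fields) >= 5:
            # L <from> <from_orient> <to> <to_orient> [<overlap>]
            links.append((fields[1], fields[2], fields[3], fields[4]))
    return segments, links


if __name__ == "__main__":
    example = "H\tVN:Z:1.0\nS\t1\tACGT\nS\t2\tGGA\nL\t1\t+\t2\t-\t0M\n"
    contigs, edges = parse_gfa(example)
    print(contigs)
    print(edges)
```

A plain FASTA file of contigs would carry only the `S`-line information; the `L` lines are the connections that the abstract says are otherwise hard for users to access.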