3,005 research outputs found

    MinION Analysis and Reference Consortium: Phase 1 data release and analysis

    The advent of a miniaturized DNA sequencing device with a high-throughput contextual sequencing capability embodies the next generation of large scale sequencing tools. The MinION™ Access Programme (MAP) was initiated by Oxford Nanopore Technologies™ in April 2014, giving public access to their USB-attached miniature sequencing device. The MinION Analysis and Reference Consortium (MARC) was formed by a subset of MAP participants, with the aim of evaluating and providing standard protocols and reference data to the community. Envisaged as a multi-phased project, this study provides the global community with the Phase 1 data from MARC, where the reproducibility of the performance of the MinION was evaluated at multiple sites. Five laboratories on two continents generated data using a control strain of Escherichia coli K-12, preparing and sequencing samples according to a revised ONT protocol. Here, we provide the details of the protocol used, along with a preliminary analysis of the characteristics of typical runs including the consistency, rate, volume and quality of data produced. Further analysis of the Phase 1 data presented here, and additional experiments in Phase 2 of E. coli from MARC, are already underway to identify ways to improve and enhance MinION performance.

    Detailed evaluation of data analysis tools for subtyping of bacterial isolates based on whole genome sequencing : Neisseria meningitidis as a proof of concept

    Whole genome sequencing is increasingly recognized as the most informative approach for characterization of bacterial isolates. Success of the routine use of this technology in public health laboratories depends on the availability of well-characterized and verified data analysis methods. However, multiple subtyping workflows are now often being used for a single organism, and differences between them are not always well described. Moreover, methodologies for comparison of subtyping workflows, and assessment of their performance, are only beginning to emerge. Current work focuses on the detailed comparison of WGS-based subtyping workflows and evaluation of their suitability for the organism and the research context in question. We evaluated the performance of pipelines used for subtyping of Neisseria meningitidis, including the currently widely applied cgMLST approach and different SNP-based methods. In addition, the impact of using different tools for detection and filtering of recombinant regions, and of using different reference genomes, was tested. Our benchmarking analysis included both assessment of technical performance of the pipelines and functional comparison of the generated genetic distance matrices and phylogenetic trees. It was carried out using replicate sequencing datasets of high and low coverage, consisting mainly of isolates belonging to the clonal complex 269. We demonstrated that cgMLST and some of the SNP-based subtyping workflows showed very good performance characteristics and highly similar genetic distance matrices and phylogenetic trees with isolates belonging to the same clonal complex. However, only two of the tested workflows demonstrated reproducible results for a group of more closely related isolates. Additionally, results of the SNP-based subtyping workflows were to some extent dependent on the reference genome used.
    Interestingly, the use of recombination-filtering software generally reduced the similarity between the gene-by-gene and SNP-based methodologies for subtyping of N. meningitidis. Our study, where N. meningitidis was taken as an example, clearly highlights the need for more benchmarking comparative studies to eventually contribute to a justified use of a specific WGS data analysis workflow within an international public health laboratory context.

    Nanopore direct RNA sequencing maps the complexity of Arabidopsis mRNA processing and m6A modification

    Understanding genome organization and gene regulation requires insight into RNA transcription, processing and modification. We adapted nanopore direct RNA sequencing to examine RNA from a wild-type accession of the model plant Arabidopsis thaliana and a mutant defective in mRNA methylation (m6A). Here we show that m6A can be mapped in full-length mRNAs transcriptome-wide and reveal the combinatorial diversity of cap-associated transcription start sites, splicing events, poly(A) site choice and poly(A) tail length. Loss of m6A from 3′ untranslated regions is associated with decreased relative transcript abundance and defective RNA 3′ end formation. A functional consequence of disrupted m6A is a lengthening of the circadian period. We conclude that nanopore direct RNA sequencing can reveal the complexity of mRNA processing and modification in full-length single molecule reads. These findings can refine Arabidopsis genome annotation. Further, applying this approach to less well-studied species could transform our understanding of what their genomes encode.

    Circlator: automated circularization of genome assemblies using long sequencing reads

    The assembly of DNA sequence data is undergoing a renaissance thanks to emerging technologies capable of producing reads tens of kilobases long. Assembling complete bacterial and small eukaryotic genomes is now possible, but the final step of circularizing sequences remains unsolved. Here we present Circlator, the first tool to automate assembly circularization and produce accurate linear representations of circular sequences. Using Pacific Biosciences and Oxford Nanopore data, Circlator correctly circularized 26 of 27 circularizable sequences, comprising 11 chromosomes and 12 plasmids from bacteria, the apicoplast and mitochondrion of Plasmodium falciparum and a human mitochondrion. Circlator is available at http://sanger-pathogens.github.io/circlator/

    Can we use it? On the utility of de novo and reference-based assembly of Nanopore data for plant plastome sequencing

    The chloroplast genome harbors plenty of valuable information for phylogenetic research. Illumina short-read data is generally used for de novo assembly of whole plastomes. PacBio or Oxford Nanopore long reads are additionally employed in hybrid approaches to enable assembly across the highly similar inverted repeats of a chloroplast genome. Unlike for PacBio, plastome assemblies based solely on Nanopore reads are rarely found, due to their high error rate and non-random error profile. However, the actual quality decline connected to their use has rarely been quantified. Furthermore, no study has employed reference-based assembly using Nanopore reads, which is common with Illumina data. Using Leucanthemum Mill. as an example, we compared the sequence quality of seven chloroplast genome assemblies of the same species, using combinations of two sequencing platforms and three analysis pipelines. In addition, we assessed the factors which might influence Nanopore assembly quality during sequence generation and bioinformatic processing. The consensus sequence derived from de novo assembly of Nanopore data had a sequence identity of 99.59% compared to Illumina short-read de novo assembly. Most of the errors detected were indels (81.5%), and a large majority of them lie within homopolymer regions. The quality of reference-based assembly is heavily dependent upon the choice of a close-enough reference. When using a reference with 0.83% sequence divergence from the studied species, mapping of Nanopore reads results in a consensus comparable to that from Nanopore de novo assembly, and of only slightly inferior quality compared to a reference-based assembly with Illumina data. For optimal de novo assembly of Nanopore data, appropriate filtering of contaminants and chimeric sequences, as well as employing moderate read coverage, is essential.
    Based on these results, we conclude that Nanopore long reads are a suitable alternative to Illumina short reads in plastome phylogenomics. Few errors remain in the finalized assembly, and these can be easily masked in phylogenetic analyses without loss in analytical accuracy. The easily applicable and cost-effective technology might warrant more attention from researchers dealing with plant chloroplast genomes.

    Review of state-of-the-art algorithms for genomics data analysis pipelines

    The advent of big data and advanced genomic sequencing technologies has presented challenges in terms of data processing for clinical use. The complexity of detecting and interpreting genetic variants, coupled with the vast array of tools and algorithms and the heavy computational workload, has made the development of comprehensive genomic analysis platforms crucial to enabling clinicians to quickly provide patients with genetic results. This chapter reviews and describes the pipeline for analyzing massive genomic data using both short-read and long-read technologies, discussing the current state of the main tools used at each stage and the role of artificial intelligence in their development. It also introduces DeepNGS (deepngs.eu), an end-to-end genomic analysis web platform, including its key features and applications.

    Development of novel bioinformatic pipelines for MinION-based DNA barcoding

    DNA barcoding is the process of taxonomic identification based on the sequence of a marker gene. When complex samples are analysed, we refer in particular to meta-barcoding. Barcoding has traditionally been performed with the Sanger sequencing platform. The emergence of second-generation sequencing platforms, mainly represented by Illumina, enabled the high-throughput sequencing of hundreds of samples, and allowed the characterization of complex samples through meta-barcoding experiments. However, fragments sequenced with the Illumina platform are shorter than 600 bp, and this greatly limits taxonomic resolution of closely related species. Moreover, both these platforms suffer from long turnaround times, since they require shipping the samples to a sequencing facility, and complex regulations may hamper the export of material out of the country of origin. More recently, Oxford Nanopore Technologies provided the MinION, a portable and cheap third-generation sequencer, which has the potential to overcome the issues of currently available platforms, thanks to the production of long sequencing reads. However, MinION reads suffer from a high error rate, so suitable analysis pipelines are needed to overcome this issue. In this thesis I describe the development of bioinformatic pipelines for MinION-based DNA barcoding. Starting from the analysis of single samples, I show how improvements both in sequencing chemistry and in software now allow consensus sequences to be obtained directly in the field, with accuracy comparable to Sanger. Conversely, when analysing complex samples, sequencing reads cannot be collapsed to reduce the error rate. However, bioinformatic approaches exploiting increased read length largely compensate for the higher error rate, resulting in high correlation between MinION and Illumina up to genus level, and a more marked sensitivity of the MinION platform in detecting spiked-in indicator species.
    In conclusion, the results presented in this thesis show that bioinformatic pipelines for the analysis of MinION reads can largely mitigate platform issues, paving the way for this platform to become the gold standard for barcoding in the near future.