2,011 research outputs found

Benchmarking of de novo assembly algorithms for Nanopore data reveals optimal performance of OLC approaches

Authors: Cherukuri Yesesri; Janga Sarath Chandra
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Improved DNA sequencing methods have transformed the field of genomics over the last decade. This has become possible due to the development of inexpensive short-read sequencing technologies, which have now resulted in three generations of sequencing platforms. More recently, a fourth generation of nanopore-based single-molecule sequencing technology was developed, based on the MinION® sequencer, which is portable, inexpensive and fast. It is capable of generating reads of length greater than 100 kb. Though it has many specific advantages, the two major limitations of MinION reads are high error rates and the need for the development of downstream pipelines. Algorithms for error correction have already emerged, while the development of pipelines is still at a nascent stage.

IUPUIScholarWorks

Springer - Publisher Connector

PubMed Central

Nanopore Sequencing Technology and Tools for Genome Assembly: Computational Analysis of the Current State, Bottlenecks and Future Directions

Authors: Alkan Can; Cali Damla Senol; Ghose Saugata; Kim Jeremie S.; Mutlu Onur
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2018

Nanopore sequencing technology has the potential to render other sequencing technologies obsolete with its ability to generate long reads and provide portability. However, high error rates of the technology pose a challenge while generating accurate genome assemblies. The tools used for nanopore sequence analysis are of critical importance as they should overcome the high error rates of the technology. Our goal in this work is to comprehensively analyze current publicly available tools for nanopore sequence analysis to understand their advantages, disadvantages, and performance bottlenecks. It is important to understand where the current tools do not perform well to develop better tools. To this end, we 1) analyze the multiple steps and the associated tools in the genome assembly pipeline using nanopore sequence data, and 2) provide guidelines for determining the appropriate tools for each step. We analyze various combinations of different tools and expose the tradeoffs between accuracy, performance, memory usage and scalability. We conclude that our observations can guide researchers and practitioners in making conscious and effective choices for each step of the genome assembly pipeline using nanopore sequence data. Also, with the help of bottlenecks we have found, developers can improve the current tools or build new ones that are both accurate and fast, in order to overcome the high error rates of the nanopore sequencing technology. Comment: To appear in Briefings in Bioinformatics (BIB), 201

arXiv.org e-Print Archive

Crossref

Bilkent University Institutional Repository

NAD tagSeq reveals that NAD+-capped RNAs are mostly produced from a large number of protein-coding genes in Arabidopsis.

Authors: Cai Zongwei; Chen Xuemei; Ni Min; Shao Xiaojian; Xia Yiji; Zhang Hailei; Zhang Shoudong; Zhong Huan
Publication venue: eScholarship, University of California
Publication date: 01/06/2019

The 5' end of a eukaryotic mRNA transcript generally has a 7-methylguanosine (m7G) cap that protects mRNA from degradation and mediates almost all other aspects of gene expression. Some RNAs in Escherichia coli, yeast, and mammals were recently found to contain an NAD+ cap. Here, we report the development of the method NAD tagSeq for transcriptome-wide identification and quantification of NAD+-capped RNAs (NAD-RNAs). The method uses an enzymatic reaction and then a click chemistry reaction to label NAD-RNAs with a synthetic RNA tag. The tagged RNA molecules can be enriched and directly sequenced using the Oxford Nanopore sequencing technology. NAD tagSeq can allow more accurate identification and quantification of NAD-RNAs, as well as reveal the sequences of whole NAD-RNA transcripts using single-molecule RNA sequencing. Using NAD tagSeq, we found that NAD-RNAs in Arabidopsis were produced by at least several thousand genes, most of which are protein-coding genes, with the majority of these transcripts coming from <200 genes. For some Arabidopsis genes, over 5% of their transcripts were NAD capped. Gene ontology terms overrepresented in the 2,000 genes that produced the highest numbers of NAD-RNAs are related to photosynthesis, protein synthesis, and responses to cytokinin and stresses. The NAD-RNAs in Arabidopsis generally have the same overall sequence structures as the canonical m7G-capped mRNAs, although most of them appear to have a shorter 5' untranslated region (5' UTR). The identification and quantification of NAD-RNAs and revelation of their sequence features can provide essential steps toward understanding the functions of NAD-RNAs.

eScholarship - University of California

Forensic tri-allelic SNP genotyping using nanopore sequencing

Authors: Cornelis Senne; Deforce Dieter; Gansemans Yannick; Van Nieuwerburgh Filip; Vander Plaetsen Ann-Sophie; Weymaere Jana; Willems Sander
Publication venue: Elsevier BV
Publication date: 01/01/2019

The potential and current state of the art of forensic SNP genotyping using nanopore sequencing was investigated with a panel of 16 tri-allelic single nucleotide polymorphisms (SNPs), multiplexing five samples per sequencing run. The sample set consisted of three single-source human genomic reference control DNA samples and two GEDNAP samples, simulating casework samples. The primers for the multiplex SNP-loci PCR were taken from a study which researched their value in a forensic setting using conventional single-base extension technology. Workflows for multiplexed Oxford Nanopore Technologies 1D and 1D² sequencing were developed that provide correct genotyping of most SNP loci. Loci that are problematic for nanopore sequencing were characterized. When such loci are avoided, nanopore sequencing of forensic tri-allelic SNPs is technically feasible.

Ghent University Academic Bibliography

Models and information-theoretic bounds for nanopore sequencing

Authors: Diggavi Suhas; Kannan Sreeram; Mao Wei
Publication date: 17/02/2018

Nanopore sequencing is an emerging technology for sequencing DNA, which can read long fragments of DNA (~50,000 bases) in contrast to most current short-read sequencing technologies, which can only read hundreds of bases. While nanopore sequencers can acquire long reads, the high error rates (20%-30%) pose a technical challenge. In a nanopore sequencer, a DNA molecule is migrated through a nanopore and current variations are measured. The DNA sequence is inferred from this observed current pattern using an algorithm called a base-caller. In this paper, we propose a mathematical model for the "channel" from the input DNA sequence to the observed current, and calculate bounds on the information extraction capacity of the nanopore sequencer. This model incorporates impairments like (non-linear) inter-symbol interference, deletions, as well as random response. These information bounds have a two-fold application: (1) The decoding rate with a uniform input distribution can be used to calculate the average size of the plausible list of DNA sequences given an observed current trace. This bound can be used to benchmark existing base-calling algorithms, as well as serving as a performance objective for designing better nanopores. (2) When the nanopore sequencer is used as a reader in a DNA storage system, the storage capacity is quantified by our bounds.

arXiv.org e-Print Archive

Crossref

Identification of Structural Variation in Chimpanzees Using Optical Mapping and Nanopore Sequencing.

Authors: Andrés Aida M; Dennis Megan Y; Kaya Gulhan; Mastoras Mira; Sahasrabudhe Ruta; Schmidt Joshua M; Shew Colin; Soto Daniela C
Publication venue: eScholarship, University of California
Publication date: 01/03/2020

Recent efforts to comprehensively characterize great ape genetic diversity using short-read sequencing and single-nucleotide variants have led to important discoveries related to selection within species, demographic history, and lineage-specific traits. Structural variants (SVs), including deletions and inversions, comprise a larger proportion of genetic differences between and within species, making them an important yet understudied source of trait divergence. Here, we used a combination of long-read and long-range sequencing approaches to characterize the structural variant landscape of two additional Pan troglodytes verus individuals, one of whom carries 13% admixture from Pan troglodytes troglodytes. We performed optical mapping of both individuals followed by nanopore sequencing of one individual. After filtering for larger variants (>10 kbp) and genotyping SVs using short-read data from the Great Ape Genome Project, we identified 425 deletions and 59 inversions, of which 88 and 36, respectively, were novel. Compared with gene expression in humans, we found a significant enrichment of chimpanzee genes with differential expression in lymphoblastoid cell lines and induced pluripotent stem cells, both within deletions and near inversion breakpoints. We examined chromatin-conformation maps from human and chimpanzee using these same cell types and observed alterations in genomic interactions at SV breakpoints. Finally, we focused on 56 genes impacted by SVs in >90% of chimpanzees and absent in humans and gorillas, which may contribute to chimpanzee-specific features. Sequencing a greater set of individuals from diverse subspecies will be critical to establish the complete landscape of genetic variation in chimpanzees.

UCL Discovery

eScholarship - University of California

Single-molecule DNA sequencing technologies for future genomics research

Author: Gupta Pushpendra K.
Publication venue: Elsevier BV
Publication date: 01/11/2008

During the current genomics revolution, the genomes of a large number of living organisms have been fully sequenced. However, with the advent of new sequencing technologies, genomics research is now at the threshold of a second revolution. Several second-generation sequencing platforms became available in 2007, but a further revolution in DNA resequencing technologies is being witnessed in 2008, with the launch of the first single-molecule DNA sequencer (Helicos Biosciences), which has already been used to resequence the genome of the M13 virus. This review discusses several single-molecule sequencing technologies that are expected to become available during the next few years and explains how they might affect genomics research.