DUDE-Seq: Fast, Flexible, and Robust Denoising for Targeted Amplicon Sequencing

Author: Lee Byunghan
Moon Taesup
Weissman Tsachy
Yoon Sungroh
Publication date: 01/01/2017

We consider the correction of errors from nucleotide sequences produced by next-generation targeted amplicon sequencing. Next-generation sequencing (NGS) platforms can provide a great deal of sequencing data thanks to their high throughput, but the associated error rates often tend to be high. Denoising in high-throughput sequencing has thus become a crucial process for boosting the reliability of downstream analyses. Our methodology, named DUDE-Seq, is derived from a general setting of reconstructing finite-valued source data corrupted by a discrete memoryless channel and effectively corrects substitution and homopolymer indel errors, the two major types of sequencing errors in most high-throughput targeted amplicon sequencing platforms. Our experimental studies with real and simulated datasets suggest that the proposed DUDE-Seq not only outperforms existing alternatives in terms of error-correction capability and time efficiency, but also boosts the reliability of downstream analyses. Further, the flexibility of DUDE-Seq enables its robust application to different sequencing platforms and analysis pipelines by simple updates of the noise model. DUDE-Seq is available at http://data.snu.ac.kr/pub/dude-seq
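
As a rough illustration of the "finite-valued source corrupted by a discrete memoryless channel" setting mentioned in the abstract, the sketch below applies the generic two-pass discrete universal denoiser (DUDE) rule to substitution errors only. It is not the DUDE-Seq implementation: the function name `dude_denoise`, the context width `k`, the substitution rate `delta`, the symmetric channel, and the Hamming loss are all assumptions made for this example, and homopolymer indels are not handled.

```python
from collections import Counter, defaultdict

ALPHABET = "ACGT"  # assumption: reads contain only these four symbols


def dude_denoise(noisy, k=2, delta=0.05):
    """Two-pass DUDE sketch for a symmetric substitution channel, Hamming loss.

    Assumed channel: a base is read correctly with probability 1 - delta and
    replaced by each of the other three bases with probability delta / 3.
    """
    n = len(noisy)
    on = 1.0 - delta     # P(observe x | true x)
    off = delta / 3.0    # P(observe z | true x), z != x

    # Pass 1: count centre symbols for every two-sided context of width k.
    counts = defaultdict(Counter)
    for i in range(k, n - k):
        ctx = (noisy[i - k:i], noisy[i + 1:i + k + 1])
        counts[ctx][noisy[i]] += 1

    # Pass 2: at each position choose the symbol with the lowest estimated loss.
    out = list(noisy)
    for i in range(k, n - k):
        z = noisy[i]
        m = counts[(noisy[i - k:i], noisy[i + 1:i + k + 1])]
        total = sum(m.values())
        # For this symmetric channel (delta < 0.75), m^T Pi^{-1} is
        # proportional to m[x] - (delta / 3) * total, so no explicit matrix
        # inversion is needed.
        q = {x: m[x] - off * total for x in ALPHABET}
        # Under Hamming loss, minimising the estimated loss is equivalent to
        # maximising q[x] * P(observe z | true x).
        out[i] = max(ALPHABET, key=lambda x: q[x] * (on if x == z else off))
    return "".join(out)


# Usage: a repetitive sequence with a single substitution at position 99.
clean = "ACGTTGCA" * 50
noisy = clean[:99] + "A" + clean[100:]
restored = dude_denoise(noisy, k=2, delta=0.05)
```

Swapping in a different noise model would, in this sketch, mean replacing the `on`/`off` probabilities with a full channel matrix and inverting it explicitly; the first and last `k` positions are left unchanged because they lack a full context.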

arXiv.org e-Print Archive

Directory of Open Access Journals

FigShare

NanoAmpli-Seq: a workflow for amplicon sequencing for mixed microbial communities on the nanopore sequencing platform

Author: Calus Szymon T.
Ijaz Umer Zeeshan
Pinto Ameet
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/12/2018

Background: Amplicon sequencing on Illumina sequencing platforms leverages their deep sequencing and multiplexing capacity but is limited in genetic resolution due to short read lengths. While Oxford Nanopore or Pacific Biosciences sequencing platforms overcome this limitation, their application has been limited due to higher error rates or lower data output. Results: In this study, we introduce an amplicon sequencing workflow, i.e., NanoAmpli-Seq, that builds on the intramolecular-ligated nanopore consensus sequencing (INC-Seq) approach and demonstrate its application for full-length 16S rRNA gene sequencing. NanoAmpli-Seq includes vital improvements to the INC-Seq protocol that reduce sample processing time while significantly improving sequence accuracy. The developed protocol includes the chopSeq software for fragmentation and read orientation correction of INC-Seq consensus reads, while the nanoClust algorithm was designed for read partitioning-based de novo clustering and within-cluster consensus calling to obtain accurate full-length 16S rRNA gene sequences. Conclusions: NanoAmpli-Seq accurately estimates the diversity of tested mock communities with average consensus sequence accuracy of 99.5% for 2D and 1D2 sequencing on the nanopore sequencing platform. Nearly all residual errors in NanoAmpli-Seq sequences originate from deletions in homopolymer regions, indicating that homopolymer-aware base calling or error correction may allow for sequencing accuracy comparable to short-read sequencing platforms.
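
The observation that residual errors sit in homopolymer regions can be checked with a simple diagnostic: collapse every run of identical bases to a single base and see whether two sequences then agree. The sketch below is only an illustration of that idea and is not part of NanoAmpli-Seq, chopSeq, or nanoClust; the function names are hypothetical.

```python
from itertools import groupby


def compress_homopolymers(seq):
    """Collapse each run of identical bases to one base, e.g. AAACGGGT -> ACGT."""
    return "".join(base for base, _run in groupby(seq))


def differ_only_in_homopolymer_length(a, b):
    """True when a and b are different but identical after run collapsing."""
    return a != b and compress_homopolymers(a) == compress_homopolymers(b)


# Usage: a consensus read with one G missing from a GGG run versus a
# reference, and a read with a true substitution.
reference = "ACGGGTTA"
homopolymer_deletion = "ACGGTTA"
substitution = "ACGCGTTA"
is_homopolymer_error = differ_only_in_homopolymer_length(reference, homopolymer_deletion)
is_other_error = not differ_only_in_homopolymer_length(reference, substitution)
```

Run collapsing discards real length information, so it is suited to classifying errors rather than to correcting them.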

Enlighten

Orthology guided transcriptome assembly of Italian ryegrass and meadow fescue for single-nucleotide polymorphism discovery

Author: Bruno Studer
David Kopecký
Elodie Rey
Ghesquiere M.
Humphreys M.
Isabel Roldán‐Ruiz
Jan Bartoš
Jaroslav Doležel
Michael Abrouk
Rognli O.A.
Steven Yates
Tom Ruttink
Tomasz Książczyk
Zbigniew Zwierzykowski
Zerbino D.R.
Štěpán Stočes
Publication venue: 'Crop Science Society of America'
Publication date: 01/01/2016

Single-nucleotide polymorphisms (SNPs) represent natural DNA sequence variation. They can be used for various applications including the construction of high-density genetic maps, analysis of genetic variability, genome-wide association studies, and map-based cloning. Here we report on transcriptome sequencing in the two forage grasses, meadow fescue (Festuca pratensis Huds.) and Italian ryegrass (Lolium multiflorum Lam.), and identification of various classes of SNPs. Using the Orthology Guided Assembly (OGA) strategy, we assembled and annotated a total of 18,952 and 19,036 transcripts for Italian ryegrass and meadow fescue, respectively. In addition, we used transcriptome sequence data of perennial ryegrass (L. perenne L.) from a previous study to identify 16,613 transcripts shared across all three species. Large numbers of intraspecific SNPs were identified in all three species: 248,000 in meadow fescue, 715,000 in Italian ryegrass, and 529,000 in perennial ryegrass. Moreover, we identified almost 25,000 interspecific SNPs located in 5343 genes that can distinguish meadow fescue from Italian ryegrass and 15,000 SNPs located in 3976 genes that discriminate meadow fescue from both Lolium species. All identified SNPs were positioned in silico on the seven linkage groups (LGs) of L. perenne using the GenomeZipper approach. With the identification and positioning of interspecific SNPs, our study provides a valuable resource for the grass research and breeding community and will enable detailed characterization of genomic composition and gene expression analysis in prospective Festuca × Lolium hybrids.

Repository for Publications and Research Data

Crossref

Ghent University Academic Bibliography

Directory of Open Access Journals

Species Identification and Profiling of Complex Microbial Communities Using Shotgun Illumina Sequencing of 16S rRNA Amplicon Sequences

Author: Hibberd Martin Lloyd
Ho Eliza Xin Pei
Kukkillaya Vinutha Uppoor
Lay Christophe
Low Louie
Nagarajan Niranjan
Ong Swee Hoe
Wilm Andreas
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

The high throughput and cost-effectiveness afforded by short-read sequencing technologies, in principle, enable researchers to perform 16S rRNA profiling of complex microbial communities at unprecedented depth and resolution. Existing Illumina sequencing protocols are, however, limited by the fraction of the 16S rRNA gene that is interrogated and therefore limit the resolution and quality of the profiling. To address this, we present the design of a novel protocol for shotgun Illumina sequencing of the bacterial 16S rRNA gene, optimized to capture more than 90% of sequences in the Greengenes database and with nearly twice the resolution of existing protocols. Using several in silico and experimental datasets, we demonstrate that despite the presence of multiple variable and conserved regions, the resulting shotgun sequences can be used to accurately quantify the diversity of complex microbial communities. The reconstruction of a significant fraction of the 16S rRNA gene also enabled high precision (>90%) in species-level identification, thereby opening up potential application of this approach for clinical microbial characterization.

arXiv.org e-Print Archive

Crossref

LSHTM Research Online

Directory of Open Access Journals

PubMed Central

FigShare

Swarm: robust and fast clustering method for amplicon-based studies

Publication venue: 'PeerJ'

Crossref

Séance: reference-based phylogenetic analysis for 18S rRNA studies

Publication venue: BioMed Central
Publication date: 30/11/2014

Springer - Publisher Connector

SeekDeep: single-base resolution de novo clustering for amplicon deep sequencing

Author: Bailey Jeffrey A.
Hathaway Nicholas J.
Juliano Jonathan J.
Parobek Christian M.
Publication venue: eScholarship@UMassChan
Publication date: 28/02/2018

PCR amplicon deep sequencing continues to transform the investigation of genetic diversity in viral, bacterial, and eukaryotic populations. In eukaryotic populations such as Plasmodium falciparum infections, it is important to discriminate sequences differing by a single nucleotide polymorphism. In bacterial populations, single-base resolution can provide improved resolution towards species and strains. Here, we introduce the SeekDeep suite built around the qluster algorithm, which is capable of accurately building de novo clusters representing true, biological local haplotypes differing by just a single base. It outperforms current software, particularly at low frequencies and at low input read depths, whether resolving single-base differences or traditional OTUs. SeekDeep is open source and works with all major sequencing technologies, making it broadly useful in a wide variety of applications of amplicon deep sequencing to extract accurate and maximal biologic information.
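
To make the single-base resolution problem concrete, the sketch below shows one naive way to separate a likely sequencing error from a real haplotype that differs by one base, using relative abundance alone. This is not the qluster algorithm, whose details are not given in the abstract: the function `cluster_reads`, the `min_fraction` threshold, and the abundance-ratio heuristic are hypothetical choices for illustration.

```python
from collections import Counter


def hamming_distance(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def cluster_reads(reads, min_fraction=0.1):
    """Greedy abundance-ordered clustering (illustrative heuristic only).

    A unique sequence is merged into an existing cluster when it differs from
    the cluster representative by exactly one base AND its read count is below
    min_fraction of the representative's count; otherwise it starts its own
    cluster and is treated as a distinct haplotype.
    """
    counts = Counter(reads)
    clusters = {}  # representative sequence -> total reads assigned
    for seq, n in counts.most_common():
        parent = None
        for rep in clusters:
            if (len(rep) == len(seq)
                    and hamming_distance(rep, seq) == 1
                    and n < min_fraction * counts[rep]):
                parent = rep
                break
        if parent is None:
            clusters[seq] = n
        else:
            clusters[parent] += n
    return clusters


# Usage: a major haplotype, a minor haplotype one base away, and a rare
# one-base variant that the heuristic treats as an error.
reads = ["ACGT"] * 100 + ["ACGA"] * 40 + ["ACTT"] * 2
clusters = cluster_reads(reads, min_fraction=0.1)
```

A fixed abundance ratio like this will discard true low-frequency haplotypes, which is the regime the abstract highlights as difficult, so it should be read as a baseline for comparison rather than a recommended method.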

eScholarship@UMMS

mPUMA: a computational approach to microbiota analysis by de novo assembly of operational taxonomic units based on protein-coding barcode sequences

Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Crossref