Optimal Assembly for High Throughput Shotgun Sequencing

Author: Bresler Guy
Bresler Ma'ayan
Tse David
Publication date: 18/02/2013

We present a framework for the design of optimal assembly algorithms for shotgun sequencing under the criterion of complete reconstruction. We derive a lower bound on the read length and the coverage depth required for reconstruction in terms of the repeat statistics of the genome. Building on earlier works, we design a de Bruijn graph based assembly algorithm which can achieve very close to the lower bound for repeat statistics of a wide range of sequenced genomes, including the GAGE datasets. The results are based on a set of necessary and sufficient conditions on the DNA sequence and the reads for reconstruction. The conditions can be viewed as the shotgun sequencing analogue of Ukkonen-Pevzner's necessary and sufficient conditions for Sequencing by Hybridization. Comment: 26 pages, 18 figures

arXiv.org e-Print Archive

PubMed Central

eScholarship - University of California
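
The de Bruijn graph named in the abstract above can be illustrated with a toy construction. This is a minimal sketch and not the authors' assembly algorithm: the function name `de_bruijn_graph`, the example reads, and the choice of k are all assumptions made for illustration, and the reads are assumed to be error-free.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a toy de Bruijn graph from reads (hypothetical helper).

    Nodes are (k-1)-mers; each distinct k-mer seen in a read adds an edge
    from its (k-1)-length prefix to its (k-1)-length suffix.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Two overlapping, error-free example reads (assumed data).
reads = ["ACGT", "CGTA"]
graph = de_bruijn_graph(reads, k=3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

In this toy graph, overlapping reads share nodes, so a walk through the edges spells out a candidate sequence. Handling repeats, read errors, and coverage gaps is where real assemblers differ from this sketch.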

DUDE-Seq: Fast, Flexible, and Robust Denoising for Targeted Amplicon Sequencing

Author: Lee Byunghan
Moon Taesup
Weissman Tsachy
Yoon Sungroh
Publication date: 01/01/2017

We consider the correction of errors from nucleotide sequences produced by next-generation targeted amplicon sequencing. The next-generation sequencing (NGS) platforms can provide a great deal of sequencing data thanks to their high throughput, but the associated error rates often tend to be high. Denoising in high-throughput sequencing has thus become a crucial process for boosting the reliability of downstream analyses. Our methodology, named DUDE-Seq, is derived from a general setting of reconstructing finite-valued source data corrupted by a discrete memoryless channel and effectively corrects substitution and homopolymer indel errors, the two major types of sequencing errors in most high-throughput targeted amplicon sequencing platforms. Our experimental studies with real and simulated datasets suggest that the proposed DUDE-Seq not only outperforms existing alternatives in terms of error-correction capability and time efficiency, but also boosts the reliability of downstream analyses. Further, the flexibility of DUDE-Seq enables its robust application to different sequencing platforms and analysis pipelines by simple updates of the noise model. DUDE-Seq is available at http://data.snu.ac.kr/pub/dude-seq

arXiv.org e-Print Archive

Directory of Open Access Journals

FigShare

Optimizing sequencing protocols for leaderboard metagenomics by combining long and short reads.

Author: Arthur Timothy D
Bankevich Anton
Boland Brigid S
Brennan Caitriona
Chang John T
Chen Feng
Conrad Douglas J
Dang Jason W
Dorrestein Pieter C
Fedarko Marcus
Gaffney James
Green Cliff
Humphrey Greg C
Jepsen Kristen
Khosroheidari Mahdieh
Knight Rob
Liyanage Marlon
Martino Cameron
Minich Jeremiah
Nurk Sergey
Pevzner Pavel A
Phelan Vanessa V
Quinn Robert A
Rana Tariq M
Salido Rodolfo A
Sandborn William J
Sanders Jon G
Sanders Karenina
Smarr Larry
Xu Zhenjiang Z
Zhu Qiyun
Publication venue: eScholarship, University of California
Publication date: 01/10/2019

As metagenomic studies move to increasing numbers of samples, communities like the human gut may benefit more from the assembly of abundant microbes in many samples, rather than the exhaustive assembly of fewer samples. We term this approach leaderboard metagenome sequencing. To explore protocol optimization for leaderboard metagenomics in real samples, we introduce a benchmark of library prep and sequencing using internal references generated by synthetic long-read technology, allowing us to evaluate high-throughput library preparation methods against gold-standard reference genomes derived from the samples themselves. We introduce a low-cost protocol for high-throughput library preparation and sequencing

eScholarship - University of California

High-Throughput, Whole-Genome Sequencing

Author: Bittle Gregory J
Petkov Boris N
Rhee Yonghee Evan
Woods Elliot C
Publication venue: ScholarlyCommons
Publication date: 14/04/2009

Since the completion of the Human Genome Project, research focusing on the consequence of known human genetic code has advanced by leaps and bounds. The development of personalized medicine, a field focused on enumerating the effects of individual genetic variations, termed SNPs, has become a reality for those researching the molecular basis of disease. With clinical correlates between genotype and prognosis becoming ever more common, the utility of personal genetic screening has become impossible to ignore. In this report, we present PennBio: a whole-genome sequencing company utilizing a novel single-molecule, real time sequencing-by-synthesis technology. Using unique zero-mode waveguides, which have revolutionized single-molecule detection, individual enzymes polymerizing novel phospholinked fluorescence labeled nucleotides can be observed as they sequence genomic template DNA. Modern optical techniques record these fragmented sequences, which are then analyzed by highly efficient alignment algorithms. A personal genomic code will ultimately allow consumers to be aware of their genetic predispositions as the medical community continues to discover them

ScholarlyCommons@Penn

Throughput Rate Optimization in High Multiplicity Sequencing Problems

Author: Grigoriev Alexander
van de Klundert Joris

Mixed model assembly systems assemble products (parts) of different types in certain prespecified quantities. A minimal part set is a smallest possible set of product type quantities, to be called the multiplicities, in which the numbers of assembled products of the various types are in the desired ratios. It is common practice to repeatedly assemble minimal part sets, and in such a way that the products of each of the minimal part sets are assembled in the same sequence. Little is known, however, regarding the resulting throughput rate, in particular in comparison to the throughput rates attainable by other input strategies. This paper investigates throughput and balancing issues in repetitive manufacturing environments. It considers sequencing problems that occur in this setting and how the repetition strategy influences throughput. We model the problems as a generalization of the traveling salesman problem and derive our results in this general setting. Our analysis uses well known concepts from scheduling theory and combinatorial optimization.

Research Papers in Economics
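
The minimal part set defined in the abstract above can be computed by dividing the demanded quantities by their greatest common divisor. The sketch below is only an illustration of that definition, assuming positive integer demands; the function name `minimal_part_set` and the example quantities are hypothetical and do not come from the paper.

```python
from functools import reduce
from math import gcd


def minimal_part_set(quantities):
    """Return the smallest integer quantities in the same ratios (hypothetical helper)."""
    divisor = reduce(gcd, quantities)
    if divisor == 0:
        raise ValueError("at least one quantity must be positive")
    return [q // divisor for q in quantities]


# Assumed demand: 300, 200 and 100 units of three product types.
multiplicities = minimal_part_set([300, 200, 100])
print(multiplicities)  # [3, 2, 1]
```

Under these assumptions, repeating the minimal part set [3, 2, 1] one hundred times meets the full demand while keeping the product types in the desired 3:2:1 ratio.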

Reliable and accurate diagnostics from highly multiplexed sequencing assays

Author: Bloom Joshua S.
Booeshaghi A. Sina
Cooper Aaron R.
Gehring Jase
Kosuri Sriram
Lubock Nathan B.
Luebbert Laura
Pachter Lior
Simpkins Scott W.
Publication venue: Nature Publishing Group
Publication date: 10/12/2020

Scalable, inexpensive, and secure testing for SARS-CoV-2 infection is crucial for control of the novel coronavirus pandemic. Recently developed highly multiplexed sequencing assays (HMSAs) that rely on high-throughput sequencing can, in principle, meet these demands, and present promising alternatives to currently used RT-qPCR-based tests. However, reliable analysis, interpretation, and clinical use of HMSAs requires overcoming several computational, statistical and engineering challenges. Using recently acquired experimental data, we present and validate a computational workflow based on kallisto and bustools, that utilizes robust statistical methods and fast, memory efficient algorithms, to quickly, accurately and reliably process high-throughput sequencing data. We show that our workflow is effective at processing data from all recently proposed SARS-CoV-2 sequencing based diagnostic tests, and is generally applicable to any diagnostic HMSA

Caltech Authors

A High-Throughput Method for Illumina RNA-Seq Library Preparation.

Author: Daniel H Chitwood
Jie Peng
Julin N Maloof
Lauren R Headland
Neelima R Sinha
Ravi Kumar
Seisuke Kimura
Yasunori Ichihashi
Publication venue: eScholarship, University of California
Publication date: 01/01/2012

With the introduction of cost effective, rapid, and superior quality next generation sequencing techniques, gene expression analysis has become viable for labs conducting small projects as well as large-scale gene expression analysis experiments. However, the available protocols for construction of RNA-sequencing (RNA-Seq) libraries are expensive and/or difficult to scale for high-throughput applications. Also, most protocols require isolated total RNA as a starting point. We provide a cost-effective RNA-Seq library synthesis protocol that is fast, starts with tissue, and is high-throughput from tissue to synthesized library. We have also designed and report a set of 96 unique barcodes for library adapters that are amenable to high-throughput sequencing by a large combination of multiplexing strategies. Our developed protocol has more power to detect differentially expressed genes when compared to the standard Illumina protocol, probably owing to less technical variation amongst replicates. We also address the problem of gene-length biases affecting differential gene expression calls and demonstrate that such biases can be efficiently minimized during mRNA isolation for library preparation

Crossref

Directory of Open Access Journals

PubMed Central

Frontiers - Publisher Connector

eScholarship - University of California

High-throughput DNA sequencing to survey bacterial histidine and tyrosine decarboxylases in raw milk cheeses

Author: Cotter Paul D.
Fallico Vincenzo
Giblin Linda
McSweeney Paul L. H.
O'Sullivan Daniel
O'Sullivan Orla
Sheehan Diarmuid (JJ)
Publication venue: Springer Science and Business Media LLC
Publication date: 17/11/2015

Background: The aim of this study was to employ high-throughput DNA sequencing to assess the incidence of bacteria with biogenic amine (BA; histamine and tyramine) producing potential from among 10 different cheese varieties. To facilitate this, a diagnostic approach using degenerate PCR primer pairs that were previously designed to amplify segments of the histidine (hdc) and tyrosine (tdc) decarboxylase gene clusters was employed. In contrast to previous studies in which the decarboxylase genes of specific isolates were studied, in this instance amplifications were performed using total metagenomic DNA extracts. Results: Amplicons were initially cloned to facilitate Sanger sequencing of individual gene fragments to ensure that a variety of hdc and tdc genes were present. Once this was established, high-throughput DNA sequencing of these amplicons was performed to provide a more in-depth analysis of the histamine- and tyramine-producing bacteria present in the cheeses. High-throughput sequencing resulted in generation of a total of 1,563,764 sequencing reads and revealed that Lactobacillus curvatus, Enterococcus faecium and E. faecalis were the dominant species with tyramine producing potential, while Lb. buchneri was found to be the dominant species harbouring histaminogenic potential. Commonly used cheese starter bacteria, including Streptococcus thermophilus and Lb. delbrueckii, were also identified as having biogenic amine producing potential in the cheeses studied. Molecular analysis of bacterial communities was then further complemented with HPLC quantification of histamine and tyramine in the sampled cheeses. Conclusions: In this study, high-throughput DNA sequencing successfully identified populations capable of amine production in a variety of cheeses. This approach also gave an insight into the broader hdc and tdc complement within the various cheeses.
This approach can be used to detect amine producing communities not only in food matrices but also in the production environment itself. This work was funded by the Department of Agriculture, Food and the Marine under the Food Institutional Research Measure through the ‘Cheeseboard 2015’ project. Daniel J. O’Sullivan is in receipt of a Teagasc Walsh Fellowship, Grant Number: 2012205