Optimal Assembly for High Throughput Shotgun Sequencing
We present a framework for the design of optimal assembly algorithms for
shotgun sequencing under the criterion of complete reconstruction. We derive a
lower bound on the read length and the coverage depth required for
reconstruction in terms of the repeat statistics of the genome. Building on
earlier works, we design a de Bruijn graph based assembly algorithm which can
achieve very close to the lower bound for repeat statistics of a wide range of
sequenced genomes, including the GAGE datasets. The results are based on a set
of necessary and sufficient conditions on the DNA sequence and the reads for
reconstruction. The conditions can be viewed as the shotgun sequencing analogue
of Ukkonen-Pevzner's necessary and sufficient conditions for Sequencing by
Hybridization.
DUDE-Seq: Fast, Flexible, and Robust Denoising for Targeted Amplicon Sequencing
We consider the correction of errors from nucleotide sequences produced by
next-generation targeted amplicon sequencing. The next-generation sequencing
(NGS) platforms can provide a great deal of sequencing data thanks to their
high throughput, but the associated error rates often tend to be high.
Denoising in high-throughput sequencing has thus become a crucial process for
boosting the reliability of downstream analyses. Our methodology, named
DUDE-Seq, is derived from a general setting of reconstructing finite-valued
source data corrupted by a discrete memoryless channel and effectively corrects
substitution and homopolymer indel errors, the two major types of sequencing
errors in most high-throughput targeted amplicon sequencing platforms. Our
experimental studies with real and simulated datasets suggest that the proposed
DUDE-Seq not only outperforms existing alternatives in terms of
error-correction capability and time efficiency, but also boosts the
reliability of downstream analyses. Further, the flexibility of DUDE-Seq
enables its robust application to different sequencing platforms and analysis
pipelines by simple updates of the noise model. DUDE-Seq is available at
http://data.snu.ac.kr/pub/dude-seq
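The "finite-valued source corrupted by a discrete memoryless channel" setting above can be made concrete with a two-pass, sliding-window denoiser. The sketch below is a simplified stand-in for the DUDE rule, keeping only its context-statistics idea: the real rule also folds in the channel transition matrix and a loss function, and all parameters here are illustrative.

```python
from collections import Counter, defaultdict

def context_denoise(seqs, k=2, min_frac=0.9):
    """Simplified two-pass context denoiser over the DNA alphabet.
    Pass 1 counts, for every two-sided context of k bases on each
    side, how often each center base occurs across all reads.
    Pass 2 rewrites a center base to the context's dominant base
    when it accounts for at least `min_frac` of the counts."""
    ctx_counts = defaultdict(Counter)
    for s in seqs:
        for i in range(k, len(s) - k):
            ctx_counts[(s[i - k:i], s[i + 1:i + k + 1])][s[i]] += 1
    out = []
    for s in seqs:
        chars = list(s)
        for i in range(k, len(s) - k):
            counts = ctx_counts[(s[i - k:i], s[i + 1:i + k + 1])]
            base, n = counts.most_common(1)[0]
            if n / sum(counts.values()) >= min_frac:
                chars[i] = base
        out.append("".join(chars))
    return out

# One read carries a substitution error; its context statistics
# across the other reads vote it back to the consensus base.
reads = ["ACGTACGT"] * 9 + ["ACGAACGT"]
print(context_denoise(reads, k=2))
```

Because corrections are driven entirely by empirical context counts rather than a trained model, swapping sequencing platforms only requires updating the assumed noise model, which is the flexibility the abstract highlights.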
Recommended from our members
Optimizing sequencing protocols for leaderboard metagenomics by combining long and short reads.
As metagenomic studies move to increasing numbers of samples, communities like the human gut may benefit more from the assembly of abundant microbes in many samples than from the exhaustive assembly of fewer samples. We term this approach leaderboard metagenome sequencing. To explore protocol optimization for leaderboard metagenomics in real samples, we introduce a benchmark of library preparation and sequencing using internal references generated by synthetic long-read technology, allowing us to evaluate high-throughput library preparation methods against gold-standard reference genomes derived from the samples themselves. We introduce a low-cost protocol for high-throughput library preparation and sequencing.
High-Throughput, Whole-Genome Sequencing
Since the completion of the Human Genome Project, research focusing on the consequences of the known human genetic code has advanced by leaps and bounds. The development of personalized medicine, a field focused on enumerating the effects of individual genetic variations, termed SNPs, has become a reality for those researching the molecular basis of disease. With clinical correlates between genotype and prognosis becoming ever more common, the utility of personal genetic screening has become impossible to ignore. In this report, we present PennBio: a whole-genome sequencing company utilizing a novel single-molecule, real-time sequencing-by-synthesis technology. Using unique zero-mode waveguides, which have revolutionized single-molecule detection, individual enzymes polymerizing novel phospholinked, fluorescently labeled nucleotides can be observed as they sequence genomic template DNA. Modern optical techniques record these fragmented sequences, which are then analyzed by highly efficient alignment algorithms. A personal genomic code will ultimately allow consumers to be aware of their genetic predispositions as the medical community continues to discover them.
Throughput Rate Optimization in High Multiplicity Sequencing Problems
Mixed model assembly systems assemble products (parts) of different types in certain prespecified quantities. A minimal part set is a smallest possible set of product type quantities, to be called the multiplicities, in which the numbers of assembled products of the various types are in the desired ratios. It is common practice to repeatedly assemble minimal part sets, and in such a way that the products of each of the minimal part sets are assembled in the same sequence. Little is known, however, regarding the resulting throughput rate, in particular in comparison to the throughput rates attainable by other input strategies. This paper investigates throughput and balancing issues in repetitive manufacturing environments. It considers sequencing problems that occur in this setting and how the repetition strategy influences throughput. We model the problems as a generalization of the traveling salesman problem and derive our results in this general setting. Our analysis uses well-known concepts from scheduling theory and combinatorial optimization.
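The "minimal part set" notion above has a short worked form: given the desired output quantities per product type, the smallest multiplicities preserving their ratios are obtained by dividing out the greatest common divisor. A minimal sketch, with illustrative quantities:

```python
from functools import reduce
from math import gcd

def minimal_part_set(quantities):
    """Smallest multiplicities that keep product types in the same
    ratios as the desired quantities: divide each by the GCD."""
    g = reduce(gcd, quantities)
    return [q // g for q in quantities]

# Desired output of 40, 60, and 20 units of three product types
# reduces to repeatedly assembling the minimal part set (2, 3, 1).
print(minimal_part_set([40, 60, 20]))  # [2, 3, 1]
```

Repeating this minimal part set 20 times reproduces the desired quantities; the paper's question is what this repetition strategy costs in throughput relative to other input sequences.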
Reliable and accurate diagnostics from highly multiplexed sequencing assays
Scalable, inexpensive, and secure testing for SARS-CoV-2 infection is crucial for control of the novel coronavirus pandemic. Recently developed highly multiplexed sequencing assays (HMSAs) that rely on high-throughput sequencing can, in principle, meet these demands, and present promising alternatives to currently used RT-qPCR-based tests. However, reliable analysis, interpretation, and clinical use of HMSAs requires overcoming several computational, statistical and engineering challenges. Using recently acquired experimental data, we present and validate a computational workflow based on kallisto and bustools, that utilizes robust statistical methods and fast, memory efficient algorithms, to quickly, accurately and reliably process high-throughput sequencing data. We show that our workflow is effective at processing data from all recently proposed SARS-CoV-2 sequencing based diagnostic tests, and is generally applicable to any diagnostic HMSA
A High-Throughput Method for Illumina RNA-Seq Library Preparation.
With the introduction of cost-effective, rapid, and superior-quality next-generation sequencing techniques, gene expression analysis has become viable for labs conducting small projects as well as large-scale gene expression experiments. However, the available protocols for construction of RNA-sequencing (RNA-Seq) libraries are expensive and/or difficult to scale for high-throughput applications. Also, most protocols require isolated total RNA as a starting point. We provide a cost-effective RNA-Seq library synthesis protocol that is fast, starts with tissue, and is high-throughput from tissue to synthesized library. We have also designed and report a set of 96 unique barcodes for library adapters that are amenable to high-throughput sequencing by a large combination of multiplexing strategies. Our developed protocol has more power to detect differentially expressed genes than the standard Illumina protocol, probably owing to less technical variation amongst replicates. We also address the problem of gene-length biases affecting differential gene expression calls and demonstrate that such biases can be efficiently minimized during mRNA isolation for library preparation.
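One simple way to design a barcode set like the 96 adapter barcodes described above is a greedy search that keeps only candidates at a minimum pairwise Hamming distance, so that a single sequencing error cannot convert one barcode into another. The published barcode set was designed separately; the length, distance, and method below are illustrative assumptions.

```python
from itertools import product

def greedy_barcodes(length=8, min_dist=3, n=96):
    """Greedily collect `n` DNA barcodes of the given length whose
    pairwise Hamming distance is at least `min_dist`. With distance 3,
    any single substitution error still decodes to a unique barcode."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    chosen = []
    for cand in product("ACGT", repeat=length):
        cand = "".join(cand)
        if all(hamming(cand, b) >= min_dist for b in chosen):
            chosen.append(cand)
            if len(chosen) == n:
                break
    return chosen

codes = greedy_barcodes()
print(len(codes))  # 96
```

A covering argument guarantees this greedy search finds well over 96 codewords at length 8 and distance 3; real barcode designs typically add further constraints (balanced GC content, no homopolymer runs) not modeled here.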
High-throughput DNA sequencing to survey bacterial histidine and tyrosine decarboxylases in raw milk cheeses
Background
The aim of this study was to employ high-throughput DNA sequencing to assess the incidence of bacteria with biogenic amine (BA; histamine and tyramine) producing potential from among 10 different cheese varieties. To facilitate this, a diagnostic approach was employed, using degenerate PCR primer pairs that were previously designed to amplify segments of the histidine (hdc) and tyrosine (tdc) decarboxylase gene clusters. In contrast to previous studies in which the decarboxylase genes of specific isolates were studied, in this instance amplifications were performed using total metagenomic DNA extracts.
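Degenerate primers of the kind used above rely on IUPAC ambiguity codes, letting one primer anneal to several sequence variants of a decarboxylase gene at once. A minimal sketch of that matching logic; the primer and binding sites shown are hypothetical, not the study's actual hdc/tdc primers.

```python
# IUPAC nucleotide ambiguity codes: each degenerate symbol stands
# for a set of allowed bases at that primer position.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "GC", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def primer_matches(primer, site):
    """True if every base of `site` is allowed by the corresponding
    (possibly degenerate) primer position."""
    return len(primer) == len(site) and all(
        base in IUPAC[p] for p, base in zip(primer, site)
    )

# One degenerate primer matches two codon-level variants of a
# hypothetical target, but rejects a third:
print(primer_matches("GAYGCNAAR", "GATGCTAAA"))  # True
print(primer_matches("GAYGCNAAR", "GACGCGAAG"))  # True
print(primer_matches("GAYGCNAAR", "GAAGCTAAA"))  # False
```

Applied to total metagenomic DNA rather than single isolates, such primers amplify the whole community's hdc/tdc complement in one reaction, which is what makes the downstream high-throughput sequencing informative.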
Results
Amplicons were initially cloned to facilitate Sanger sequencing of individual gene fragments, to ensure that a variety of hdc and tdc genes were present. Once this was established, high-throughput DNA sequencing of these amplicons was performed to provide a more in-depth analysis of the histamine- and tyramine-producing bacteria present in the cheeses. High-throughput sequencing generated a total of 1,563,764 sequencing reads and revealed that Lactobacillus curvatus, Enterococcus faecium and E. faecalis were the dominant species with tyramine-producing potential, while Lb. buchneri was found to be the dominant species harbouring histaminogenic potential. Commonly used cheese starter bacteria, including Streptococcus thermophilus and Lb. delbrueckii, were also identified as having biogenic amine producing potential in the cheeses studied. Molecular analysis of bacterial communities was then further complemented with HPLC quantification of histamine and tyramine in the sampled cheeses.
Conclusions
In this study, high-throughput DNA sequencing successfully identified populations capable of amine production in a variety of cheeses. This approach also gave an insight into the broader hdc and tdc complement within the various cheeses, and can be used to detect amine-producing communities not only in food matrices but also in the production environment itself.
This work was funded by the Department of Agriculture, Food and the Marine under the Food Institutional Research Measure through the ‘Cheeseboard 2015’ project. Daniel J. O’Sullivan is in receipt of a Teagasc Walsh Fellowship, Grant Number 2012205.