
    Optimal Assembly for High Throughput Shotgun Sequencing

    We present a framework for the design of optimal assembly algorithms for shotgun sequencing under the criterion of complete reconstruction. We derive a lower bound on the read length and the coverage depth required for reconstruction in terms of the repeat statistics of the genome. Building on earlier works, we design a de Bruijn graph based assembly algorithm which can achieve very close to the lower bound for repeat statistics of a wide range of sequenced genomes, including the GAGE datasets. The results are based on a set of necessary and sufficient conditions on the DNA sequence and the reads for reconstruction. The conditions can be viewed as the shotgun sequencing analogue of Ukkonen-Pevzner's necessary and sufficient conditions for Sequencing by Hybridization. Comment: 26 pages, 18 figures
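
As a rough illustration of the de Bruijn graph structure mentioned in the abstract above, the following minimal Python sketch builds a graph whose nodes are (k-1)-mers and whose edges are the k-mers observed in a set of reads. The function name `de_bruijn_graph`, the toy reads, and the choice k = 3 are assumptions made here for illustration; this is only the basic graph construction, not the assembly algorithm of the paper.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph from reads (hypothetical helper).

    Each k-mer found in a read contributes one edge from its
    (k-1)-length prefix to its (k-1)-length suffix. Duplicate
    edges are collapsed by storing successors in a set.
    """
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k across the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Toy, error-free reads (assumed for illustration only).
reads = ["ACGT", "CGTA"]
graph = de_bruijn_graph(reads, k=3)
```

In this toy case the two overlapping reads share the k-mer `CGT`, so the graph forms a single path from `AC` through `CG` and `GT` to `TA`.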

    DUDE-Seq: Fast, Flexible, and Robust Denoising for Targeted Amplicon Sequencing

    We consider the correction of errors from nucleotide sequences produced by next-generation targeted amplicon sequencing. The next-generation sequencing (NGS) platforms can provide a great deal of sequencing data thanks to their high throughput, but the associated error rates often tend to be high. Denoising in high-throughput sequencing has thus become a crucial process for boosting the reliability of downstream analyses. Our methodology, named DUDE-Seq, is derived from a general setting of reconstructing finite-valued source data corrupted by a discrete memoryless channel and effectively corrects substitution and homopolymer indel errors, the two major types of sequencing errors in most high-throughput targeted amplicon sequencing platforms. Our experimental studies with real and simulated datasets suggest that the proposed DUDE-Seq not only outperforms existing alternatives in terms of error-correction capability and time efficiency, but also boosts the reliability of downstream analyses. Further, the flexibility of DUDE-Seq enables its robust application to different sequencing platforms and analysis pipelines by simple updates of the noise model. DUDE-Seq is available at http://data.snu.ac.kr/pub/dude-seq

    High-Throughput, Whole-Genome Sequencing

    Since the completion of the Human Genome Project, research focusing on the consequences of the known human genetic code has advanced by leaps and bounds. The development of personalized medicine, a field focused on enumerating the effects of individual genetic variations, termed SNPs, has become a reality for those researching the molecular basis of disease. With clinical correlates between genotype and prognosis becoming ever more common, the utility of personal genetic screening has become impossible to ignore. In this report, we present PennBio: a whole-genome sequencing company utilizing a novel single-molecule, real-time sequencing-by-synthesis technology. Using unique zero-mode waveguides, which have revolutionized single-molecule detection, individual enzymes polymerizing novel phospholinked, fluorescence-labeled nucleotides can be observed as they sequence genomic template DNA. Modern optical techniques record these fragmented sequences, which are then analyzed by highly efficient alignment algorithms. A personal genomic code will ultimately allow consumers to be aware of their genetic predispositions as the medical community continues to discover them.

    Throughput Rate Optimization in High Multiplicity Sequencing Problems

    Mixed model assembly systems assemble products (parts) of different types in certain prespecified quantities. A minimal part set is a smallest possible set of product type quantities, to be called the multiplicities, in which the numbers of assembled products of the various types are in the desired ratios. It is common practice to repeatedly assemble minimal part sets, and in such a way that the products of each of the minimal part sets are assembled in the same sequence. Little is known, however, regarding the resulting throughput rate, in particular in comparison to the throughput rates attainable by other input strategies. This paper investigates throughput and balancing issues in repetitive manufacturing environments. It considers sequencing problems that occur in this setting and how the repetition strategy influences throughput. We model the problems as a generalization of the traveling salesman problem and derive our results in this general setting. Our analysis uses well known concepts from scheduling theory and combinatorial optimization.
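
The notion of a minimal part set defined in the abstract above can be illustrated with a short Python sketch: dividing the required quantities by their greatest common divisor gives the smallest quantities in the same ratios. The function name `minimal_part_set` and the demand figures are hypothetical, and the sketch assumes positive integer quantities; the sequencing and throughput analysis of the paper is not reproduced here.

```python
from functools import reduce
from math import gcd


def minimal_part_set(quantities):
    """Return (multiplicities, repetitions) for positive integer quantities.

    The multiplicities are the smallest quantities in the same ratios
    as the input; assembling that set `repetitions` times yields the
    full required quantities.
    """
    if not quantities or any(q <= 0 for q in quantities):
        raise ValueError("quantities must be positive integers")
    # The greatest common divisor of all quantities is the number of
    # times the minimal part set has to be repeated.
    repetitions = reduce(gcd, quantities)
    multiplicities = [q // repetitions for q in quantities]
    return multiplicities, repetitions


# Hypothetical demand for three product types.
multiplicities, repetitions = minimal_part_set([300, 200, 100])
```

For this assumed demand the minimal part set is 3, 2, and 1 units of the three product types, repeated 100 times.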

    Reliable and accurate diagnostics from highly multiplexed sequencing assays

    Scalable, inexpensive, and secure testing for SARS-CoV-2 infection is crucial for control of the novel coronavirus pandemic. Recently developed highly multiplexed sequencing assays (HMSAs) that rely on high-throughput sequencing can, in principle, meet these demands, and present promising alternatives to currently used RT-qPCR-based tests. However, reliable analysis, interpretation, and clinical use of HMSAs require overcoming several computational, statistical, and engineering challenges. Using recently acquired experimental data, we present and validate a computational workflow based on kallisto and bustools that uses robust statistical methods and fast, memory-efficient algorithms to process high-throughput sequencing data quickly, accurately, and reliably. We show that our workflow is effective at processing data from all recently proposed SARS-CoV-2 sequencing-based diagnostic tests and is generally applicable to any diagnostic HMSA.

    A High-Throughput Method for Illumina RNA-Seq Library Preparation.

    With the introduction of cost-effective, rapid, and superior-quality next-generation sequencing techniques, gene expression analysis has become viable for labs conducting small projects as well as large-scale gene expression analysis experiments. However, the available protocols for construction of RNA-sequencing (RNA-Seq) libraries are expensive and/or difficult to scale for high-throughput applications. Also, most protocols require isolated total RNA as a starting point. We provide a cost-effective RNA-Seq library synthesis protocol that is fast, starts with tissue, and is high-throughput from tissue to synthesized library. We have also designed and report a set of 96 unique barcodes for library adapters that are amenable to high-throughput sequencing by a large combination of multiplexing strategies. Our protocol has more power to detect differentially expressed genes than the standard Illumina protocol, probably owing to less technical variation among replicates. We also address the problem of gene-length biases affecting differential gene expression calls and demonstrate that such biases can be efficiently minimized during mRNA isolation for library preparation.

    High-throughput DNA sequencing to survey bacterial histidine and tyrosine decarboxylases in raw milk cheeses

    Peer-reviewed. Background: The aim of this study was to employ high-throughput DNA sequencing to assess the incidence of bacteria with biogenic amine (BA; histamine and tyramine) producing potential from among 10 different cheese varieties. To facilitate this, a diagnostic approach was employed using degenerate PCR primer pairs that were previously designed to amplify segments of the histidine (hdc) and tyrosine (tdc) decarboxylase gene clusters. In contrast to previous studies in which the decarboxylase genes of specific isolates were studied, in this instance amplifications were performed using total metagenomic DNA extracts. Results: Amplicons were initially cloned to facilitate Sanger sequencing of individual gene fragments to ensure that a variety of hdc and tdc genes were present. Once this was established, high-throughput DNA sequencing of these amplicons was performed to provide a more in-depth analysis of the histamine- and tyramine-producing bacteria present in the cheeses. High-throughput sequencing generated a total of 1,563,764 sequencing reads and revealed that Lactobacillus curvatus, Enterococcus faecium and E. faecalis were the dominant species with tyramine-producing potential, while Lb. buchneri was found to be the dominant species harbouring histaminogenic potential. Commonly used cheese starter bacteria, including Streptococcus thermophilus and Lb. delbrueckii, were also identified as having biogenic amine producing potential in the cheeses studied. Molecular analysis of bacterial communities was then further complemented with HPLC quantification of histamine and tyramine in the sampled cheeses. Conclusions: In this study, high-throughput DNA sequencing successfully identified populations capable of amine production in a variety of cheeses. This approach also gave an insight into the broader hdc and tdc complement within the various cheeses. This approach can be used to detect amine-producing communities not only in food matrices but also in the production environment itself. This work was funded by the Department of Agriculture, Food and the Marine under the Food Institutional Research Measure through the ‘Cheeseboard 2015’ project. Daniel J. O’Sullivan is in receipt of a Teagasc Walsh Fellowship, Grant Number: 2012205.