
    Optimal Assembly for High Throughput Shotgun Sequencing

    We present a framework for the design of optimal assembly algorithms for shotgun sequencing under the criterion of complete reconstruction. We derive a lower bound on the read length and the coverage depth required for reconstruction in terms of the repeat statistics of the genome. Building on earlier work, we design a de Bruijn graph-based assembly algorithm that comes very close to the lower bound for the repeat statistics of a wide range of sequenced genomes, including the GAGE datasets. The results are based on a set of necessary and sufficient conditions on the DNA sequence and the reads for reconstruction. The conditions can be viewed as the shotgun sequencing analogue of Ukkonen-Pevzner's necessary and sufficient conditions for Sequencing by Hybridization. Comment: 26 pages, 18 figures.
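As a rough illustration of the data structure this abstract refers to, the sketch below builds a plain de Bruijn graph from a handful of reads. It is a generic textbook construction, not the paper's assembly algorithm; the function name `de_bruijn_graph`, the toy reads, and the choice `k=3` are assumptions made for the example.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read.

    Hypothetical helper for illustration: every k-mer found in a read
    contributes one edge from its prefix (first k-1 bases) to its suffix
    (last k-1 bases).
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of width k over the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Toy reads (assumed data, not from any real genome).
reads = ["ACGTC", "GTCA"]
graph = de_bruijn_graph(reads, k=3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

In this toy case the two reads overlap on `GTC`, so their edges chain into a single path through the graph. Repeats longer than k-1 bases would instead create branching nodes, which is why the repeat statistics of the genome matter for reconstruction.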

    Reliable and accurate diagnostics from highly multiplexed sequencing assays

    Scalable, inexpensive, and secure testing for SARS-CoV-2 infection is crucial for control of the novel coronavirus pandemic. Recently developed highly multiplexed sequencing assays (HMSAs) that rely on high-throughput sequencing can, in principle, meet these demands, and present promising alternatives to currently used RT-qPCR-based tests. However, reliable analysis, interpretation, and clinical use of HMSAs require overcoming several computational, statistical, and engineering challenges. Using recently acquired experimental data, we present and validate a computational workflow based on kallisto and bustools that uses robust statistical methods and fast, memory-efficient algorithms to process high-throughput sequencing data quickly, accurately, and reliably. We show that our workflow is effective at processing data from all recently proposed SARS-CoV-2 sequencing-based diagnostic tests, and is generally applicable to any diagnostic HMSA.

    Throughput Rate Optimization in High Multiplicity Sequencing Problems

    Mixed model assembly systems assemble products (parts) of different types in certain prespecified quantities. A minimal part set is a smallest possible set of product type quantities, to be called the multiplicities, in which the numbers of assembled products of the various types are in the desired ratios. It is common practice to repeatedly assemble minimal part sets, and in such a way that the products of each of the minimal part sets are assembled in the same sequence. Little is known, however, regarding the resulting throughput rate, in particular in comparison to the throughput rates attainable by other input strategies. This paper investigates throughput and balancing issues in repetitive manufacturing environments. It considers sequencing problems that occur in this setting and how the repetition strategy influences throughput. We model the problems as a generalization of the traveling salesman problem and derive our results in this general setting. Our analysis uses well known concepts from scheduling theory and combinatorial optimization.
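The definition of a minimal part set can be made concrete with a small calculation: dividing the demanded quantities by their greatest common divisor gives the smallest integer multiplicities in the same ratios. The sketch below assumes positive integer demands; the function name `minimal_part_set` and the demand figures are hypothetical and do not come from the paper.

```python
from functools import reduce
from math import gcd


def minimal_part_set(quantities):
    """Return the smallest positive integers in the same ratios as `quantities`.

    Hypothetical helper for illustration: assumes every quantity is a
    positive integer.
    """
    if not quantities or any(q <= 0 for q in quantities):
        raise ValueError("quantities must be a non-empty list of positive integers")
    # The greatest common divisor of all quantities is the number of times
    # the minimal part set must be repeated to meet the full demand.
    divisor = reduce(gcd, quantities)
    return [q // divisor for q in quantities]


# Assumed demand for three product types.
demand = [300, 200, 100]
multiplicities = minimal_part_set(demand)
print(multiplicities)  # [3, 2, 1]
```

Here the minimal part set would be assembled 100 times to meet the assumed demand. The order in which the six products inside each set are sequenced is the part of the problem that the abstract models as a generalized traveling salesman problem, and this sketch does not address it.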

    A High-Throughput Method for Illumina RNA-Seq Library Preparation.

    With the introduction of cost-effective, rapid, and superior quality next-generation sequencing techniques, gene expression analysis has become viable for labs conducting small projects as well as large-scale gene expression analysis experiments. However, the available protocols for construction of RNA-sequencing (RNA-Seq) libraries are expensive and/or difficult to scale for high-throughput applications. Also, most protocols require isolated total RNA as a starting point. We provide a cost-effective RNA-Seq library synthesis protocol that is fast, starts with tissue, and is high-throughput from tissue to synthesized library. We have also designed, and report here, a set of 96 unique barcodes for library adapters that are amenable to high-throughput sequencing by a large combination of multiplexing strategies. Our protocol has more power to detect differentially expressed genes than the standard Illumina protocol, probably owing to less technical variation among replicates. We also address the problem of gene-length biases affecting differential gene expression calls and demonstrate that such biases can be efficiently minimized during mRNA isolation for library preparation.

    YAMAT-seq: an efficient method for high-throughput sequencing of mature transfer RNAs.

    Besides translation, transfer RNAs (tRNAs) play many non-canonical roles in various biological pathways and exhibit highly variable expression profiles. To unravel the emerging complexities of tRNA biology and the molecular mechanisms underlying them, an efficient tRNA sequencing method is required. However, the rigid structure of tRNA has presented a challenge to the development of such methods. We report the development of Y-shaped Adapter-ligated MAture TRNA sequencing (YAMAT-seq), an efficient and convenient method for high-throughput sequencing of mature tRNAs. YAMAT-seq circumvents the issue of inefficient adapter ligation, a characteristic of conventional RNA sequencing methods for mature tRNAs, by employing the efficient and specific ligation of a Y-shaped adapter to mature tRNAs using T4 RNA Ligase 2. Subsequent cDNA amplification and next-generation sequencing successfully yield numerous mature tRNA sequences. YAMAT-seq has high specificity for mature tRNAs and high sensitivity to detect most isoacceptors from minute amounts of total RNA. Moreover, YAMAT-seq shows quantitative capability to estimate expression levels of mature tRNAs, and has high reproducibility and broad applicability for various cell lines. YAMAT-seq thus provides a high-throughput technique for identifying tRNA profiles and their regulation in various transcriptomes, which could play important regulatory roles in translation and other biological processes.

    High-throughput DNA sequencing to survey bacterial histidine and tyrosine decarboxylases in raw milk cheeses

    Peer-reviewed. Background: The aim of this study was to employ high-throughput DNA sequencing to assess the incidence of bacteria with biogenic amine (BA; histamine and tyramine) producing potential from among 10 different cheese varieties. To facilitate this, a diagnostic approach was employed that uses degenerate PCR primer pairs previously designed to amplify segments of the histidine (hdc) and tyrosine (tdc) decarboxylase gene clusters. In contrast to previous studies in which the decarboxylase genes of specific isolates were studied, in this instance amplifications were performed using total metagenomic DNA extracts. Results: Amplicons were initially cloned to facilitate Sanger sequencing of individual gene fragments to ensure that a variety of hdc and tdc genes were present. Once this was established, high-throughput DNA sequencing of these amplicons was performed to provide a more in-depth analysis of the histamine- and tyramine-producing bacteria present in the cheeses. High-throughput sequencing generated a total of 1,563,764 sequencing reads and revealed that Lactobacillus curvatus, Enterococcus faecium and E. faecalis were the dominant species with tyramine-producing potential, while Lb. buchneri was found to be the dominant species harbouring histaminogenic potential. Commonly used cheese starter bacteria, including Streptococcus thermophilus and Lb. delbrueckii, were also identified as having biogenic amine producing potential in the cheeses studied. Molecular analysis of bacterial communities was then further complemented with HPLC quantification of histamine and tyramine in the sampled cheeses. Conclusions: In this study, high-throughput DNA sequencing successfully identified populations capable of amine production in a variety of cheeses. This approach also gave an insight into the broader hdc and tdc complement within the various cheeses. This approach can be used to detect amine-producing communities not only in food matrices but also in the production environment itself. This work was funded by the Department of Agriculture, Food and the Marine under the Food Institutional Research Measure through the ‘Cheeseboard 2015’ project. Daniel J. O’Sullivan is in receipt of a Teagasc Walsh Fellowship, Grant Number: 2012205.

    Base calling for high-throughput short-read sequencing: dynamic programming solutions

    Shreepriya Das and Haris Vikalo are with the Electrical and Computer Engineering Department, The University of Texas at Austin, Austin, Texas 78712, USA. Background: Next-generation DNA sequencing platforms are capable of generating millions of reads in a matter of days at rapidly reducing costs. Despite its proliferation and technological improvements, the performance of next-generation sequencing remains adversely affected by the imperfections in the underlying biochemical and signal acquisition procedures. To this end, various techniques, including statistical methods, are used to improve read lengths and accuracy of these systems. Development of high-performing base calling algorithms that are computationally efficient and scalable is an ongoing challenge. Results: We develop model-based statistical methods for fast and accurate base calling in Illumina’s next-generation sequencing platforms. In particular, we propose a computationally tractable parametric model which enables a dynamic programming formulation of the base calling problem. Forward-backward and soft-output Viterbi algorithms are developed, and their performance and complexity are investigated and compared with the existing state-of-the-art base calling methods for this platform. A C code implementation of our algorithm, named Softy, can be downloaded from https://sourceforge.net/projects/dynamicprog. Conclusions: We demonstrate high accuracy and speed of the proposed methods on reads obtained using Illumina’s Genome Analyzer II and HiSeq2000. In addition to performing reliable and fast base calling, the developed algorithms enable incorporation of prior knowledge which can be utilized for parameter estimation and is potentially beneficial in various downstream applications.