    Weighted pooling – practical and cost-effective techniques for pooled high-throughput sequencing

    Motivation: Despite the rapid decline in sequencing costs, sequencing large cohorts of individuals is still prohibitively expensive. Recently, several sophisticated pooling designs have been suggested that can identify carriers of rare alleles in large cohorts with significantly fewer pools, thus dramatically reducing the cost of such large-scale sequencing projects. These approaches use combinatorial pooling designs in which each individual is either present in or absent from a pool. One can then infer the number of carriers in a pool and, by combining information across pools, reconstruct the identity of the carriers.
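    A minimal illustration of the combinatorial (present/absent) designs this abstract starts from. It assumes at most one carrier in the cohort. Pool j holds every individual whose binary code has bit j set, so ⌈log2(n + 1)⌉ pools are enough to identify the carrier. The sketch also estimates the number of carriers in a pool from allele read counts, assuming diploid individuals, heterozygous carriers and equal DNA input per individual. All function names are hypothetical. This is the textbook binary-splitting design, not the paper's weighted pooling method.

```python
import math
from typing import List, Optional


def binary_pooling_design(n_individuals: int) -> List[List[int]]:
    """Pool j holds every individual i whose code (i + 1) has bit j set.

    Codes start at 1, so "no positive pool" means "no carrier".
    """
    n_pools = max(1, math.ceil(math.log2(n_individuals + 1)))
    return [[i for i in range(n_individuals) if ((i + 1) >> j) & 1]
            for j in range(n_pools)]


def estimate_carriers(alt_reads: int, total_reads: int, pool_size: int) -> int:
    """Estimate the number of carriers in one pool from allele read counts.

    Assumes diploid individuals, heterozygous carriers and equal DNA input,
    so the expected alternate-allele fraction is carriers / (2 * pool_size).
    """
    if total_reads == 0 or pool_size == 0:
        return 0
    return round(2 * pool_size * alt_reads / total_reads)


def decode_single_carrier(carriers_per_pool: List[int]) -> Optional[int]:
    """Read the carrier's index off the positive pools (assumes <= 1 carrier)."""
    code = sum(1 << j for j, c in enumerate(carriers_per_pool) if c > 0)
    return code - 1 if code else None


if __name__ == "__main__":
    pools = binary_pooling_design(1000)
    carrier = 637
    counts = []
    for pool in pools:
        # Hypothetical reads: 10 000 per pool, one alt allele among 2 * len(pool).
        alt = round(10_000 / (2 * len(pool))) if carrier in pool else 0
        counts.append(estimate_carriers(alt, 10_000, len(pool)))
    print(len(pools), decode_single_carrier(counts))
```

    Each pool in this design holds about half the cohort. A single heterozygous carrier therefore contributes only about 1/n of the sequenced alleles in its pools, which is one reason practical designs use more, smaller pools.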

    Identification of rare alleles and their carriers using compressed se(que)nsing

    Identification of rare variants by resequencing is important both for detecting novel variations and for screening individuals for known disease alleles. New technologies enable low-cost resequencing of target regions, although testing more than a few individuals is still prohibitively expensive. We propose a novel pooling design that enables the recovery of novel or known rare alleles and their carriers in groups of individuals. The method is based on a Compressed Sensing (CS) approach, which is general, simple and efficient. CS allows the use of generic algorithmic tools for simultaneous identification of multiple variants and their carriers. We model the experimental procedure and show via computer simulations that it enables the recovery of rare alleles and their carriers in larger groups than was previously possible. Our approach can also be combined with barcoding techniques to provide a feasible solution based on current resequencing costs. For example, when targeting a small enough genomic region (∼100 bp) and using only ∼10 sequencing lanes and ∼10 distinct barcodes per lane, one recovers the identity of 4 rare allele carriers out of a population of over 4000 individuals. We demonstrate the performance of our approach on several publicly available experimental data sets.
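    The compressed sensing formulation can be sketched as follows. The assumptions here are ours, not the paper's: a 0/1 pooling matrix A (pools × individuals), a carrier vector x with only a few nonzero entries, and noiseless per-pool measurements y = A x. Recovery uses orthogonal matching pursuit (OMP), a standard greedy sparse solver swapped in for illustration. The paper's own solver and noise model may differ, and the function name `omp` is hypothetical.

```python
import numpy as np


def omp(A: np.ndarray, y: np.ndarray, k: int, tol: float = 1e-9) -> np.ndarray:
    """Orthogonal matching pursuit: greedily recover a k-sparse x with y = A x."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    n = A.shape[1]
    norms = np.linalg.norm(A, axis=0)
    norms[norms == 0] = 1.0  # avoid dividing by zero; empty columns score 0
    support = []
    coef = np.zeros(0)
    residual = y.copy()
    for _ in range(k):
        if np.linalg.norm(residual) <= tol:
            break  # measurements are already fully explained
        # Score each individual by how well its pool pattern matches the residual.
        scores = np.abs(A.T @ residual) / norms
        if support:
            scores[support] = -np.inf  # never select the same column twice
        support.append(int(np.argmax(scores)))
        # Refit all selected individuals jointly, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(n)
    if support:
        x[support] = coef
    return x


if __name__ == "__main__":
    # Toy design: 3 pools x 4 individuals; individual 2 carries one alt allele.
    A = np.array([[1, 0, 1, 0],
                  [0, 1, 1, 0],
                  [1, 1, 0, 1]], dtype=float)
    y = A @ np.array([0, 0, 1, 0], dtype=float)
    print(np.round(omp(A, y, k=1), 3))
```

    With the budget quoted in the abstract, A would have about 10 × 10 = 100 rows (lanes × barcodes) and more than 4000 columns, with k = 4. How reliably a given design recovers the carriers at that scale depends on the matrix and on sequencing noise, neither of which this sketch models.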

    Compressed Genotyping

    A significant body of knowledge has accumulated in recent years linking subtle genetic variations to a wide variety of medical disorders, from cystic fibrosis to mental retardation. Nevertheless, applying this knowledge routinely in the clinic remains challenging, largely because DNA sequencing is relatively tedious and expensive. Since the genetic polymorphisms that underlie these disorders are relatively rare in the human population, the presence or absence of a disease-linked polymorphism can be thought of as a sparse signal. Using methods and ideas from compressed sensing and group testing, we have developed a cost-effective genotyping protocol. In particular, we have adapted our scheme to a recently developed class of high-throughput DNA sequencing technologies and assembled a mathematical framework that differs in important ways from 'traditional' compressed sensing in order to address different biological and technical constraints. Comment: Submitted to IEEE Transactions on Information Theory, Special Issue on Molecular Biology and Neuroscience
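    The group-testing side of this idea can be illustrated with COMP, a standard decoder for noiseless non-adaptive group testing. Anyone who appears in a negative pool is cleared, and everyone else is reported. This generic textbook decoder is chosen only for illustration and is not the protocol described in the abstract. The pool layout and helper names are hypothetical.

```python
from typing import List, Sequence, Set


def pool_results(pools: Sequence[Sequence[int]], carriers: Set[int]) -> List[bool]:
    """Noiseless test outcome: a pool is positive iff it contains a carrier."""
    return [any(i in carriers for i in members) for members in pools]


def comp_decode(pools: Sequence[Sequence[int]], positive: Sequence[bool],
                n_individuals: int) -> Set[int]:
    """COMP decoding for noiseless non-adaptive group testing.

    Everyone who appears in at least one negative pool is cleared; all
    remaining individuals are reported as possible carriers. Without test
    errors no true carrier is missed, but extra individuals may be reported.
    """
    cleared: Set[int] = set()
    for members, is_positive in zip(pools, positive):
        if not is_positive:
            cleared.update(members)
    return set(range(n_individuals)) - cleared


if __name__ == "__main__":
    # Hypothetical layout: 6 individuals on a 2 x 3 grid, one pool per row and column.
    pools = [[0, 1, 2], [3, 4, 5], [0, 3], [1, 4], [2, 5]]
    for carriers in ({4}, {0, 4}):
        print(sorted(carriers), sorted(comp_decode(pools, pool_results(pools, carriers), 6)))
```

    With a single carrier (individual 4), the five pools identify it exactly. With two carriers (0 and 4), COMP returns {0, 1, 3, 4}. It still misses no carrier, but it over-reports individuals 1 and 3.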

    Bacterial Community Reconstruction Using A Single Sequencing Reaction

    Bacteria are the unseen majority on our planet, with millions of species, and comprise most of the living protoplasm. While current methods enable in-depth study of a small number of communities, a simple tool for breadth studies of bacterial population composition in a large number of samples is lacking. We propose a novel approach for reconstructing the composition of an unknown mixture of bacteria using a single Sanger-sequencing reaction of the mixture. This method is based on compressive sensing theory, which deals with the reconstruction of a sparse signal from a small number of measurements. Exploiting the fact that in many cases each bacterial community is composed of a small subset of the known bacterial species, we show the feasibility of this approach for determining the composition of a bacterial mixture. Using simulations, we show that sequencing a few hundred base pairs of the 16S rRNA gene may provide enough information to reconstruct mixtures containing tens of species, out of tens of thousands, even in the presence of realistic measurement noise. Finally, we show initial promising results when applying our method to reconstruct a toy experimental mixture of five species. Our approach may offer a practical and efficient way of identifying the bacterial species composition of biological samples. Comment: 28 pages, 12 figures
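    This is a toy version of the underlying linear model. It assumes idealized, already aligned chromatograms in which the peak for base b at position p equals the summed abundance of the species carrying b there. Each species then contributes a one-hot column, and the mixture signal is y = A x. With only a few species the system is overdetermined, and ordinary least squares recovers x. In the abstract's setting (tens of thousands of candidate species but only a few hundred positions), the system is underdetermined, and a sparsity-promoting solver such as compressive sensing is needed instead. Sequences and function names are hypothetical.

```python
from typing import List

import numpy as np

BASES = "ACGT"


def mixing_matrix(sequences: List[str]) -> np.ndarray:
    """One row per (position, base) pair, one column per candidate species.

    Column s is the one-hot encoding of species s's aligned sequence, so a
    mixture with abundances x yields the idealized peak heights y = A @ x.
    """
    length = len(sequences[0])
    A = np.zeros((4 * length, len(sequences)))
    for s, seq in enumerate(sequences):
        if len(seq) != length:
            raise ValueError("sequences must be aligned to the same length")
        for pos, base in enumerate(seq):
            A[4 * pos + BASES.index(base), s] = 1.0
    return A


def reconstruct(A: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least squares; only adequate when A has full column rank."""
    x, *_ = np.linalg.lstsq(A, y, rcond=None)
    return x


if __name__ == "__main__":
    species = ["ACGT", "AGGT", "TCGA"]  # hypothetical aligned 16S fragments
    A = mixing_matrix(species)
    y = A @ np.array([0.7, 0.0, 0.3])  # mixture of species 0 and 2
    print(np.round(reconstruct(A, y), 3))
```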

    Scrible: Ultra-Accurate Error-Correction of Pooled Sequenced Reads

    We recently proposed a novel clone-by-clone protocol for de novo genome sequencing that leverages combinatorial pooling design to overcome the limitations of DNA barcoding when multiplexing a large number of samples on second-generation sequencing instruments. Here we address the problem of correcting the short reads obtained from our sequencing protocol. We introduce a novel algorithm called Scrible that exploits properties of the pooling design to accurately identify and correct sequencing errors and minimize the chance of “over-correcting”. Experimental results on synthetic data from the rice genome demonstrate that our method corrects short reads with much higher accuracy than state-of-the-art error-correcting methods. On real data from the barley genome, we show that Scrible significantly improves the accuracy of decoding short reads to individual BACs.
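    The property of pooled designs that makes them useful for error correction can be sketched with a simplified, hypothetical illustration. This is not Scrible's actual algorithm. Each BAC is assigned a known set of pools (its signature), so a correct k-mer from that BAC should appear in every pool of the signature. We assume that a k-mer created by a sequencing error typically appears only in the one pool where the erroneous read was sequenced. k-mers whose pool set covers no BAC signature are therefore flagged as suspicious. The pool layout, sequences and function names are all assumptions.

```python
from typing import Dict, FrozenSet, List, Set


def kmer_pool_sets(pool_reads: Dict[int, List[str]], k: int) -> Dict[str, Set[int]]:
    """Map every k-mer to the set of pools whose reads contain it."""
    occurrences: Dict[str, Set[int]] = {}
    for pool, reads in pool_reads.items():
        for read in reads:
            for i in range(len(read) - k + 1):
                occurrences.setdefault(read[i:i + k], set()).add(pool)
    return occurrences


def candidate_bacs(observed: Set[int], signatures: Dict[str, FrozenSet[int]]) -> List[str]:
    """BACs whose whole pool signature is covered by the observed pools."""
    return sorted(bac for bac, sig in signatures.items() if sig <= observed)


def suspicious_kmers(occurrences: Dict[str, Set[int]],
                     signatures: Dict[str, FrozenSet[int]]) -> List[str]:
    """k-mers that no BAC signature explains; assumed likely sequencing errors."""
    return sorted(kmer for kmer, pools in occurrences.items()
                  if not candidate_bacs(pools, signatures))


if __name__ == "__main__":
    # Hypothetical design: 3 BACs, each placed in 2 of 3 pools.
    signatures = {"BAC_A": frozenset({0, 1}), "BAC_B": frozenset({1, 2}),
                  "BAC_C": frozenset({0, 2})}
    # BAC_A's read is sequenced in pools 0 and 1; pool 0 also has a copy with an error.
    pool_reads = {0: ["ACGTAC", "ACGTTC"], 1: ["ACGTAC"], 2: []}
    occ = kmer_pool_sets(pool_reads, k=4)
    print(candidate_bacs(occ["CGTA"], signatures), suspicious_kmers(occ, signatures))
```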

    High-Throughput Chronological Lifespan Screening of the Fission Yeast Deletion Library Using Barcode Sequencing

    Ageing is associated with the development of several chronic illnesses, including cardiovascular diseases, diabetes and cancer. To understand the genetic components driving cellular ageing in higher organisms, like ourselves, we study simple eukaryotic model systems, which are more accessible and easier to manipulate than higher eukaryotes. This is possible because ageing mechanisms are remarkably conserved between species. Here, we employ fission yeast, one of the simplest eukaryotic model organisms, to study cellular ageing. In this work, we decoded the fission yeast deletion collection using our in-house pipeline, developed an improved version of Bar-seq along with a custom analysis pipeline, established a method for high-quality RNA extraction and RNA-seq from long-term quiescent yeast cells, and finally performed a high-throughput Bar-seq screen to profile the chronological lifespan of our decoded strains. We describe barcode decoding of 94% of the gene deletions; validation of our Bar-seq method; identification of ncRNAs as elements important for maintaining cellular quiescence; a Bar-seq screen of the competitively grown decoded strains, which identified several long-lived and short-lived mutants following glucose starvation and culture re-growth; and validation of the top hits using isogenic cell cultures, revealing eight novel gene deletions important for early-life maintenance, as well as ten novel gene-deletion mutants with pro-ageing effects. Overall, in addition to providing rich datasets, we describe several high-throughput methods that can be used in future genome-wide studies, in which genomics and transcriptomics can be combined to further advance our understanding of the genetic factors underpinning cellular ageing in humans.
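    One common way to score a competitive Bar-seq screen is the log2 change in each barcode's relative abundance between two time points. This sketch assumes that strains which survive quiescence better are over-represented after re-growth. The pseudocount value, barcode names and function name are hypothetical, and the thesis's own analysis pipeline may normalize differently.

```python
import math
from typing import Dict


def log2_fold_changes(before: Dict[str, int], after: Dict[str, int],
                      pseudocount: float = 0.5) -> Dict[str, float]:
    """Log2 change in each barcode's relative abundance between two samples.

    Counts are normalized by each sample's library size; the pseudocount keeps
    barcodes that drop to zero reads finite.
    """
    barcodes = set(before) | set(after)
    total_before = sum(before.values()) + pseudocount * len(barcodes)
    total_after = sum(after.values()) + pseudocount * len(barcodes)
    changes = {}
    for bc in barcodes:
        frac_before = (before.get(bc, 0) + pseudocount) / total_before
        frac_after = (after.get(bc, 0) + pseudocount) / total_after
        changes[bc] = math.log2(frac_after / frac_before)
    return changes


if __name__ == "__main__":
    # Hypothetical read counts per deletion-strain barcode.
    day0 = {"geneA_del": 1200, "geneB_del": 800, "geneC_del": 1000}
    after_regrowth = {"geneA_del": 2400, "geneB_del": 50, "geneC_del": 1100}
    ranked = sorted(log2_fold_changes(day0, after_regrowth).items(),
                    key=lambda kv: kv[1], reverse=True)
    for bc, lfc in ranked:
        print(f"{bc}\t{lfc:+.2f}")
```

    Under the stated assumption, positive values point to longer-lived deletion strains and negative values to shorter-lived ones. Calling hits would also require replicate samples and a statistical test.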

    Computational Methods for Sequencing and Analysis of Heterogeneous RNA Populations

    Next-generation sequencing (NGS) and mass spectrometry technologies bring unprecedented throughput, scalability and speed, facilitating the study of biological systems. These technologies make it possible to sequence and analyze heterogeneous RNA populations rather than single sequences. In particular, they provide the opportunity to implement large-scale viral surveillance and transcriptome quantification. However, to fully exploit the capabilities of NGS technology, we need computational methods able to analyze billions of reads for the assembly and characterization of sampled RNA populations. In this work, we present novel computational methods for cost- and time-effective analysis of sequencing data from viral and RNA samples. In particular, we describe: i) computational methods for transcriptome reconstruction and quantification; ii) a method for mass spectrometry data analysis; iii) a combinatorial pooling method; and iv) computational methods for the analysis of intra-host viral populations.