195 research outputs found
Weighted pooling: practical and cost-effective techniques for pooled high-throughput sequencing
Motivation: Despite the rapid decline in sequencing costs, sequencing large cohorts of individuals is still prohibitively expensive. Recently, several sophisticated pooling designs have been suggested that can identify carriers of rare alleles in large cohorts with a significantly smaller number of pools, thus dramatically reducing the cost of such large-scale sequencing projects. These approaches use combinatorial pooling designs where each individual is either present in or absent from a pool. One can then infer the number of carriers in a pool and, by combining information across pools, reconstruct the identity of the carriers.
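The decoding idea described above (count carriers per pool, then combine the counts across pools) can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the grid layout (`grid_design`, with row, column and wrap-around diagonal pools), the noise-free carrier counts, and the brute-force `decode` search, which is only practical for tiny cohorts.

```python
from itertools import combinations

def grid_design(side):
    """Hypothetical design: place side*side individuals on a grid and form
    one pool per row, per column, and per wrap-around diagonal."""
    pools = []
    for r in range(side):
        pools.append([r * side + c for c in range(side)])
    for c in range(side):
        pools.append([r * side + c for r in range(side)])
    for d in range(side):
        pools.append([r * side + c
                      for r in range(side) for c in range(side)
                      if (r + c) % side == d])
    return pools

def carrier_counts(pools, carriers):
    """Number of carriers in each pool (assumed to be measured without noise)."""
    carriers = set(carriers)
    return [sum(1 for i in pool if i in carriers) for pool in pools]

def decode(pools, counts, n_individuals, max_carriers):
    """Return every carrier set of size <= max_carriers that reproduces
    the observed per-pool counts (exhaustive search)."""
    matches = []
    for k in range(max_carriers + 1):
        for candidate in combinations(range(n_individuals), k):
            if carrier_counts(pools, candidate) == counts:
                matches.append(set(candidate))
    return matches

side = 5                                  # 25 individuals
pools = grid_design(side)                 # 15 pools instead of 25 separate runs
true_carriers = {3, 17}                   # made-up example, not real data
observed = carrier_counts(pools, true_carriers)
recovered = decode(pools, observed, side * side, max_carriers=2)
print(len(pools), recovered)
```

With at most two carriers, the row and column counts narrow the candidates down to two possible pairs, and the diagonal counts separate them, so the search returns a single consistent set. Real designs have to cope with sequencing noise and much larger cohorts, which this sketch ignores.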
Identification of rare alleles and their carriers using compressed se(que)nsing
Identification of rare variants by resequencing is important both for detecting novel variations and for screening individuals for known disease alleles. New technologies enable low-cost resequencing of target regions, although it is still prohibitive to test more than a few individuals. We propose a novel pooling design that enables the recovery of novel or known rare alleles and their carriers in groups of individuals. The method is based on a Compressed Sensing (CS) approach, which is general, simple and efficient. CS allows the use of generic algorithmic tools for simultaneous identification of multiple variants and their carriers. We model the experimental procedure and show via computer simulations that it enables the recovery of rare alleles and their carriers in larger groups than were possible before. Our approach can also be combined with barcoding techniques to provide a feasible solution based on current resequencing costs. For example, when targeting a small enough genomic region (~100 bp) and using only ~10 sequencing lanes and ~10 distinct barcodes per lane, one recovers the identity of 4 rare allele carriers out of a population of over 4000 individuals. We demonstrate the performance of our approach over several publicly available experimental data sets.
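To put the numbers in this example in perspective, the short calculation below compares the number of pooled measurements with a simple counting bound. It is a back-of-the-envelope sketch, not the authors' analysis. It assumes that each lane and barcode combination corresponds to one pool, rounds "over 4000" down to 4000, and the variable names are illustrative.

```python
from math import comb, log2

lanes = 10               # approximate figure quoted in the abstract
barcodes_per_lane = 10   # approximate figure quoted in the abstract
individuals = 4000       # "over 4000" in the abstract, rounded for illustration
carriers = 4

# Assumption: one pool per (lane, barcode) combination.
pools = lanes * barcodes_per_lane

# Number of distinct ways to choose which individuals are the carriers,
# and the number of bits needed to single out one of those choices.
possible_carrier_sets = comb(individuals, carriers)
bits_needed = log2(possible_carrier_sets)

print(f"pools measured: {pools}")
print(f"individuals per pool measured: {individuals / pools:.0f}")
print(f"bits needed to name the carrier set: {bits_needed:.1f}")
```

Naming 4 carriers among 4000 individuals takes between 43 and 44 bits, so on the order of 100 pooled measurements, rather than 4000 individual ones, is not implausible from a counting point of view. Whether a given design and reconstruction algorithm achieve this in the presence of noise is what the simulations in the paper address.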
Compressed Genotyping
Significant volumes of knowledge have been accumulated in recent years
linking subtle genetic variations to a wide variety of medical disorders from
Cystic Fibrosis to mental retardation. Nevertheless, there are still great
challenges in applying this knowledge routinely in the clinic, largely due to
the relatively tedious and expensive process of DNA sequencing. Since the
genetic polymorphisms that underlie these disorders are relatively rare in the
human population, the presence or absence of a disease-linked polymorphism can
be thought of as a sparse signal. Using methods and ideas from compressed
sensing and group testing, we have developed a cost-effective genotyping
protocol. In particular, we have adapted our scheme to a recently developed
class of high throughput DNA sequencing technologies, and assembled a
mathematical framework that has some important distinctions from 'traditional'
compressed sensing ideas in order to address different biological and technical
constraints. Comment: Submitted to IEEE Transactions on Information Theory - Special Issue
on Molecular Biology and Neuroscience
Bacterial Community Reconstruction Using A Single Sequencing Reaction
Bacteria are the unseen majority on our planet, with millions of species and
comprising most of the living protoplasm. While current methods enable in-depth
study of a small number of communities, a simple tool for breadth studies of
bacterial population composition in a large number of samples is lacking. We
propose a novel approach for reconstruction of the composition of an unknown
mixture of bacteria using a single Sanger-sequencing reaction of the mixture.
This method is based on compressive sensing theory, which deals with
reconstruction of a sparse signal using a small number of measurements.
Utilizing the fact that in many cases each bacterial community is comprised of
a small subset of the known bacterial species, we show the feasibility of this
approach for determining the composition of a bacterial mixture. Using
simulations, we show that sequencing a few hundred base-pairs of the 16S rRNA
gene sequence may provide enough information for reconstruction of mixtures
containing tens of species, out of tens of thousands, even in the presence of
realistic measurement noise. Finally, we show initial promising results when
applying our method for the reconstruction of a toy experimental mixture with
five species. Our approach may offer a practical and efficient way to
identify bacterial species compositions in biological samples. Comment: 28 pages, 12 figures
Scrible: Ultra-Accurate Error-Correction of Pooled Sequenced Reads
We recently proposed a novel clone-by-clone protocol for de novo genome sequencing that leverages combinatorial pooling design to overcome the limitations of DNA barcoding when multiplexing a large number of samples on second-generation sequencing instruments. Here we address the problem of correcting the short reads obtained from our sequencing protocol. We introduce a novel algorithm called Scrible that exploits properties of the pooling design to accurately identify/correct sequencing errors and minimize the chance of "over-correcting". Experimental results on synthetic data on the rice genome demonstrate that our method has much higher accuracy in correcting short reads compared to state-of-the-art error-correcting methods. On real data on the barley genome we show that Scrible significantly improves the decoding accuracy of short reads to individual BACs.
High-Throughput Chronological Lifespan Screening of the Fission Yeast Deletion Library Using Barcode Sequencing
Ageing is associated with the development of several chronic illnesses, including cardiovascular diseases, diabetes and cancer. To understand the genetic components driving cellular ageing in higher organisms, like ourselves, we study simple eukaryotic model systems which are more accessible and easier to manipulate than higher eukaryotes. This is possible because ageing mechanisms are remarkably conserved between species. Here, we employ fission yeast, one of the simplest eukaryotic model organisms, to study cellular ageing. In this work, we decoded the fission yeast deletion collection using our in-house developed pipeline, developed an improved version of Bar-seq along with a custom-developed analysis pipeline, determined a method for high-quality RNA extraction and RNA-seq from long-term quiescent yeast cells, and finally, performed a high-throughput Bar-seq screen to profile the chronological lifespan of our decoded strains. We describe barcode decoding of 94% of the gene deletions; validation of our improved Bar-seq method; identification of ncRNAs as elements important for the maintenance of cellular quiescence; Bar-seq screening of the competitively grown decoded strains, which identified several long-lived and short-lived mutants following glucose starvation and cellular culture re-growth; and validation of the top hits using isogenic cell cultures, revealing eight novel gene deletions important for early life maintenance, as well as ten novel gene deletion mutants with pro-ageing effects. Overall, in addition to providing rich datasets, we describe several high-throughput methods that can be used for future genome-wide studies, in which genomics and transcriptomics can be combined to further advance our understanding of the genetic factors underpinning cellular ageing in humans.
Computational Methods for Sequencing and Analysis of Heterogeneous RNA Populations
Next-generation sequencing (NGS) and mass spectrometry technologies bring unprecedented throughput, scalability and speed, facilitating the study of biological systems. These technologies make it possible to sequence and analyze heterogeneous RNA populations rather than single sequences. In particular, they provide the opportunity to implement massive viral surveillance and transcriptome quantification. However, in order to fully exploit the capabilities of NGS technology we need to develop computational methods able to analyze billions of reads for assembly and characterization of sampled RNA populations.
In this work we present novel computational methods for cost- and time-effective analysis of sequencing data from viral and RNA samples. In particular, we describe: i) computational methods for transcriptome reconstruction and quantification; ii) a method for mass spectrometry data analysis; iii) a combinatorial pooling method; iv) computational methods for analysis of intra-host viral populations.