24 research outputs found

    Deep Sequencing of the Nicastrin Gene in Pooled DNA, the Identification of Genetic Variants That Affect Risk of Alzheimer's Disease

    Nicastrin is an obligatory component of γ-secretase, the enzyme complex that produces the Aβ fragments central to the pathogenesis of Alzheimer's disease (AD). Analyses of the effects of common variation in this gene on risk for late-onset AD have been inconclusive. We investigated the effect of rare variation in the coding regions of the Nicastrin gene in a cohort of AD patients and matched controls using an innovative pooling approach and next-generation sequencing. Five SNPs were identified and validated by individual genotyping in 311 cases and 360 controls. Association analysis identified a rare non-synonymous SNP (N417Y) with a statistically significantly higher frequency in cases than in controls in the Greek population (OR 3.994, CI 1.105–14.439, p = 0.035). This finding warrants further investigation in a larger cohort and adds weight to the hypothesis that rare variation explains some of the genetic heritability still to be identified in Alzheimer's disease.

    Use of a targeted, combinatorial next-generation sequencing approach for the study of bicuspid aortic valve

    BACKGROUND: Bicuspid aortic valve (BAV) is the most common type of congenital heart disease, with a population prevalence of 1-2%. While BAV is known to be highly heritable, mutations in single genes (such as GATA5 and NOTCH1) have been reported in only a few human BAV cases. Traditional gene sequencing methods are time and labor intensive, while next-generation high-throughput sequencing remains costly for large patient cohorts and requires extensive bioinformatics processing. Here we describe an approach to targeted multi-gene sequencing with combinatorial pooling of samples from BAV patients. METHODS: We studied a previously described cohort of 78 unrelated subjects with echocardiogram-identified BAV. Subjects were identified as having isolated BAV or BAV associated with coarctation of the aorta (BAV-CoA). BAV cusp fusion morphology was defined as right-left cusp fusion, right non-coronary cusp fusion, or left non-coronary cusp fusion. Samples were combined into 19 pools using a uniquely overlapping combinatorial design; a given mutation could be attributed to a single individual on the basis of which pools contained the mutation. A custom gene capture of 97 candidate genes was sequenced on the Illumina HiSeq 2000. Multistep bioinformatics processing was performed for base calling, variant identification, and in-silico analysis of putative disease-causing variants. RESULTS: Targeted capture identified 42 rare, non-synonymous, exonic variants involving 35 of the 97 candidate genes. In-silico analysis classified 33 of these variants as putative disease-causing changes. Sanger sequencing confirmed 31 of these variants, found among 16 individuals. There were no significant differences in variant burden among BAV fusion phenotypes or between isolated BAV and BAV-CoA. Pathway analysis suggests a role for the WNT signaling pathway in human BAV. CONCLUSION: We successfully developed a pooling and targeted capture strategy that enabled rapid and cost-effective next-generation sequencing of target genes in a large patient cohort. This approach identified a large number of putative disease-causing variants in a cohort of patients with BAV, including variants in 26 genes not previously associated with human BAV. The data suggest that BAV heritability is complex and polygenic. Our pooling approach saved over $39,350 compared to an unpooled, targeted capture sequencing strategy.
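    The idea of attributing a mutation to one individual from the set of pools that contain it can be illustrated with a minimal sketch. The abstract does not describe the actual pool layout, so the design below is an assumption: each of the 78 samples is placed in a distinct pair of the 19 pools. The function names `build_design` and `decode_carrier` are hypothetical.

```python
from itertools import combinations


def build_design(n_samples, n_pools, pools_per_sample):
    """Give every sample a distinct set of pools (an assumed, illustrative design)."""
    signatures = combinations(range(n_pools), pools_per_sample)
    design = {}
    for sample, pools in zip(range(n_samples), signatures):
        design[sample] = frozenset(pools)
    if len(design) < n_samples:
        raise ValueError("not enough distinct pool combinations for all samples")
    return design


def decode_carrier(design, positive_pools):
    """Return the samples whose pool signature equals the set of positive pools."""
    positive = frozenset(positive_pools)
    return [sample for sample, pools in design.items() if pools == positive]


# 78 samples and 19 pools as in the abstract; two pools per sample is an assumption.
design = build_design(n_samples=78, n_pools=19, pools_per_sample=2)

# Simulate a variant carried by a single sample: it shows up in exactly that sample's pools.
carrier = 41
observed_positive_pools = design[carrier]
print(decode_carrier(design, observed_positive_pools))
```

    This exact-match decoding only works when a variant has a single carrier. If two samples carry the same variant, the union of their pools can be ambiguous, which is one reason such calls are confirmed by individual sequencing.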

    Genotyping common and rare variation using overlapping pool sequencing

    Background: Recent advances in sequencing technologies set the stage for large, population-based studies, in which the DNA or RNA of thousands of individuals will be sequenced. Currently, however, such studies are still infeasible using a straightforward sequencing approach; as a result, a few multiplexing schemes have recently been suggested, in which a small number of DNA pools are sequenced and the results are then deconvoluted using compressed sensing or similar approaches. These methods, however, are limited to the detection of rare variants. Results: In this paper we provide a new algorithm for the deconvolution of DNA pool multiplexing schemes. The presented algorithm utilizes a likelihood model and linear programming. The approach allows for the addition of external data, particularly imputation data, resulting in a flexible environment that is suitable for different applications. Conclusions: In particular, we demonstrate that both low and high allele frequency SNPs can be accurately genotyped when the DNA pooling scheme is performed in conjunction with microarray genotyping and imputation. Additionally, we demonstrate the use of our framework for the detection of cancer fusion genes from RNA sequences.

    poolMC: Smart pooling of mRNA samples in microarray experiments

    Background: Typically, pooling of mRNA samples in microarray experiments implies mixing mRNA from several biological-replicate samples before hybridization onto a microarray chip. Here we describe an alternative smart pooling strategy in which different samples, not necessarily biological replicates, are pooled in an information-theoretically efficient way. Further, each sample is tested on multiple chips, but always in pools made up of different samples. The end goal is to exploit the compressibility of microarray data to reduce the number of chips used and increase the robustness to measurement noise. Results: A theoretical framework to perform smart pooling of mRNA samples in microarray experiments was established, and a software implementation of the pooling and decoding algorithms was developed in MATLAB. A proof-of-concept smart pooled experiment was performed using validated biological samples on commercially available gene chips. Conclusions: The theoretical developments and experimental demonstration in this paper provide a useful starting point to investigate smart pooling of mRNA samples in microarray experiments. Important conditions for its successful implementation include linearity of measurements, sparsity in data, and large experiment size.

    A statistical method for the detection of variants from next-generation resequencing of DNA pools

    Motivation: Next-generation sequencing technologies have enabled the sequencing of several human genomes in their entirety. However, the routine resequencing of complete genomes remains infeasible. The massive capacity of next-generation sequencers can be harnessed for sequencing specific genomic regions in hundreds to thousands of individuals. Sequencing-based association studies are currently limited by the low level of multiplexing offered by sequencing platforms. Pooled sequencing represents a cost-effective approach for studying rare variants in large populations. To utilize the power of DNA pooling, it is important to accurately identify sequence variants from pooled sequencing data. Detection of rare variants from pooled sequencing represents a different challenge than detection of variants from individual sequencing.

    SNVer: a statistical tool for variant calling in analysis of pooled or individual next-generation sequencing data

    We develop a statistical tool, SNVer, for calling common and rare variants in analysis of pooled or individual next-generation sequencing (NGS) data. We formulate variant calling as a hypothesis testing problem and employ a binomial–binomial model to test the significance of observed allele frequency against sequencing error. SNVer reports a single overall P-value for evaluating the significance of a candidate locus being a variant, based on which multiplicity control can be obtained. This is particularly desirable because tens of thousands of loci are simultaneously examined in typical NGS experiments. Each user can choose the false-positive error rate threshold he or she considers appropriate, instead of just the dichotomous decisions of whether to ‘accept or reject the candidates’ provided by most existing methods. We use both simulated data and real data to demonstrate the superior performance of our program in comparison with existing methods. SNVer runs very fast and can complete testing 300 K loci within an hour. This excellent scalability makes it feasible for analysis of whole-exome sequencing data, or even whole-genome sequencing data using a high-performance computing cluster. SNVer is freely available at http://snver.sourceforge.net/
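    Testing an observed allele count against sequencing error can be illustrated with a much simpler model than the one in the abstract. The sketch below is not SNVer's binomial–binomial model; it is a plain one-sided binomial test, with an assumed per-base error rate and significance threshold, and the function names are hypothetical.

```python
from math import exp, lgamma, log, log1p


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to handle large depths."""
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log1p(-p))
        total += exp(log_pmf)
    return min(total, 1.0)


def call_variant(alt_reads, depth, error_rate=0.01, alpha=1e-3):
    """Return (p_value, is_variant) under the null that all alt reads are errors.

    error_rate and alpha are illustrative assumptions, not values from the paper.
    """
    p_value = binomial_tail(alt_reads, depth, error_rate)
    return p_value, p_value < alpha


# Ten alternate reads out of 100 is hard to explain by a 1% error rate alone;
# a single alternate read is not.
print(call_variant(10, 100))
print(call_variant(1, 100))
```

    Because the function returns a P-value rather than only a yes/no call, a threshold suited to the number of loci tested (for example, after a multiple-testing correction) can be applied afterwards.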

    Estimating population size via line graph reconstruction

    Background: We propose a novel graph-theoretic method to estimate haplotype population size from genotype data. The method considers only the potential sharing of haplotypes between individuals and is based on transforming the graph of potential haplotype sharing into a line graph using a minimum number of edge and vertex deletions. Results: We show that the resulting line graph deletion problems are NP-complete and provide exact integer programming solutions for them. We test our approach using extensive simulations of multiple population evolution and genotype sampling scenarios. Our results also indicate that the method may be useful in comparing populations and may be used as a first step in a method for haplotype phasing. Conclusions: Our computational experiments show that when most of the sharings are true sharings, the problem can be solved very fast and the estimated size is very close to the true size; when many of the potential sharings do not stem from true haplotype sharing, our method gives reasonable lower bounds on the underlying number of haplotypes. In comparison, a naive approach of phasing the input genotypes provides trivial upper bounds of twice the number of genotypes.

    Statistical Mutation Calling from Sequenced Overlapping DNA Pools in TILLING Experiments

    Background: TILLING (Targeting induced local lesions IN genomes) is an efficient reverse genetics approach for detecting induced mutations in pools of individuals. Combined with the high throughput of next-generation sequencing technologies and the resolving power of overlapping pool design, TILLING provides an efficient and economical platform for functional genomics across thousands of organisms. Results: We propose a probabilistic method for calling TILLING-induced mutations, and their carriers, from high-throughput sequencing data of overlapping population pools, where each individual occurs in two pools. We assign a probability score to each sequence position by applying Bayes' Theorem to a simplified binomial model of sequencing error and expected mutations, taking into account the coverage level. We test the performance of our method on variable-quality, high-throughput sequences from wheat and rice mutagenized populations. Conclusions: We show that our method effectively discovers mutations in large populations with sensitivity of 92.5% and specificity of 99.8%. It also outperforms existing SNP detection methods in detecting real mutations, especially at higher levels of coverage variability across sequenced pools and in lower-quality short-read sequence data. The implementation of our method is available from: http://www.cs.ucdavis.edu/filkov/CAMBa/
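    A rough sketch of how Bayes' Theorem can be applied to a binomial model of sequencing error and expected mutations is given below. It is not the authors' implementation: the diploid, single-heterozygous-carrier assumption, the error rate, the prior, and the function names are all illustrative choices. Evidence from the two pools containing an individual is combined by treating the pools as independent.

```python
from math import exp, lgamma, log


def log_binom_pmf(k, n, p):
    """Log of the Binomial(n, p) probability of observing k successes."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def mutation_posterior(observations, pool_size, error_rate=0.005, prior=1e-4):
    """Posterior probability that a position carries a mutation.

    observations: list of (alt_reads, depth), one entry per pool containing the individual.
    Assumes diploid individuals and one heterozygous carrier per pool, so the
    mutant allele makes up 1 / (2 * pool_size) of the pool (an illustrative assumption).
    """
    alt_fraction = 1.0 / (2 * pool_size)
    # Probability of reading the alternate base when a mutation is present:
    # true mutant reads read correctly, plus reference reads misread as mutant.
    p_mut = alt_fraction * (1 - error_rate) + (1 - alt_fraction) * error_rate
    log_l_mut = sum(log_binom_pmf(k, n, p_mut) for k, n in observations)
    log_l_err = sum(log_binom_pmf(k, n, error_rate) for k, n in observations)
    # Bayes' Theorem in log-odds form, then a numerically stable logistic transform.
    log_odds = log(prior) - log(1 - prior) + log_l_mut - log_l_err
    if log_odds >= 0:
        return 1.0 / (1.0 + exp(-log_odds))
    odds = exp(log_odds)
    return odds / (1.0 + odds)


# Two pools of 8 individuals each; the same position is observed in both pools.
print(mutation_posterior([(30, 500), (28, 480)], pool_size=8))
print(mutation_posterior([(2, 500), (3, 480)], pool_size=8))
```

    Because depth enters the likelihood directly, the same alternate-read fraction yields a more confident score at high coverage than at low coverage, which is the sense in which the score takes coverage into account here.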