    Estimating the number and size of the main effects in genome-wide case-control association studies

    It has recently become possible to screen thousands of markers to detect genetic causes of common diseases. Along with this potential come analytical challenges, and it is important to develop new statistical tools to identify markers with causal effects and accurately estimate their effect sizes. Knowledge of the proportion of markers without true effects (p0) and the effect sizes of markers with effects provides information to control for false discoveries and to design follow-up studies. We apply newly developed methods to simulated Genetic Analysis Workshop 15 genome-wide case-control data sets, including a maximum likelihood (ML) and a quasi-ML (QML) approach that incorporate the test statistic distribution and estimate effect size simultaneously with p0, and two conservative estimators of p0 that do not rely on the test statistic distribution under the alternative. Our results illustrated that, compared with four existing commonly used estimators for p0, all of our estimators have favorable properties in terms of the standard deviation with which p0 is estimated. On average, the ML method performed slightly better than the QML method; the conservative method performed well and was even slightly more precise than the ML estimators, and can be more robust in less optimal conditions (small sample sizes and small numbers of markers). Further improvements and extensions of the proposed methods are conceivable, such as estimating the distribution of effect sizes and taking population stratification into account when obtaining estimates of p0 and effect size.
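As a rough illustration of what estimating p0 from test results involves, the sketch below implements a Storey-type estimator. This is one widely used existing approach, not the ML, QML, or conservative estimators described in the abstract. The function name `estimate_p0` and the tuning parameter `lam` are assumptions made for this example.

```python
def estimate_p0(p_values, lam=0.5):
    """Storey-type estimate of the proportion of markers without true effects.

    Assumption for this sketch: p-values of null markers are uniform on [0, 1],
    so about (1 - lam) * p0 * m of the m p-values should exceed lam.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    if not p_values:
        raise ValueError("p_values must not be empty")
    # Count p-values above the threshold; these are assumed to be mostly nulls.
    n_above = sum(1 for p in p_values if p > lam)
    # Rescale by the expected null fraction above lam and cap at 1.
    return min(1.0, n_above / ((1.0 - lam) * len(p_values)))


if __name__ == "__main__":
    # Hypothetical p-values: three small (likely effects), one large.
    print(estimate_p0([0.01, 0.02, 0.03, 0.8]))
```

Capping at 1 keeps the estimate a valid proportion; an estimate near 1 indicates that few markers carry true effects.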

    Targeted Bisulfite Pyrosequencing & Amplicon Bisulfite Sequencing Epigenetic Analysis

    Charles Tran, Dept. of Biology, with Dr. Karolina Aberg, VCU School of Pharmacy. Background: The Great Smoky Mountains Study is a longitudinal study that started in 1992 and includes 1,420 participants who were 9 to 13 years old at intake and have since been revisited approximately every 2 years. Participants (and their parents) provided detailed assessments of stressors and health outcomes, as well as blood samples, at each interview. In a recent methylome-wide association study, the samples were used to identify methylation marks associated with childhood trauma. In the current work, we present an investigation to replicate these methylation marks in an independent sample. Objective: Our objective is to optimize and apply epigenomic-specific protocols in order to replicate trauma-associated methylation biomarkers in an independent study sample. Materials and Methods: We will use DNA samples extracted from saliva from The Young Adolescent Project, another longitudinal study that has obtained relevant information related to childhood trauma. In this sample we will perform replication of top findings using targeted amplicon bisulfite sequencing, in which amplicons are amplified with the JUNO sequencing platform, or using Pyromark PCR pyrosequencing. Forward and reverse primers are first designed using Pyromark Assay Design software. Primer set candidates are chosen based on a score out of 100; scores are determined by the potential for mispriming, the likelihood of primer dimers, and similar factors. Higher scores correlate with better PCR performance. Then BiSearch, an online primer-design algorithm and search tool, is used to check primer sets in order to improve PCR efficiency by avoiding non-specific PCR products due to genomic repetition. The PCR product is then examined with 2% agarose gel electrophoresis and Agilent Bioanalyzer chip-based capillary electrophoresis in order to determine whether amplicons of the correct size were obtained. Once primers of sufficient efficacy are designed, they are given 5’ biotin tag modifications, which make purification of proteins and other target molecules easier when using streptavidin-coupled Dynabeads. Methylation sites incompatible with JUNO because their amplicons exceed 200 base pairs are instead analyzed using the Pyromark pyrosequencing assay, for which assays are easier to design but which is more costly and has lower throughput; the resulting data are similar in quality. Results: We attempted to design assays for 60 loci. Of these, we have designed and validated the quality of 23 assays for JUNO and 3 for the Pyromark Q96 sequencing and quantification platform. PCR analysis followed. We were not able to design assays for 34 sites due to amplicons exceeding 200 base pairs, formation of hairpins, formation of primer dimers, amplicon sites being too far from the target region, or formation of multiple PCR products, as determined by IDT analysis. The 3 Pyromark primer sets were incompatible with JUNO because primer dimers and hairpins formed when 5’ tags were added; the Pyromark Q96 assay was therefore the better option for them. Conclusion: We have optimized and evaluated 23 assays for the JUNO sequencing platform and 3 assays for the Pyromark Q96 that, in the next step, will be used to assess the replication of loci of interest in trauma-associated methylation biomarkers from saliva samples.

    Post-Mortem Brain Nuclei Isolation for Single Nucleus RNA Sequencing

    Charles Tran, Dept. of Biology, with Dr. Karolina Aberg, VCU School of Pharmacy. When tissue samples are studied in bulk without consideration of the different cell types and their proportions, results can be biased because cell-type-specific expression is attenuated. To study cell-type-specific RNA expression profiles within tissue, single-cell RNA sequencing (scRNA-seq) is used. For scRNA-seq studies it is critical to have intact cells. However, when investigating frozen post-mortem brain tissue, it is often challenging to isolate intact whole cells. An alternative is to isolate nuclei (which have similar, but not identical, transcriptomes to cells) and then perform single-nucleus RNA sequencing (snRNA-seq). In this study we have carefully optimized a protocol for nuclei extraction from post-mortem brain cells suitable for downstream snRNA-seq analysis. We found that adjusting our protocol to include less aggressive methods of tissue homogenization and sample-retaining lab techniques resulted in the successful removal of cell debris and myelin while providing a workable sample size. In conclusion, we have successfully evaluated and prepared enough high-quality nuclei for downstream snRNA-seq using our optimized protocol.

    Estimation of CpG Coverage in Whole Methylome Next-Generation Sequencing Studies

    Background: Methylation studies are a promising complement to genetic studies of DNA sequence. However, detailed prior biological knowledge is typically lacking, so methylome-wide association studies (MWAS) will be critical to detect disease-relevant sites. A cost-effective approach involves the next-generation sequencing (NGS) of single-end libraries created from samples that are enriched for methylated DNA fragments. A limitation of single-end libraries is that the fragment size distribution is not observed. This hampers several aspects of the data analysis, such as the calculation of enrichment measures that are based on the number of fragments covering the CpGs. Results: We developed a non-parametric method that uses isolated CpGs to estimate sample-specific fragment size distributions from the empirical sequencing data. Through simulations we show that our method is highly accurate. While the traditional (extended) read count methods resulted in severely biased coverage estimates and introduced artificial inter-individual differences, through the use of the estimated fragment size distributions we could remove these biases almost entirely. Furthermore, we found correlations of 0.999 between coverage estimates obtained using fragment size distributions that were estimated with our method and those that were “observed” in paired-end sequencing data. Conclusions: We propose a non-parametric method for estimating fragment size distributions that is highly precise and can improve the analysis of cost-effective MWAS studies that sequence single-end libraries created from samples that are enriched for methylated DNA fragments.
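To make the role of the fragment size distribution concrete, the sketch below shows one way an estimated distribution could be turned into an expected CpG coverage for single-end reads. It is an assumption-laden illustration, not the authors' estimator: it takes the size distribution as given, considers forward-strand reads only, and the names `expected_coverage`, `read_starts`, and `size_probs` are hypothetical.

```python
def expected_coverage(cpg_pos, read_starts, size_probs):
    """Expected number of fragments covering a CpG at position cpg_pos.

    read_starts: start positions of forward-strand single-end reads.
    size_probs:  size_probs[k] is the assumed probability that a fragment
                 is exactly k bases long (index 0 is unused in practice).
    """
    # at_least[d] = P(fragment length >= d), built from the right-hand tail.
    at_least = [0.0] * (len(size_probs) + 1)
    for k in range(len(size_probs) - 1, -1, -1):
        at_least[k] = at_least[k + 1] + size_probs[k]

    total = 0.0
    for start in read_starts:
        # A fragment starting at `start` reaches the CpG only if it is at
        # least this many bases long.
        needed = cpg_pos - start + 1
        if 1 <= needed < len(at_least):
            total += at_least[needed]
    return total


if __name__ == "__main__":
    # Hypothetical distribution: fragments are 2 or 3 bases long, equally likely.
    print(expected_coverage(10, [7, 8, 9, 10, 11], [0.0, 0.0, 0.5, 0.5]))
```

Each read contributes the probability that its unobserved fragment is long enough to reach the CpG, rather than the all-or-nothing contribution of a fixed read extension.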

    MethylPCA: a toolkit to control for confounders in methylome-wide association studies

    Background: In methylome-wide association studies (MWAS) there are many possible differences between cases and controls (e.g. related to lifestyle, diet, and medication use) that may affect the methylome and produce false positive findings. An effective approach to control for these confounders is to first capture the major sources of variation in the methylation data and then regress out these components in the association analyses. This approach is, however, computationally very challenging due to the extremely large number of methylation sites in the human genome. Results: We introduce MethylPCA, which is specifically designed to control for potential confounders in studies where the number of methylation sites is extremely large. MethylPCA offers a complete and flexible data analysis including 1) an adaptive method that performs data reduction prior to PCA by empirically combining methylation data of neighboring sites, 2) an efficient algorithm that performs a principal component analysis (PCA) on the ultra-high-dimensional data matrix, and 3) association tests. To accomplish this, MethylPCA allows for parallel execution of tasks, uses C++ for CPU- and I/O-intensive calculations, and stores intermediate results to avoid computing the same statistics multiple times or keeping results in memory. Through simulations and an analysis of a real whole-methylome MBD-seq study of 1,500 subjects, we show that MethylPCA effectively controls for potential confounders. Conclusions: MethylPCA provides users with a convenient tool to perform MWAS. The software effectively handles the memory and speed challenges of tasks that would be impossible to accomplish using existing software when millions of sites are interrogated with the sample sizes required for MWAS.
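The data-reduction step (combining neighboring sites before PCA) can be pictured with the minimal sketch below. The abstract does not state the combining rule, so the greedy correlation threshold used here, along with the names `combine_neighbors` and `min_corr`, is an assumption for illustration only and is not MethylPCA's actual algorithm or interface.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def block_mean(block):
    """Average a block of site vectors subject by subject."""
    return [sum(values) / len(values) for values in zip(*block)]


def combine_neighbors(sites, min_corr=0.8):
    """Greedily merge adjacent sites into blocks.

    sites: per-site lists of methylation values (one value per subject),
           ordered by genomic position.
    A site joins the current block when its correlation with the block's
    running mean is at least min_corr (an assumed rule for this sketch).
    """
    blocks = []
    for site in sites:
        if blocks and pearson(block_mean(blocks[-1]), site) >= min_corr:
            blocks[-1].append(site)
        else:
            blocks.append([site])
    return [block_mean(block) for block in blocks]


if __name__ == "__main__":
    # Hypothetical data: the first two sites move together, the third does not.
    print(combine_neighbors([[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]))
```

Reducing millions of sites to a smaller set of block averages is what makes a subsequent PCA tractable; the principal components can then be regressed out in the association tests.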

    A statistical method for excluding non-variable CpG sites in high-throughput DNA methylation profiling

    Background: High-throughput DNA methylation arrays are likely to accelerate the pace of methylation biomarker discovery for a wide variety of diseases. A potential problem with a standard set of probes measuring the methylation status of CpG sites across the whole genome is that many sites may not show inter-individual methylation variation among the biosamples for the disease outcome being studied. Inclusion of these so-called "non-variable sites" will increase the risk of false discoveries and reduce statistical power to detect biologically relevant methylation markers. Results: We propose a method to estimate the proportion of non-variable CpG sites and eliminate those sites from further analyses. Our method is illustrated using data obtained by hybridizing DNA extracted from the peripheral blood mononuclear cells of 311 samples to an array assaying 1505 CpG sites. Results showed that a large proportion of the CpG sites did not show inter-individual variation in methylation. Conclusions: Our method resulted in a substantial improvement in association signals between methylation sites and outcome variables while controlling the false discovery rate at the same level.