RNA-Seq optimization with eQTL gold standards.
Background: RNA-Sequencing (RNA-Seq) experiments have been optimized for library preparation, mapping, and gene expression estimation. These methods, however, have revealed weaknesses in the next stages of analysis of differential expression, with results sensitive to systematic sample stratification or, in more extreme cases, to outliers. Further, a method to assess normalization and adjustment measures imposed on the data is lacking. Results: To address these issues, we utilize previously published eQTLs as a novel gold standard at the center of a framework that integrates DNA genotypes and RNA-Seq data to optimize analysis and aid in the understanding of genetic variation and gene expression. After detecting sample contamination and sequencing outliers in RNA-Seq data, a set of previously published brain eQTLs was used to determine if sample outlier removal was appropriate. Improved replication of known eQTLs supported removal of these samples in downstream analyses. eQTL replication was further employed to assess normalization methods, covariate inclusion, and gene annotation. This method was validated in an independent RNA-Seq blood data set from the GTEx project and a tissue-appropriate set of eQTLs. eQTL replication in both data sets highlights the necessity of accounting for unknown covariates in RNA-Seq data analysis. Conclusion: As each RNA-Seq experiment is unique with its own experiment-specific limitations, we offer an easily-implementable method that uses the replication of known eQTLs to guide each step in one's data analysis pipeline. In the two data sets presented herein, we highlight not only the necessity of careful outlier detection but also the need to account for unknown covariates in RNA-Seq experiments.
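The idea of using replication of known eQTLs to score an analysis choice can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the data layout, the function name `replication_rate`, the significance threshold, the requirement that effect directions match, and all gene/SNP identifiers and values are assumptions made up for the example.

```python
from typing import Dict, Tuple

# Hypothetical layout: (gene, snp) -> (p_value, effect_direction as +1 or -1)
Assoc = Dict[Tuple[str, str], Tuple[float, int]]


def replication_rate(known: Dict[Tuple[str, str], int],
                     observed: Assoc,
                     alpha: float = 0.05) -> float:
    """Fraction of known eQTLs that replicate in one pipeline variant.

    A known eQTL counts as replicated when it was tested, its p-value is
    below alpha, and its effect direction matches the published direction
    (the direction check is an assumption of this sketch).
    """
    tested = [pair for pair in known if pair in observed]
    if not tested:
        return 0.0
    hits = sum(
        1 for pair in tested
        if observed[pair][0] < alpha and observed[pair][1] == known[pair]
    )
    return hits / len(tested)


# Toy "gold standard": published eQTLs with their effect directions.
known_eqtls = {("GENE_A", "rs1"): 1, ("GENE_B", "rs2"): -1, ("GENE_C", "rs3"): 1}

# Toy results from two pipeline variants (values invented for illustration).
with_outliers = {
    ("GENE_A", "rs1"): (0.20, 1),
    ("GENE_B", "rs2"): (0.01, -1),
    ("GENE_C", "rs3"): (0.03, -1),
}
outliers_removed = {
    ("GENE_A", "rs1"): (0.004, 1),
    ("GENE_B", "rs2"): (0.002, -1),
    ("GENE_C", "rs3"): (0.30, 1),
}

rate_before = replication_rate(known_eqtls, with_outliers)
rate_after = replication_rate(known_eqtls, outliers_removed)
print(f"with outliers: {rate_before:.2f}, outliers removed: {rate_after:.2f}")
```

Under this scheme, the pipeline variant with the higher replication rate would be preferred; the same comparison could in principle be repeated for normalization methods or covariate sets.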
Deep Sequencing of Three Loci Implicated in Large-Scale Genome-Wide Association Study Smoking Meta-Analyses
Genome-wide association study meta-analyses have robustly implicated three loci that affect susceptibility for smoking: CHRNA5\CHRNA3\CHRNB4, CHRNB3\CHRNA6 and EGLN2\CYP2A6. Functional follow-up studies of these loci are needed to provide insight into biological mechanisms. However, these efforts have been hampered by a lack of knowledge about the specific causal variant(s) involved. In this study, we prioritized variants in terms of the likelihood they account for the reported associations. We employed targeted capture of the CHRNA5\CHRNA3\CHRNB4, CHRNB3\CHRNA6, and EGLN2\CYP2A6 loci and flanking regions followed by next-generation deep sequencing (mean coverage 78×) to capture genomic variation in 363 individuals. We performed single locus tests to determine if any single variant accounts for the association, and examined if sets of (rare) variants that overlapped with biologically meaningful annotations account for the associations. In total, we investigated 963 variants, of which 71.1% were rare (minor allele frequency < 0.01), 6.02% were insertion/deletions, and 51.7% were catalogued in dbSNP141. The single variant results showed that no variant fully accounts for the association in any region. In the variant set results, CHRNB4 accounts for most of the signal with significant sets consisting of directly damaging variants. CHRNA6 explains most of the signal in the CHRNB3\CHRNA6 locus with significant sets indicating a regulatory role for CHRNA6. Significant sets in CYP2A6 involved directly damaging variants while the significant variant sets suggested a regulatory role for EGLN2. We found that multiple variants implicating multiple processes explain the signal. Some variants can be prioritized for functional follow-up. © The Author 2015. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved.
Genetic variation in the SIM1 locus is associated with erectile dysfunction.
Erectile dysfunction affects millions of men worldwide. Twin studies support the role of genetic risk factors underlying erectile dysfunction, but no specific genetic variants have been identified. We conducted a large-scale genome-wide association study of erectile dysfunction in 36,649 men in the multiethnic Kaiser Permanente Northern California Genetic Epidemiology Research in Adult Health and Aging cohort. We also undertook replication analyses in 222,358 men from the UK Biobank. In the discovery cohort, we identified a single locus (rs17185536-T) on chromosome 6 near the single-minded family basic helix-loop-helix transcription factor 1 (SIM1) gene that was significantly associated with the risk of erectile dysfunction (odds ratio = 1.26, P = 3.4 × 10^-25). The association replicated in the UK Biobank sample (odds ratio = 1.25, P = 6.8 × 10^-14), and the effect is independent of known erectile dysfunction risk factors, including body mass index (BMI). The risk locus resides on the same topologically associating domain as SIM1 and interacts with the SIM1 promoter, and the rs17185536-T risk allele showed differential enhancer activity. SIM1 is part of the leptin-melanocortin system, which has an established role in body weight homeostasis and sexual function. Because the variants associated with erectile dysfunction are not associated with differences in BMI, our findings suggest a mechanism that is specific to sexual function.
Chapter Functional Annotation of Rare Genetic Variants
Genome-wide association studies have successfully identified a growing number of
common variants that robustly associate with a wide range of complex diseases and
phenotypes. In the majority of cases though, the variants are predicted to have small to
modest effect sizes, and, due to the technologies used, many of the signals discovered
so far may not be the causal loci. As rare variation studies begin to explore the lower
ranges of the allele frequency spectrum, using whole genome or whole exome
sequencing to capture a larger proportion of variants, we expect to find variants with a
more direct causal role in the phenotype(s) of interest. Interpreting possible functional
mechanisms linking variants with phenotypes will become increasingly important
Exome sequencing followed by large-scale genotyping suggests a limited role for moderately rare risk factors of strong effect in schizophrenia.
Schizophrenia is a severe psychiatric disorder with strong heritability and marked heterogeneity in symptoms, course, and treatment response. There is strong interest in identifying genetic risk factors that can help to elucidate the pathophysiology and that might result in the development of improved treatments. Linkage and genome-wide association studies (GWASs) suggest that the genetic basis of schizophrenia is heterogeneous. However, it remains unclear whether the underlying genetic variants are mostly moderately rare and can be identified by the genotyping of variants observed in sequenced cases in large follow-up cohorts or whether they will typically be much rarer and therefore more effectively identified by gene-based methods that seek to combine candidate variants. Here, we consider 166 persons who have schizophrenia or schizoaffective disorder and who have had either their genomes or their exomes sequenced to high coverage. From these data, we selected 5,155 variants that were further evaluated in an independent cohort of 2,617 cases and 1,800 controls. No single variant showed a study-wide significant association in the initial or follow-up cohorts. However, we identified a number of case-specific variants, some of which might be real risk factors for schizophrenia, and these can be readily interrogated in other data sets. Our results indicate that schizophrenia risk is unlikely to be predominantly influenced by variants just outside the range detectable by GWASs. Rather, multiple rarer genetic variants must contribute substantially to the predisposition to schizophrenia, suggesting that both very large sample sizes and gene-based association tests will be required for securely identifying genetic risk factors. © 2012 The American Society of Human Genetics
Post hoc Analysis for Detecting Individual Rare Variant Risk Associations Using Probit Regression Bayesian Variable Selection Methods in Case-Control Sequencing Studies
Rare variants (RVs) have been shown to be significant contributors to complex disease risk. By definition, these variants have very low minor allele frequencies and traditional single-marker methods for statistical analysis are underpowered for typical sequencing study sample sizes. Multimarker burden-type approaches attempt to identify aggregation of RVs across case-control status by analyzing relatively small partitions of the genome, such as genes. However, it is generally the case that the aggregative measure would be a mixture of causal and neutral variants, and these omnibus tests do not directly provide any indication of which RVs may be driving a given association. Recently, Bayesian variable selection approaches have been proposed to identify RV associations from a large set of RVs under consideration. Although these approaches have been shown to be powerful at detecting associations at the RV level, there are often computational limitations on the total quantity of RVs under consideration and compromises are necessary for large-scale application. Here, we propose a computationally efficient alternative formulation of this method using a probit regression approach specifically capable of simultaneously analyzing hundreds to thousands of RVs. We evaluate our approach to detect causal variation on simulated data and examine sensitivity and specificity in instances of high RV dimensionality as well as apply it to pathway-level RV analysis results from a prostate cancer (PC) risk case-control sequencing study. Finally, we discuss potential extensions and future directions of this work. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/134215/1/gepi21983.pdf http://deepblue.lib.umich.edu/bitstream/2027.42/134215/2/gepi21983_am.pd
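To make the limitation of burden-type tests concrete, the sketch below collapses rare variants in one gene into a per-individual carrier indicator and compares carrier counts between cases and controls with a one-sided Fisher exact test. It is a generic collapsing illustration, not the probit Bayesian variable selection method the abstract proposes; the function names and the toy genotypes are assumptions invented for the example.

```python
from math import comb
from typing import List


def collapse_carriers(genotypes: List[List[int]]) -> List[int]:
    """Collapse per-site minor-allele counts into one carrier flag per person.

    genotypes[i][j] is the minor-allele count of individual i at rare site j
    within a single gene (hypothetical input layout).
    """
    return [int(sum(person) > 0) for person in genotypes]


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """P(case carriers >= a) under the hypergeometric null.

    a = case carriers, b = case non-carriers,
    c = control carriers, d = control non-carriers.
    """
    n_cases = a + b
    n_carriers = a + c
    n_total = a + b + c + d
    denom = comb(n_total, n_cases)
    upper = min(n_cases, n_carriers)
    return sum(
        comb(n_carriers, k) * comb(n_total - n_carriers, n_cases - k)
        for k in range(a, upper + 1)
    ) / denom


# Toy data: 4 cases and 4 controls genotyped at 3 rare sites in one gene.
cases = [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 0]]
controls = [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]

case_flags = collapse_carriers(cases)
control_flags = collapse_carriers(controls)
a, b = sum(case_flags), len(case_flags) - sum(case_flags)
c, d = sum(control_flags), len(control_flags) - sum(control_flags)
p_value = fisher_one_sided(a, b, c, d)
print(f"carriers: cases={a}, controls={c}, one-sided p={p_value:.4f}")
```

The single gene-level p-value says nothing about which of the three sites contributes to the signal, which is the gap that variant-level selection methods aim to fill.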
Enhancing the discovery of rare disease variants through hierarchical modeling
Advances in next-generation sequencing technology are enabling researchers to capture a comprehensive picture of genomic variation across large numbers of individuals with unprecedented levels of efficiency. The main analytic challenge in disease mapping is how to mine the data for rare causal variants among a sea of neutral variation. To achieve this goal, investigators have proposed a number of methods that exploit biological knowledge. In this paper, I propose applying a Bayesian stochastic search variable selection algorithm in this context. My multivariate method is inspired by the combined multivariate and collapsing method. In this proposed method, however, I allow an arbitrary number of different sources of biological knowledge to inform the model as prior distributions in a two-level hierarchical model. This allows rare variants with similar prior distributions to share evidence of association. Using the 1000 Genomes Project single-nucleotide polymorphism data provided by Genetic Analysis Workshop 17, I show that through biologically informative prior distributions, some power can be gained over noninformative prior distributions
Nonparametric Bayes multiresolution testing for high-dimensional rare events
In a variety of application areas, there is interest in assessing evidence of
differences in the intensity of event realizations between groups. For example,
in cancer genomic studies collecting data on rare variants, the focus is on
assessing whether and how the variant profile changes with the disease subtype.
Motivated by this application, we develop multiresolution nonparametric Bayes
tests for differential mutation rates across groups. The multiresolution
approach yields fast and accurate detection of spatial clusters of rare
variants, and our nonparametric Bayes framework provides great flexibility for
modeling the intensities of rare variants. Some theoretical properties are also
assessed, including weak consistency of our Dirichlet Process-Poisson-Gamma
mixture over multiple resolutions. Simulation studies illustrate excellent
small sample properties relative to competitors, and we apply the method to
detect rare variants related to common variable immunodeficiency from whole
exome sequencing data on 215 patients and over 60,027 control subjects