RNA-Seq optimization with eQTL gold standards.

Author: Arking Dan E
Ashar Foram N
Bader Joel S
Ellis Shannon E
Gupta Simone
West Andrew B
Publication venue: eScholarship, University of California
Publication date: 01/01/2013

Background: RNA-Sequencing (RNA-Seq) experiments have been optimized for library preparation, mapping, and gene expression estimation. These methods, however, have revealed weaknesses in the subsequent stages of differential expression analysis, with results sensitive to systematic sample stratification or, in more extreme cases, to outliers. Further, a method to assess the normalization and adjustment measures imposed on the data is lacking.

Results: To address these issues, we use previously published eQTLs as a novel gold standard at the center of a framework that integrates DNA genotypes and RNA-Seq data to optimize analysis and aid in the understanding of genetic variation and gene expression. After detecting sample contamination and sequencing outliers in the RNA-Seq data, we used a set of previously published brain eQTLs to determine whether sample outlier removal was appropriate. Improved replication of known eQTLs supported removing these samples from downstream analyses. eQTL replication was further employed to assess normalization methods, covariate inclusion, and gene annotation. This method was validated in an independent RNA-Seq blood data set from the GTEx project and a tissue-appropriate set of eQTLs. eQTL replication in both data sets highlights the necessity of accounting for unknown covariates in RNA-Seq data analysis.

Conclusion: As each RNA-Seq experiment is unique, with its own experiment-specific limitations, we offer an easily implementable method that uses the replication of known eQTLs to guide each step of one's data analysis pipeline. In the two data sets presented herein, we highlight not only the necessity of careful outlier detection but also the need to account for unknown covariates in RNA-Seq experiments.
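The abstract scores each analysis choice (outlier removal, normalization, covariates, annotation) by how many previously published eQTLs replicate. A minimal sketch of that scoring step, assuming a list of known (SNP, gene, expected direction) triples plus per-sample genotype dosages and expression values: each pair is tested with a Pearson correlation, and a normal-approximation p-value from the Fisher z-transform stands in for whatever association test the authors actually used. All function and parameter names here are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p(r, n):
    """Two-sided p-value for r via the Fisher z-transform (normal approximation)."""
    r = max(min(r, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def replication_rate(known_eqtls, dosages, expression, samples, alpha=0.05):
    """Fraction of known eQTLs that replicate with the expected effect direction.

    known_eqtls: list of (snp, gene, expected_sign), expected_sign in {+1, -1}
    dosages[snp][sample], expression[gene][sample]: per-sample values
    samples: sample IDs retained by the pipeline configuration being scored
    """
    hits = 0
    for snp, gene, sign in known_eqtls:
        x = [dosages[snp][s] for s in samples]
        y = [expression[gene][s] for s in samples]
        r = pearson_r(x, y)
        # count a hit only if significant AND in the previously reported direction
        if approx_p(r, len(samples)) < alpha and math.copysign(1, r) == sign:
            hits += 1
    return hits / len(known_eqtls)
```

Calling `replication_rate` once per candidate configuration (for example, with and without a flagged outlier in `samples`) and preferring the higher rate mirrors the decision rule described above; a real analysis would also adjust for covariates and multiple testing.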

Crossref

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

Deep Sequencing of Three Loci Implicated in Large-Scale Genome-Wide Association Study Smoking Meta-Analyses

Author: Aberg Karolina
Adkins Daniel
Clark Shaunna L.
Collins Ann
Copeland William
Crowley James
Elizabeth J. Costello
Gao Guimin
Hillard Christopher
Kumar Gaurav
Maes Hermine
McClay Joseph
Nerella Sri
Peterson Roseann
Quakenbush Corey
Shabalin Andrey
Silberg Judy
Sullivan Patrick
van den Oord Edwin J.
Xie Linying
Publication date: 01/01/2016

Genome-wide association study meta-analyses have robustly implicated three loci that affect susceptibility for smoking: CHRNA5\CHRNA3\CHRNB4, CHRNB3\CHRNA6 and EGLN2\CYP2A6. Functional follow-up studies of these loci are needed to provide insight into biological mechanisms. However, these efforts have been hampered by a lack of knowledge about the specific causal variant(s) involved. In this study, we prioritized variants in terms of the likelihood that they account for the reported associations. We employed targeted capture of the CHRNA5\CHRNA3\CHRNB4, CHRNB3\CHRNA6, and EGLN2\CYP2A6 loci and flanking regions, followed by next-generation deep sequencing (mean coverage 78×), to capture genomic variation in 363 individuals. We performed single-locus tests to determine whether any single variant accounts for the association, and examined whether sets of (rare) variants that overlapped with biologically meaningful annotations account for the associations. In total, we investigated 963 variants, of which 71.1% were rare (minor allele frequency < 0.01), 6.02% were insertions/deletions, and 51.7% were catalogued in dbSNP141. The single-variant results showed that no variant fully accounts for the association in any region. In the variant set results, CHRNB4 accounts for most of the signal, with significant sets consisting of directly damaging variants. CHRNA6 explains most of the signal in the CHRNB3\CHRNA6 locus, with significant sets indicating a regulatory role for CHRNA6. Significant sets in CYP2A6 involved directly damaging variants, while the significant variant sets suggested a regulatory role for EGLN2. We found that multiple variants implicating multiple processes explain the signal. Some variants can be prioritized for functional follow-up. © The Author 2015. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved.
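The study classifies variants as rare when MAF < 0.01 and tests sets of rare variants together. A sketch of two common building blocks for that kind of analysis, assuming biallelic sites coded as alternate-allele dosages (0, 1, 2) in diploid individuals: MAF-based classification and a simple per-individual burden score (minor-allele count summed over a set). The abstract does not specify the set-based test used, so the burden score is only an illustrative input, and the helper names are hypothetical.

```python
def minor_allele_frequency(dosages):
    """MAF from alternate-allele dosages (0, 1, 2), assuming diploid, biallelic sites."""
    alt_freq = sum(dosages) / (2 * len(dosages))
    return min(alt_freq, 1 - alt_freq)


def rare_variants(variants, threshold=0.01):
    """IDs of variants whose MAF is strictly below the rare-variant threshold."""
    return [vid for vid, g in variants.items() if minor_allele_frequency(g) < threshold]


def burden_scores(variants, variant_set):
    """Per-individual count of minor alleles summed over a set of variants."""
    n = len(next(iter(variants.values())))
    scores = [0] * n
    for vid in variant_set:
        g = variants[vid]
        alt_is_major = sum(g) > len(g)  # alt allele frequency above 0.5
        for i, d in enumerate(g):
            # count copies of the minor allele, flipping the coding if needed
            scores[i] += (2 - d) if alt_is_major else d
    return scores
```

Flipping the coding when the alternate allele is the major one keeps the score a count of minor alleles, so rare variants contribute in the same direction regardless of how the reference allele was chosen.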

Carolina Digital Repository


Genetic variation in the SIM1 locus is associated with erectile dysfunction.

Author: Ahituv Nadav
Hoffmann Thomas J
Hotaling James M
Jarvik Gail P
Jorgenson Eric
Matharu Navneet
Palmer Melody R
Shan Jun
Thai Khanh K
Van Den Eeden Stephen K
Wessells Hunter
Yin Jie
Zhou Xujia
Publication venue: eScholarship, University of California
Publication date: 01/10/2018

Erectile dysfunction affects millions of men worldwide. Twin studies support the role of genetic risk factors underlying erectile dysfunction, but no specific genetic variants have been identified. We conducted a large-scale genome-wide association study of erectile dysfunction in 36,649 men in the multiethnic Kaiser Permanente Northern California Genetic Epidemiology Research in Adult Health and Aging cohort. We also undertook replication analyses in 222,358 men from the UK Biobank. In the discovery cohort, we identified a single locus (rs17185536-T) on chromosome 6 near the single-minded family basic helix-loop-helix transcription factor 1 (SIM1) gene that was significantly associated with the risk of erectile dysfunction (odds ratio = 1.26, P = 3.4 × 10⁻²⁵). The association replicated in the UK Biobank sample (odds ratio = 1.25, P = 6.8 × 10⁻¹⁴), and the effect is independent of known erectile dysfunction risk factors, including body mass index (BMI). The risk locus resides within the same topologically associating domain as SIM1 and interacts with the SIM1 promoter, and the rs17185536-T risk allele showed differential enhancer activity. SIM1 is part of the leptin-melanocortin system, which has an established role in body weight homeostasis and sexual function. Because the variants associated with erectile dysfunction are not associated with differences in BMI, our findings suggest a mechanism that is specific to sexual function.
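To make the reported effect sizes concrete: an odds ratio compares the odds of the outcome between two groups. A sketch of a crude allelic odds ratio with a Woolf (log-scale) confidence interval from a 2x2 table of risk-allele counts in cases and controls follows. This is an illustration only; the study's estimates come from association models in large cohorts, whereas this function does no covariate adjustment (for example, for ancestry or BMI), and its name is hypothetical.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """Crude odds ratio with a Woolf (log-scale) confidence interval.

    2x2 allele-count table (all cells must be > 0):
        a = risk alleles in cases,   b = risk alleles in controls
        c = other alleles in cases,  d = other alleles in controls
    """
    or_ = (a * d) / (b * c)  # (a / c) odds in cases over (b / d) odds in controls
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper
```

The interval is symmetric on the log scale, which is why it is asymmetric around the odds ratio itself.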

eScholarship - University of California

Chapter: Functional Annotation of Rare Genetic Variants

Author: Flicek Paul
Ritchie Graham
Publication venue: Springer Nature
Publication date: 02/06/2021

Genome-wide association studies have successfully identified a growing number of common variants that robustly associate with a wide range of complex diseases and phenotypes. In most cases, though, the variants are predicted to have small to modest effect sizes, and, due to the technologies used, many of the signals discovered so far may not be the causal loci. As rare variation studies begin to explore the lower ranges of the allele frequency spectrum, using whole-genome or whole-exome sequencing to capture a larger proportion of variants, we expect to find variants with a more direct causal role in the phenotype(s) of interest. Interpreting possible functional mechanisms linking variants with phenotypes will become increasingly important.

Directory of Open Access Books (DOAB)

Exome sequencing followed by large-scale genotyping suggests a limited role for moderately rare risk factors of strong effect in schizophrenia.

Author: Campbell CR
Cirulli ET
Ge D
Gennarelli M
Goldstein DB
Gumbs CE
Hallikainen T
He M
Heinzen EL
Hong L
Levy DL
Maia JM
McEvoy JP
Meltzer HY
Need AC
Putkonen A
Repo-Tiihonen E
Rosenquist P
Shianna KV
Tiihonen J
Zhao Q
Publication venue: Elsevier BV
Publication date: 10/08/2012

Schizophrenia is a severe psychiatric disorder with strong heritability and marked heterogeneity in symptoms, course, and treatment response. There is strong interest in identifying genetic risk factors that can help to elucidate the pathophysiology and that might result in the development of improved treatments. Linkage and genome-wide association studies (GWASs) suggest that the genetic basis of schizophrenia is heterogeneous. However, it remains unclear whether the underlying genetic variants are mostly moderately rare and can be identified by the genotyping of variants observed in sequenced cases in large follow-up cohorts or whether they will typically be much rarer and therefore more effectively identified by gene-based methods that seek to combine candidate variants. Here, we consider 166 persons who have schizophrenia or schizoaffective disorder and who have had either their genomes or their exomes sequenced to high coverage. From these data, we selected 5,155 variants that were further evaluated in an independent cohort of 2,617 cases and 1,800 controls. No single variant showed a study-wide significant association in the initial or follow-up cohorts. However, we identified a number of case-specific variants, some of which might be real risk factors for schizophrenia, and these can be readily interrogated in other data sets. Our results indicate that schizophrenia risk is unlikely to be predominantly influenced by variants just outside the range detectable by GWASs. Rather, multiple rarer genetic variants must contribute substantially to the predisposition to schizophrenia, suggesting that both very large sample sizes and gene-based association tests will be required for securely identifying genetic risk factors. © 2012 The American Society of Human Genetics
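In a two-stage design like this one, each variant selected from the sequenced cases is tested in the follow-up cohort and judged against a study-wide threshold. The abstract names neither the test nor the correction, so the following is a sketch under two stated assumptions: a two-sided Fisher exact test on carrier counts in cases versus controls, and a Bonferroni threshold over the 5,155 selected variants. Function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Example layout: a = case carriers, b = case non-carriers,
                    c = control carriers, d = control non-carriers.
    Sums hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / total for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # small relative tolerance so floating-point ties with p_obs are included
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)))


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level controlling the family-wise error rate at alpha."""
    return alpha / n_tests
```

With α = 0.05 and 5,155 tests, the threshold is 0.05 / 5,155 ≈ 9.7 × 10⁻⁶, so under these assumptions a variant would need a follow-up p-value below that value to count as study-wide significant.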

Elsevier - Publisher Connector

PubMed Central

Spiral - Imperial College Digital Repository

Post hoc Analysis for Detecting Individual Rare Variant Risk Associations Using Probit Regression Bayesian Variable Selection Methods in Case-Control Sequencing Studies

Author: Albert
Auwera
Bansal
Baragatti
Cirulli
DePristo
Dering
Gelfand
Hastings
Huang
Jeffreys
Johnson
Joshi-Tope
Kanehisa
Kang
Lee
Lee
Lee
Leon-Novelo
Li
Liang
Liu
Liu
Logsdon
McKenna
Neale
Nelson
O'Hara
Peltola
Pritchard
Quintana
Quintana
Shi
Tanner
Thomson
Wilson
Wu
Wu
Yang
Zellner
Zhou
Publication venue: Wiley
Publication date: 01/01/2016

Rare variants (RVs) have been shown to be significant contributors to complex disease risk. By definition, these variants have very low minor allele frequencies and traditional single-marker methods for statistical analysis are underpowered for typical sequencing study sample sizes. Multimarker burden-type approaches attempt to identify aggregation of RVs across case-control status by analyzing relatively small partitions of the genome, such as genes. However, it is generally the case that the aggregative measure would be a mixture of causal and neutral variants, and these omnibus tests do not directly provide any indication of which RVs may be driving a given association. Recently, Bayesian variable selection approaches have been proposed to identify RV associations from a large set of RVs under consideration. Although these approaches have been shown to be powerful at detecting associations at the RV level, there are often computational limitations on the total quantity of RVs under consideration and compromises are necessary for large-scale application. Here, we propose a computationally efficient alternative formulation of this method using a probit regression approach specifically capable of simultaneously analyzing hundreds to thousands of RVs. We evaluate our approach to detect causal variation on simulated data and examine sensitivity and specificity in instances of high RV dimensionality as well as apply it to pathway-level RV analysis results from a prostate cancer (PC) risk case-control sequencing study. Finally, we discuss potential extensions and future directions of this work.
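The general idea of probit Bayesian variable selection can be sketched with two standard ingredients: Albert and Chib latent-variable augmentation for the probit link, and George and McCulloch style spike-and-slab stochastic search variable selection (SSVS). This is a generic textbook Gibbs sampler written for illustration, not the authors' computationally efficient formulation. The hyperparameters (`tau0`, `tau1`, `prior_incl`) are arbitrary assumptions, there is no intercept (so it presumes roughly balanced case-control sampling), and the inverse-CDF truncated-normal draw is naive. Its output is a posterior inclusion probability per variant, which is what lets a post hoc analysis point to individual RVs rather than a whole gene or pathway.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def truncated_normal(mu, positive, rng):
    """Draw z ~ N(mu, 1) restricted to z > 0 (positive=True) or z <= 0.

    Naive inverse-CDF sampler; adequate for moderate mu, not for extreme tails.
    """
    cut = STD_NORMAL.cdf(-mu)  # P(z <= 0) when z ~ N(mu, 1)
    u = rng.random()
    u = cut + u * (1 - cut) if positive else u * cut
    u = min(max(u, 1e-12), 1 - 1e-12)  # inv_cdf needs 0 < u < 1
    return mu + STD_NORMAL.inv_cdf(u)


def probit_ssvs(X, y, n_iter=2000, burn=500, tau0=0.1, tau1=1.0,
                prior_incl=0.2, seed=1):
    """Posterior inclusion probabilities from a probit spike-and-slab Gibbs sampler.

    X: n x p rows of variant dosages; y: 0/1 case status. No intercept.
    Prior: beta_j ~ N(0, tau1^2) if gamma_j = 1 else N(0, tau0^2),
           gamma_j ~ Bernoulli(prior_incl).
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    spike, slab = NormalDist(0, tau0), NormalDist(0, tau1)
    beta, gamma = [0.0] * p, [0] * p
    eta = [0.0] * n  # linear predictor X @ beta, updated in place
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    counts = [0] * p
    for it in range(n_iter):
        # 1) latent utilities given beta (Albert and Chib augmentation)
        z = [truncated_normal(eta[i], y[i] == 1, rng) for i in range(n)]
        for j in range(p):
            # 2) beta_j | z, other betas, gamma_j: conjugate normal update
            prior_var = (tau1 if gamma[j] else tau0) ** 2
            xr = sum(X[i][j] * (z[i] - eta[i] + X[i][j] * beta[j]) for i in range(n))
            prec = col_ss[j] + 1.0 / prior_var
            new_b = rng.gauss(xr / prec, 1.0 / math.sqrt(prec))
            for i in range(n):
                eta[i] += X[i][j] * (new_b - beta[j])
            beta[j] = new_b
            # 3) gamma_j | beta_j: weigh slab density against spike density
            w1 = prior_incl * slab.pdf(beta[j])
            w0 = (1 - prior_incl) * spike.pdf(beta[j])
            gamma[j] = 1 if rng.random() < w1 / (w1 + w0) else 0
        if it >= burn:
            for j in range(p):
                counts[j] += gamma[j]
    return [c / (n_iter - burn) for c in counts]
```

The single-coordinate updates are simple but cost O(n) per variant per sweep; scaling to thousands of RVs is exactly the computational issue the abstract says its formulation addresses.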

Crossref

PubMed Central

Carolina Digital Repository

Institute of Cancer Research Repository

University of Melbourne Institutional Repository

Deep Blue Documents at the University of Michigan

Enhancing the discovery of rare disease variants through hierarchical modeling

Author: AL Price
B Li
BE Madsen
C Dering
Gary K Chen
GK Chen
H Zhou
J Besag
LA Almasy
R Tibshirani
RE Kass
SP Dickson
Publication venue: BioMed Central
Publication date: 01/01/2011

Advances in next-generation sequencing technology are enabling researchers to capture a comprehensive picture of genomic variation across large numbers of individuals with unprecedented levels of efficiency. The main analytic challenge in disease mapping is how to mine the data for rare causal variants among a sea of neutral variation. To achieve this goal, investigators have proposed a number of methods that exploit biological knowledge. In this paper, I propose applying a Bayesian stochastic search variable selection algorithm in this context. My multivariate method is inspired by the combined multivariate and collapsing method. In this proposed method, however, I allow an arbitrary number of different sources of biological knowledge to inform the model as prior distributions in a two-level hierarchical model. This allows rare variants with similar prior distributions to share evidence of association. Using the 1000 Genomes Project single-nucleotide polymorphism data provided by Genetic Analysis Workshop 17, I show that through biologically informative prior distributions, some power can be gained over noninformative prior distributions.
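How a two-level prior lets variants share evidence can be illustrated with a single knowledge source: variants in the same (hypothetical) annotation group share a group-level inclusion probability π_g with a Beta(a, b) prior, so indicators switched on for some variants in a group raise the prior for the rest. A sketch of that conjugate Beta-Bernoulli update, assuming the 0/1 inclusion indicators come from an SSVS sweep; the group labels, the Beta(1, 9) default, and the function names are assumptions, and the paper's model allows several knowledge sources rather than one.

```python
import random
from collections import defaultdict


def group_counts(gamma, groups):
    """Count included (gamma = 1) and total variants per annotation group."""
    included, total = defaultdict(int), defaultdict(int)
    for variant, group in groups.items():
        included[group] += gamma[variant]
        total[group] += 1
    return included, total


def sample_group_inclusion(gamma, groups, a=1.0, b=9.0, rng=None):
    """Gibbs draw pi_g | gamma ~ Beta(a + k_g, b + n_g - k_g) for every group."""
    rng = rng or random.Random()
    included, total = group_counts(gamma, groups)
    return {g: rng.betavariate(a + included[g], b + total[g] - included[g]) for g in total}


def posterior_mean_inclusion(gamma, groups, a=1.0, b=9.0):
    """Posterior mean (a + k_g) / (a + b + n_g) of each group's inclusion probability."""
    included, total = group_counts(gamma, groups)
    return {g: (a + included[g]) / (a + b + total[g]) for g in total}
```

In a full sampler, each variant's prior inclusion probability in the next SSVS sweep would be the freshly drawn π_g of its group, which is the mechanism by which evidence propagates within a group.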

Crossref

Springer - Publisher Connector

PubMed Central

Nonparametric Bayes multiresolution testing for high-dimensional rare events

Author: Banerjee Sayantan
Datta Jyotishka
Dunson David B.
Publication date: 19/01/2024

In a variety of application areas, there is interest in assessing evidence of differences in the intensity of event realizations between groups. For example, in cancer genomic studies collecting data on rare variants, the focus is on assessing whether and how the variant profile changes with the disease subtype. Motivated by this application, we develop multiresolution nonparametric Bayes tests for differential mutation rates across groups. The multiresolution approach yields fast and accurate detection of spatial clusters of rare variants, and our nonparametric Bayes framework provides great flexibility for modeling the intensities of rare variants. Some theoretical properties are also assessed, including weak consistency of our Dirichlet Process-Poisson-Gamma mixture over multiple resolutions. Simulation studies illustrate excellent small-sample properties relative to competitors, and we apply the method to detect rare variants related to common variable immunodeficiency from whole-exome sequencing data on 215 patients and over 60,027 control subjects.
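The multiresolution idea can be sketched with a plain parametric Poisson-Gamma model in place of the paper's Dirichlet Process-Poisson-Gamma mixture: split a region into dyadic windows (1, 2, 4, ... windows), treat each group's variant count per window as Poisson with a per-subject rate, and compare "separate rates" against "one shared rate" with a closed-form Bayes factor. The positions, window scheme, Gamma(a, b) prior, and function names are assumptions for illustration only.

```python
import math


def log_gamma_poisson(count, exposure, a=1.0, b=1.0):
    """log[ b^a / Gamma(a) * Gamma(a + count) / (b + exposure)^(a + count) ].

    This is the Gamma-Poisson marginal likelihood without its data-only
    factor exposure**count / count!.
    """
    return (a * math.log(b) - math.lgamma(a) + math.lgamma(a + count)
            - (a + count) * math.log(b + exposure))


def multiresolution_log_bf(case_pos, ctrl_pos, n_cases, n_ctrls, length,
                           levels=3, a=1.0, b=1.0):
    """log Bayes factor (separate rates vs one shared rate) for each dyadic window.

    Returns {(level, window): log BF}; level k splits [0, length) into 2**k windows.
    Positive values favour different per-subject variant rates in cases and controls.
    """
    result = {}
    for level in range(levels):
        k = 2 ** level
        width = length / k
        cases, ctrls = [0] * k, [0] * k
        for pos in case_pos:
            cases[min(int(pos // width), k - 1)] += 1
        for pos in ctrl_pos:
            ctrls[min(int(pos // width), k - 1)] += 1
        for w in range(k):
            # Data-only factors n1**y1 * n2**y2 / (y1! * y2!) are common to both
            # hypotheses and cancel, so only the rate integrals are compared.
            separate = (log_gamma_poisson(cases[w], n_cases, a, b)
                        + log_gamma_poisson(ctrls[w], n_ctrls, a, b))
            shared = log_gamma_poisson(cases[w] + ctrls[w], n_cases + n_ctrls, a, b)
            result[(level, w)] = separate - shared
    return result
```

Coarse levels borrow counts across a wide region, while fine levels localize a cluster; scanning all levels at once is what makes the approach multiresolution, though the paper's nonparametric prior is considerably more flexible than this single Gamma.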

arXiv.org e-Print Archive