1,062 research outputs found

Accurate modeling of confounding variation in eQTL studies leads to a great increase in power to detect trans-regulatory effects

Author: Neil Lawrence
Nicolo Fusi
Oliver Stegle
Publication date: 02/06/2011

Expression quantitative trait loci (eQTL) studies are an integral tool to investigate the genetic component of gene expression variation. A major challenge in the analysis of such studies is the presence of hidden confounding factors, such as unobserved covariates or unknown environmental influences. These factors can induce a pronounced artifactual correlation structure in the expression profiles, which may create spurious associations or mask real genetic association signals.

Here, we report PANAMA (Probabilistic ANAlysis of genoMic dAta), a novel probabilistic model to account for confounding factors within an eQTL analysis. In contrast to previous methods, PANAMA learns hidden factors jointly with the effect of prominent genetic regulators. As a result, PANAMA can more accurately distinguish between true genetic association signals and confounding variation.

We applied our model and compared it to existing methods on a variety of datasets and biological systems. PANAMA consistently performs better than alternative methods, and finds in particular substantially more trans regulators. Importantly, PANAMA not only identifies a greater number of associations, but also yields hits that are biologically more plausible and can be better reproduced between independent studies.

Nature Precedings

RNA-Seq optimization with eQTL gold standards.

Author: Arking Dan E
Ashar Foram N
Bader Joel S
Ellis Shannon E
Gupta Simone
West Andrew B
Publication venue: eScholarship, University of California
Publication date: 01/01/2013

Background: RNA-Sequencing (RNA-Seq) experiments have been optimized for library preparation, mapping, and gene expression estimation. These methods, however, have revealed weaknesses in the next stages of analysis of differential expression, with results sensitive to systematic sample stratification or, in more extreme cases, to outliers. Further, a method to assess normalization and adjustment measures imposed on the data is lacking. Results: To address these issues, we utilize previously published eQTLs as a novel gold standard at the center of a framework that integrates DNA genotypes and RNA-Seq data to optimize analysis and aid in the understanding of genetic variation and gene expression. After detecting sample contamination and sequencing outliers in RNA-Seq data, a set of previously published brain eQTLs was used to determine if sample outlier removal was appropriate. Improved replication of known eQTLs supported removal of these samples in downstream analyses. eQTL replication was further employed to assess normalization methods, covariate inclusion, and gene annotation. This method was validated in an independent RNA-Seq blood data set from the GTEx project and a tissue-appropriate set of eQTLs. eQTL replication in both data sets highlights the necessity of accounting for unknown covariates in RNA-Seq data analysis. Conclusion: As each RNA-Seq experiment is unique with its own experiment-specific limitations, we offer an easily-implementable method that uses the replication of known eQTLs to guide each step in one's data analysis pipeline. In the two data sets presented herein, we highlight not only the necessity of careful outlier detection but also the need to account for unknown covariates in RNA-Seq experiments.

Crossref

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

Genome-wide meta-analysis of 241,258 adults accounting for smoking behaviour identifies novel loci for obesity traits

Author: Barata Llilda
Borecki Ingrid B
Czajkowski Jacek
Feitosa Mary F
Heath Andrew C
Madden Pamela A.F.
Rao D.C.
Rice Treva
Sung Yun Ju
et al.
Publication venue: Digital Commons@Becker
Publication date: 01/01/2017

Digital Commons@Becker

A Bayesian Framework to Account for Complex Non-Genetic Factors in Gene Expression Levels Greatly Increases Power in eQTL Studies

Author: AL Price
Aviv Regev
B Stranger
BEE Stranger
D Balding
DJC MacKay
DJC Mackay
E Lander
EE Schadt
EN Smith
G Gibson
HM Kang
J Reimand
J Winn
John Winn
JT Leek
Leopold Parts
M Jordan
M Rattray
O Stegle
Oliver Stegle
RB Brem
RB Williams
Richard Durbin
RM Neal
RSS Spielman
S Biswas
T Barrett
T Pastinen
V Emilsson
V Plagnol
Y Chen
Publication venue: Public Library of Science
Publication date: 01/01/2010

Gene expression measurements are influenced by a wide range of factors, such as the state of the cell, experimental conditions and variants in the sequence of regulatory regions. To understand the effect of a variable of interest, such as the genotype of a locus, it is important to account for variation that is due to confounding causes. Here, we present VBQTL, a probabilistic approach for mapping expression quantitative trait loci (eQTLs) that jointly models contributions from genotype as well as known and hidden confounding factors. VBQTL is implemented within an efficient and flexible inference framework, making it fast and tractable on large-scale problems. We compare the performance of VBQTL with alternative methods for dealing with confounding variability on eQTL mapping datasets from simulations, yeast, mouse, and human. Employing Bayesian complexity control and joint modelling is shown to result in more precise estimates of the contribution of different confounding factors, which in turn yields additional associations to measured transcript levels compared to alternative approaches. We present a threefold larger collection of cis eQTLs than previously found in a whole-genome eQTL scan of an outbred human population. Altogether, 27% of the tested probes show a significant genetic association in cis, and we validate that the additional eQTLs are likely to be real by replicating them in different sets of individuals. Our method is the next step in the analysis of high-dimensional phenotype data, and its application has revealed insights into genetic regulation of gene expression by demonstrating more abundant cis-acting eQTLs in human than previously shown. Our software is freely available online at http://www.sanger.ac.uk/resources/software/peer/

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

MPG.PuRe

Joint modelling of confounding factors and prominent genetic regulators provides increased accuracy in genetical genomics studies.

Author: A Myers
A Nica
A Price
BE Stranger
C Lippert
D Balding
D Locke
E Schadt
EN Smith
G Churchill
H Kang
HM Kang
J Listgarten
J Pickrell
J Yu
JT Leek
Matthew Stephens
MC Teixeira
MI McCarthy
Neil D. Lawrence
Nicoló Fusi
O Stegle
Oliver Stegle
R Breitling
RB Brem
V Plagnol
WE Johnson
X Gan
Publication venue: PLoS Comput Biol
Publication date: 01/01/2012

Expression quantitative trait loci (eQTL) studies are an integral tool to investigate the genetic component of gene expression variation. A major challenge in the analysis of such studies is the presence of hidden confounding factors, such as unobserved covariates or unknown subtle environmental perturbations. These factors can induce a pronounced artifactual correlation structure in the expression profiles, which may create spurious associations or mask real genetic association signals. Here, we report PANAMA (Probabilistic ANAlysis of genoMic dAta), a novel probabilistic model to account for confounding factors within an eQTL analysis. In contrast to previous methods, PANAMA learns hidden factors jointly with the effect of prominent genetic regulators. As a result, this new model can more accurately distinguish true genetic association signals from confounding variation. We applied our model and compared it to existing methods on different datasets and biological systems. PANAMA consistently performs better than alternative methods, and finds in particular substantially more trans regulators. Importantly, our approach not only identifies a greater number of associations, but also yields hits that are biologically more plausible and can be better reproduced between independent studies. A software implementation of PANAMA is freely available online at http://ml.sheffield.ac.uk/qtl/

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Publikationsserver der Universität Tübingen

Apollo (Cambridge)

White Rose Research Online

MPG.PuRe

FigShare

LIMIX: genetic analysis of multiple traits

Author: Casale F.P.
Lippert C.
Rakitsch B.
Stegle O.
Publication venue: Cold Spring Harbor Laboratory
Publication date: 22/05/2014

Multi-trait mixed models have emerged as a promising approach for joint analyses of multiple traits. In principle, the mixed model framework is remarkably general. However, current methods implement only a very specific range of tasks to optimize the necessary computations. Here, we present a multi-trait modeling framework that is versatile and fast: LIMIX enables users to flexibly adapt mixed models for a broad range of applications with different observed and hidden covariates, and variable study designs. To highlight the novel modeling aspects of LIMIX, we performed three vastly different genetic studies: joint GWAS of correlated blood lipid phenotypes, joint analysis of the expression levels of the multiple transcript isoforms of a gene, and pathway-based modeling of molecular traits across environments. In these applications we show that LIMIX increases GWAS power and phenotype prediction accuracy, in particular when integrating stepwise multi-locus regression into multi-trait models, and when analyzing large numbers of traits. An open source implementation of LIMIX is freely available at: https://github.com/PMBio/limix

MDC Repository

Bayesian Model Comparison in Genetic Association Analysis: Linear Mixed Modeling and SNP Set Testing

Author: Wen Xiaoquan
Publication date: 23/02/2015

We consider the problems of hypothesis testing and model comparison under a flexible Bayesian linear regression model whose formulation is closely connected with the linear mixed effect model and the parametric models for SNP set analysis in genetic association studies. We derive a class of analytic approximate Bayes factors and illustrate their connections with a variety of frequentist test statistics, including the Wald statistic and the variance component score statistic. Taking advantage of Bayesian model averaging and hierarchical modeling, we demonstrate some distinct advantages and flexibilities in the approaches utilizing the derived Bayes factors in the context of genetic association studies. We demonstrate our proposed methods using real or simulated numerical examples in applications of single SNP association testing, multi-locus fine-mapping and SNP set association testing.

arXiv.org e-Print Archive

CiteSeerX