Robust Linear Models for Cis-eQTL Analysis
<div><p>Expression Quantitative Trait Loci (eQTL) analysis enables characterisation of functional genetic variation influencing expression levels of individual genes. In outbred populations, including humans, eQTLs are commonly analysed using the conventional linear model, adjusting for relevant covariates, assuming an allelic dosage model and a Gaussian error term. However, gene expression data generally have noise that induces heavy-tailed errors relative to the Gaussian distribution and often include atypical observations, or outliers. Such departures from modelling assumptions can lead to an increased rate of type II errors (false negatives), and to some extent also type I errors (false positives). Careful model checking can reduce the risk of type I errors but often not type II errors, since it is generally too time-consuming to carefully check all models with a non-significant effect in large-scale and genome-wide studies. Here we propose the application of a robust linear model for eQTL analysis to reduce adverse effects of deviations from the assumption of Gaussian residuals. We present results from a simulation study as well as results from the analysis of real eQTL data sets. Our findings suggest that in many situations robust models have the potential to provide more reliable eQTL results compared to conventional linear models, particularly with respect to reducing type II errors due to non-Gaussian noise. Post-genomic data, such as that generated in genome-wide eQTL studies, are often noisy and frequently contain atypical observations. Robust statistical models have the potential to provide more reliable results and increased statistical power under non-Gaussian conditions. The results presented here suggest that robust models should be considered routinely alongside other commonly used methodologies for eQTL analysis.</p></div>
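The contrast between the conventional and a robust linear model can be illustrated with a small sketch. It assumes a Huber M-estimator fitted by iteratively reweighted least squares, which is one common robust choice; the abstract does not say which estimator the authors used. The function names (`weighted_fit`, `huber_fit`), the tuning constant `k = 1.345`, and the toy data are all illustrative assumptions, and covariates are omitted for brevity.

```python
import statistics

def weighted_fit(g, y, w):
    """Weighted least squares for the dosage model y = a + b*g; returns (a, b)."""
    sw = sum(w)
    gm = sum(wi * gi for wi, gi in zip(w, g)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (gi - gm) * (yi - ym) for wi, gi, yi in zip(w, g, y))
    sxx = sum(wi * (gi - gm) ** 2 for wi, gi in zip(w, g))
    b = sxy / sxx
    return ym - b * gm, b

def huber_fit(g, y, k=1.345, iters=50):
    """Huber M-estimate via iteratively reweighted least squares (assumed estimator)."""
    a, b = weighted_fit(g, y, [1.0] * len(y))  # start from the ordinary least squares fit
    for _ in range(iters):
        r = [yi - a - b * gi for gi, yi in zip(g, y)]
        # Robust residual scale: median absolute deviation, rescaled for Gaussian data
        med = statistics.median(r)
        s = max(statistics.median([abs(ri - med) for ri in r]) / 0.6745, 1e-12)
        # Huber weights: 1 for small residuals, shrinking for large ones
        w = [1.0 if abs(ri) <= k * s else k * s / abs(ri) for ri in r]
        a, b = weighted_fit(g, y, w)
    return a, b

# Toy data (hypothetical): 30 samples, true slope 0.5, one gross outlier
g = [0, 1, 2] * 10                                   # allelic dosage
noise = [0.1 if (i // 3) % 2 == 0 else -0.1 for i in range(30)]
y = [1 + 0.5 * gi + ei for gi, ei in zip(g, noise)]
y[-1] += 10                                          # atypical observation

_, b_ols = weighted_fit(g, y, [1.0] * len(y))
_, b_rob = huber_fit(g, y)
print(f"conventional slope: {b_ols:.3f}, robust slope: {b_rob:.3f}")
```

In this toy setting the single atypical observation pulls the conventional estimate well away from the true slope, while the reweighting step limits its influence on the robust estimate.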
Concordance (number and proportion of mRNAs with at least one eQTL SNP) between the conventional and robust models (Myers et al. data set [25]).
<p>Concordance (number and proportion of mRNAs with at least one eQTL SNP) between the conventional and robust models (Myers et al. data set [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0127882#pone.0127882.ref025" target="_blank">25</a>]).</p>
Power analysis results (mixture contamination model).
<p>A) Power as a function of contamination proportion. B) Power as a function of study size. C) Power as a function of the genetic effect size. (Simulation parameters: 10000 samples; A, B and D: N = 100; B, C and D: <i>π</i> = 0.95)</p>
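A power simulation of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' simulation code: it assumes the mixture draws noise from N(0, 1) with probability `pi` and from a wider Gaussian otherwise, and it tests the conventional slope estimate with a normal approximation to the t-test. The function name `simulate_power` and the parameters `contam_sd`, `maf`, and `reps` are hypothetical.

```python
import random
from statistics import NormalDist

def simulate_power(n=100, effect=0.5, pi=0.95, contam_sd=5.0,
                   maf=0.3, reps=500, alpha=0.05, seed=1):
    """Fraction of simulated data sets in which the conventional slope test rejects."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # normal approximation to the t-test
    hits = 0
    for _ in range(reps):
        # Allelic dosage: number of minor alleles carried (0, 1 or 2)
        g = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
        # Mixture noise: N(0, 1) with probability pi, otherwise N(0, contam_sd^2)
        y = [effect * gi + rng.gauss(0, 1.0 if rng.random() < pi else contam_sd)
             for gi in g]
        gm, ym = sum(g) / n, sum(y) / n
        sxx = sum((gi - gm) ** 2 for gi in g)
        if sxx == 0:  # monomorphic sample: slope not estimable, count as no rejection
            continue
        b = sum((gi - gm) * (yi - ym) for gi, yi in zip(g, y)) / sxx
        rss = sum((yi - ym - b * (gi - gm)) ** 2 for gi, yi in zip(g, y))
        se = (rss / (n - 2) / sxx) ** 0.5
        hits += abs(b / se) > z_crit
    return hits / reps

# Power of the conventional model without and with contaminated noise
print(simulate_power(pi=1.0), simulate_power(pi=0.8))
```

Varying `pi`, `n`, or `effect` in turn gives curves analogous to the panels described in the caption, for the conventional model only.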
P-value correspondence in Myers <i>et al</i>. data set [25].
<p>Scatter plot of −<i>log</i><sub>10</sub>(p-values) from Myers <i>et al</i>. data set [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0127882#pone.0127882.ref025" target="_blank">25</a>]. (Key: green = significant in both models, red = significant in the conventional model only, blue = significant in the robust model only, data from points marked with black squares are shown in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0127882#pone.0127882.g005" target="_blank">Fig 5</a>)</p>
P-value correspondence in Grundberg <i>et al</i>. data set [26].
<p>Scatter plot of −<i>log</i><sub>10</sub>(p-values) from MuTHER data set [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0127882#pone.0127882.ref026" target="_blank">26</a>]. (Key: green = significant in both models, red = significant in the conventional model only, blue = significant in the robust model only, data from points marked with black squares are shown in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0127882#pone.0127882.s002" target="_blank">S1 Fig</a>).</p>
Power analysis results (empirical residuals from robust model fit).
<p>A) Residuals from a random sample of eQTL models. B) Residuals from a random sample from models found to be significant only in the robust eQTL model. (‘cont’ = residual from robust model fit of Myers <i>et al</i>. [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0127882#pone.0127882.ref025" target="_blank">25</a>] data set; ‘no cont’ = Gaussian residuals.)</p>
Results from comparative analysis of Myers <i>et al</i>. data set [25].
<p>SNP effect size estimates and standard errors for eQTLs significant in both models (A, D), in the robust model only (B, E), and in the linear model only (C, F).</p>
Power analysis results (heavy-tailed).
<p>A) Power as a function of degrees of freedom in the Student t-distribution. B) Power as a function of study size. C) Power as a function of the genetic effect size. (Simulation parameters: 10000 samples; A-B, D: N = 100; B-D: <i>df</i> = 4)</p>
Integrative Transcriptomic and Metabonomic Molecular Profiling of Colonic Mucosal Biopsies Indicates a Unique Molecular Phenotype for Ulcerative Colitis
Ulcerative colitis is the most prevalent of several disorders under the umbrella term inflammatory bowel disease, with potentially serious symptoms and devastating consequences for affected patients. The exact molecular etiology of ulcerative colitis has not yet been established. In this study, we characterized the molecular phenotype of ulcerative colitis through transcriptomic and metabonomic profiling of colonic mucosal biopsies from patients and controls. We have characterized the extent to which metabonomic and transcriptomic molecular phenotypes are associated with ulcerative colitis versus controls and other disease-related phenotypes such as steroid dependency and age at diagnosis, to determine if there is evidence of enrichment of differential expression in candidate genes from genome-wide association studies and if there are particular pathways influenced by disease-associated genes. Both transcriptomic and metabonomic data have previously been shown to predict the clinical course of ulcerative colitis and related clinical phenotypes, indicating that molecular phenotypes reveal molecular changes associated with the disease. Our analyses indicate that variables of both transcriptomics and metabonomics are associated with disease case and control status, that a large proportion of transcripts are associated with at least one metabolite in mucosal colonic biopsies, and that multiple pathways are connected to disease-related metabolites and transcripts.
Additional file 2 of Determining breast cancer histological grade from RNA-sequencing data
Includes lists of genes and coefficients of each model. (XLSX 1361 kb)