
    Known pioneer TF subfamilies strongly enrich in predicted chromatin accessibility regulators.

    <p>Shown in grey is a scaled cumulative distribution plot for subfamily-level CAR ranks of subfamilies not annotated as pioneers in Iwafuchi-Doi et al. [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005311#pcbi.1005311.ref003" target="_blank">3</a>]. In black, we see the cumulative number of pioneer subfamilies that reached at least a given CAR rank. Six out of eight subfamilies show a low CAR rank, which is more than three times as many as one would expect on average when sampling from non-pioneer subfamilies.</p>

    Performance of pathway enrichment methods for blood lipid traits and Crohn’s disease.

    <p>Displayed is the mean area under the precision-recall curve (AUC) for pathways identified using <i>Pascal</i>, a standard hypergeometric test at various gene score threshold levels, and a rank-sum test (vertical bars show the standard error). We show results for the max gene scores (sum gene score results are similar, see <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004714#pcbi.1004714.s005" target="_blank">S5 Fig</a>). a) Results for four blood lipid traits. The gold standard pathway list was defined as all pathways that show a significance level below 5×10<sup>−6</sup> for <i>any</i> of the tested threshold parameters for hypergeometric tests in the largest study of lipid traits to date [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004714#pcbi.1004714.ref023" target="_blank">23</a>]. The significance level of 5×10<sup>−6</sup> corresponds to the Bonferroni-corrected, genome-wide significance threshold at the 0.5% level for a single method. For each phenotype, error bars denote the standard error computed from three independent subsamples of the <i>CoLaus</i> study (including 1500 individuals each). We see good overall performance of <i>Pascal</i> pathway scores, whereas results for discrete gene sets vary widely with the particular choice of threshold parameter for the hypergeometric test. b) Results for Crohn’s disease using the same approach as in (a). A reference standard pathway list was defined as in (a) using the largest study of Crohn’s disease traits to date [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004714#pcbi.1004714.ref031" target="_blank">31</a>]. We observe that the chi-squared strategy performs at least as well as all other strategies in this setting, whereas performance of the hypergeometric testing strategy varies.</p>

    Power of pathway scoring methods across diverse traits and diseases.

    <p>Bar heights represent the number of pathways found to be significant after Bonferroni correction. Within a given trait group, results are aggregated for all tested GWAS studies. 65 GWAS had at least one significant pathway with at least one of the tested methods. For each GWAS, the raw number of significant pathways was divided by the number of pathways found by the best-performing method. This normalization prevents a few studies with many emerging pathways from dominating the aggregate. We show results for the MOCS gene scores (SOCS gene score results are similar, see <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004714#pcbi.1004714.s006" target="_blank">S6 Fig</a>). (a) Results are aggregated over all trait groups. (b) Results for different trait groups.</p>
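
The per-GWAS normalization described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `normalized_pathway_counts`, the input layout, and the choice to skip studies with no significant pathway are all assumptions made for the example.

```python
def normalized_pathway_counts(counts_by_gwas):
    """Aggregate per-method pathway counts after per-GWAS normalization.

    counts_by_gwas maps a GWAS identifier to a dict
    {method_name: number_of_significant_pathways} (hypothetical layout).
    Each count is divided by the count of the best-performing method
    for that GWAS, so every study contributes at most 1.0 per method.
    """
    totals = {}
    for counts in counts_by_gwas.values():
        best = max(counts.values())
        if best == 0:
            # Assumption: a GWAS with no significant pathway in any
            # method is left out of the aggregate.
            continue
        for method, n_significant in counts.items():
            totals[method] = totals.get(method, 0.0) + n_significant / best
    return totals


example = {
    "gwas_1": {"method_a": 10, "method_b": 5},
    "gwas_2": {"method_a": 1, "method_b": 2},
    "gwas_3": {"method_a": 0, "method_b": 0},
}
totals = normalized_pathway_counts(example)
```

In this toy input, `gwas_1` has ten times as many raw hits as `gwas_2`, yet both studies carry the same weight in the totals, which is the stated purpose of the normalization.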

    Comparing efficiency between VEGAS and <i>Pascal</i>.

    <p>a) Run times of VEGAS and <i>Pascal</i> (both options). Gene scores were computed on two GWAS (one HapMap-imputed [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004714#pcbi.1004714.ref023" target="_blank">23</a>], one 1KG-imputed [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004714#pcbi.1004714.ref022" target="_blank">22</a>,<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004714#pcbi.1004714.ref025" target="_blank">25</a>]) for 18,132 genes on a single core. <i>Pascal</i> was compared to VEGAS for the HapMap-imputed study and to VEGAS2 for the 1KG-imputed study. For this plot, VEGAS and VEGAS2 were used with the default maximum number of Monte Carlo samples of 10<sup>6</sup> for both studies and additionally with 10<sup>8</sup> Monte Carlo samples for the HapMap-imputed study. b) Scatter plot of -log<sub>10</sub>-transformed gene p-values for the sum gene scores obtained by VEGAS and <i>Pascal</i>, respectively. P-values above 10<sup>−6</sup> are in excellent concordance. Below this value, VEGAS could not give precise estimates, since it was run with the maximal number of Monte Carlo samples set to 10<sup>8</sup>.</p>
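
Why a cap on Monte Carlo samples limits precision for small p-values can be illustrated with a generic empirical p-value estimator. The sketch below assumes the common estimator (k + 1) / (N + 1) and a binomial error model; the caption does not state which estimator VEGAS uses, so both function names and formulas are illustrative assumptions.

```python
import math


def mc_pvalue_floor(n_samples):
    """Smallest p-value the generic estimator (k + 1) / (N + 1) can
    return, reached when no simulated statistic exceeds the observed one."""
    return 1.0 / (n_samples + 1)


def mc_relative_se(p, n_samples):
    """Approximate relative standard error of an empirical p-value
    under a binomial model: sqrt((1 - p) / (N * p))."""
    return math.sqrt((1.0 - p) / (n_samples * p))


# With 10^8 samples, a true p-value of 10^-6 yields about 100 expected
# exceedances, so the estimate carries roughly 10% relative error.
rel_se_at_1e6 = mc_relative_se(1e-6, 10**8)
floor_at_1e8 = mc_pvalue_floor(10**8)
```

Under these assumptions the relative error grows quickly as the true p-value drops below 10<sup>−6</sup>, which is consistent with the loss of concordance described in panel b.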

    Predicted pioneer factors.

    <p>Shown are the CAR ranks of factor subfamilies that were discussed in the main text. These included subfamilies labelled pioneers in [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005311#pcbi.1005311.ref003" target="_blank">3</a>] and consequently used as members of the true positive set in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005311#pcbi.1005311.g005" target="_blank">Fig 5</a> (these subfamilies are set in bold face). Additionally, subfamilies are shown that are predicted to be CARs and for which there exists limited literature evidence for pioneer activity. For each subfamily, the top-scoring gene among all genes in the subfamily is mentioned. A complete table for all tested subfamilies is given in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005311#pcbi.1005311.s002" target="_blank">S1 Table</a>.</p>

    Enrichment of bound motifs for a given TF and its subfamily members.

    <p>All TF ChIP-seq experiments from the Myers lab released as part of the ENCODE project were downloaded. For each TF ChIP-seq experiment, we also obtained the corresponding TF motif from the HOCOMOCO database [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005311#pcbi.1005311.ref025" target="_blank">25</a>]. For a given ChIP-seq experiment, we looked at the processed DHS peaks in the same cell line. We partitioned DHS peaks into two groups depending on whether they were bound by the TF (overlap with a ChIP-seq peak) or not. We then calculated the fraction of DHS peaks containing a given motif separately for the bound and the unbound group. The enrichment of bound motifs was defined as the ratio of these two fractions. Results are shown from left to right for: the motifs of the TFs that were assayed in the corresponding ChIP-seq experiments (Correct TF motifs), motifs of other TFs from the same subfamily (TF subfamily motifs), and randomly sampled motifs (Random motifs). During sampling, each motif was sampled as often as the number of ChIP-seq experiments available for that motif. We see strong enrichment of TF motifs in ChIP-seq peaks of the TF as well as its subfamily members.</p>
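
The enrichment ratio defined in this caption can be written out as a short sketch. The function name `bound_motif_enrichment` and the input representation (one pair of booleans per DHS peak) are assumptions for illustration; peak overlap and motif scanning are taken as already computed.

```python
def bound_motif_enrichment(peaks):
    """Ratio of motif-containing fractions in bound vs unbound DHS peaks.

    peaks is an iterable of (is_bound, has_motif) boolean pairs, one per
    DHS peak (hypothetical representation). is_bound marks overlap with
    a ChIP-seq peak; has_motif marks presence of the motif in the peak.
    """
    peaks = list(peaks)
    bound = [has_motif for is_bound, has_motif in peaks if is_bound]
    unbound = [has_motif for is_bound, has_motif in peaks if not is_bound]
    if not bound or not unbound:
        raise ValueError("need at least one bound and one unbound peak")
    frac_bound = sum(bound) / len(bound)
    frac_unbound = sum(unbound) / len(unbound)
    if frac_unbound == 0:
        raise ValueError("motif absent from all unbound peaks; ratio undefined")
    return frac_bound / frac_unbound


# Toy data: 3 of 4 bound peaks and 1 of 4 unbound peaks contain the motif.
toy_peaks = [(True, True), (True, True), (True, False), (True, True),
             (False, True), (False, False), (False, False), (False, False)]
enrichment = bound_motif_enrichment(toy_peaks)
```

A value above 1 means the motif occurs more often in bound than in unbound peaks. How the original analysis handled a zero denominator is not stated, so raising an error here is only one possible choice.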

    Strong associations between GR-like receptor motif and glucocorticoid response genes.

    <p>a) Association results for motif accessibility of the TF <i>NR3C1</i>, which belongs to the GR-like receptor subfamily, and mRNA expression across all genes. -log10-transformed p-values are shown in a QQ-plot. <i>NR3C1</i> motif accessibility shows strong association with mRNA expression of three glucocorticoid response genes (orange), but only weak association with expression of <i>NR3C1</i> and other GR-like receptor TFs (green). In this example, motif accessibility is strongly associated with downstream gene expression, but only weakly with expression of the TF itself. b) The network shows functional relationships among the GR-like receptor TFs (green) and the three most strongly associated genes (orange), which are all glucocorticoid response genes. The strength of each link indicates the confidence in the functional relationship given in the <i>STRING</i> database. We see numerous links between the downstream glucocorticoid response genes and the GR-like receptor TFs in the <i>STRING</i> database, confirming their functional relatedness; <i>NR3C1</i> has the most links to associated genes.</p>

    Mixed model approach for identification of chromatin accessibility regulators.

    <p>For a TF binding motif, we search for all its instances in the genome. For each cell line, we calculate the accessibility score by counting how many motif instances are found in the open chromatin fraction of the genome. After further normalization, these accessibility scores are compared to gene expression values for all genes via regression (Methods). To account for confounding, we use mixed model regression, where an additional random component is used with the same covariance structure as the gene expression matrix. To be considered a CAR candidate, motif accessibility of a TF must show strong association (low p-value) with the expression of the corresponding TF gene compared to other genes. The gene-level <i>CAR rank</i> of a TF is defined as the rank of its association p-value among the p-values for all genes.</p>
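
The gene-level CAR rank defined in the last sentence of this caption amounts to a simple ranking step once the association p-values are available. The sketch below assumes the p-values have already been computed by the regression; the function name `gene_level_car_rank`, the gene names other than EBF1, the p-values, and the tie-handling rule (ties share the best rank) are illustrative assumptions.

```python
def gene_level_car_rank(pvalues, tf_gene):
    """Rank of the TF's own gene among all genes by association p-value.

    pvalues maps gene name -> p-value for the association between the
    TF's motif accessibility and that gene's expression. Rank 1 means
    the TF's own gene shows the strongest association.
    """
    p_tf = pvalues[tf_gene]
    # Assumption: genes with an equal p-value do not push the rank down.
    return 1 + sum(1 for p in pvalues.values() if p < p_tf)


# Hypothetical p-values, for illustration only.
toy_pvalues = {"EBF1": 1e-8, "GENE_A": 1e-3, "GENE_B": 0.5}
rank_top = gene_level_car_rank(toy_pvalues, "EBF1")
rank_second = gene_level_car_rank(
    {"EBF1": 1e-4, "GENE_A": 1e-6, "GENE_B": 0.2}, "EBF1"
)
```

A low rank relative to the total number of genes is what marks a TF as a CAR candidate in the caption's terms.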

    Association between motif accessibility and mRNA expression for the putative chromatin accessibility regulator <i>EBF1</i>.

    <p>Three different regression models (a-c) were used to compute association p-values between the accessibility of a given TF motif (here <i>EBF1</i>) and mRNA expression for each of the assayed 15K protein-coding genes. Results are visualized as QQ-plots showing the -log10-transformed p-values. (a) Association p-values obtained using standard linear regression. Due to confounding, p-values are strongly inflated and <i>EBF1</i> motif accessibility does not show strong association with <i>EBF1</i> expression compared to other genes. (b) The linear mixed model (LMM) successfully corrects for confounding, with most p-values following the null distribution as expected. The association between <i>EBF1</i> motif accessibility and <i>EBF1</i> expression now ranks second among all genes and first among all TFs, although it does not pass the Bonferroni significance threshold. (c) Additionally controlling for the first principal component of the motif accessibility matrix corrects for a strong batch effect (Methods), which further improves the signal. Using this approach, <i>EBF1</i> motif accessibility showed the strongest association precisely with <i>EBF1</i> expression (i.e., the gene-level CAR rank equals one), suggesting that <i>EBF1</i> may be a CAR, in agreement with the literature [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005311#pcbi.1005311.ref022" target="_blank">22</a>]. As a further illustration of the improvements achieved using the mixed model approach, <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005311#pcbi.1005311.s003" target="_blank">S1 Fig</a> shows the analogous plot for FOXA1, the first discovered pioneer factor [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005311#pcbi.1005311.ref004" target="_blank">4</a>,<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005311#pcbi.1005311.ref005" target="_blank">5</a>].</p>

    Method comparison across all subfamilies.

    <p>Cumulative distribution of <i>CAR</i> ranks at the subfamily level for the 147 tested subfamilies using the three different modelling strategies: ‘standard linear regression’, ‘mixed model regression’ and ‘mixed model PC corrected’ (see legend of <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005311#pcbi.1005311.g002" target="_blank">Fig 2</a> and <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005311#sec009" target="_blank">Methods</a>). We see strong enrichment of low ranks, implying deviation from the null hypothesis. Linear mixed modelling increases the enrichment of low CAR ranks.</p>