WhichTF identifies cell-type-specific functionally important TFs in diverse cell types.
This is an extended version of Table 1 with the additional methods included in the comparison. The top 5 identified TFs for B-, T-, heart, and brain cells are shown for eight methods: MEME-ChIP, regulatory genomics toolbox (RGT) motif enrichment tool, HOMER (enrichment for known motifs), HOMER (de novo motif discovery followed by similarity search to known motifs), LOLA, cisTarget, PRISM conserved binding site enrichment, and WhichTF. Here, -log10(P) denotes the statistical significance (negative log10 p-value) of the TFBS enrichment; -log10(CP) denotes the statistical significance as a conditional p-value, conditioned on the TFs ranked above (see Methods); and PMID represents the PubMed ID. (XLSX)
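As a reading aid for the -log10(P) and -log10(CP) columns, the following minimal sketch converts a p-value to the negative log10 scale. The function name `neg_log10` is hypothetical and is not part of WhichTF or any of the compared tools.

```python
import math


def neg_log10(p_value):
    """Return -log10(p) for a p-value in the interval (0, 1].

    Hypothetical helper for illustration only; larger values
    correspond to smaller (more significant) p-values.
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be in (0, 1]")
    return -math.log10(p_value)
```

Under this convention, a p-value of 1e-5 maps to a score of about 5, and a p-value of 1 maps to 0.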
Number of known and available TF genes in the WhichTF reference dataset.
The number of known and available TF genes in the WhichTF reference dataset is shown for the 12 largest TF families, with families defined according to Lambert et al., 2018. (TIF)
WhichTF robustness analysis.
(a) WhichTF ranking is robust to input region sub-sampling. The top 5 identified TFs are shown for human B-cells, T-cells, heart cells, and brain cells (sample name and the corresponding ENCODE accession ID are shown in the “Sample” column) across different numbers of regions in the input files (70%, 80%, 90%, and 100%). The 100% setting corresponds to the original input file. For the others, we subsampled the elements in the BED file based on the SCORE column before applying WhichTF. (b) WhichTF ranking is robust to the lengths of the input regions. The top 5 identified TFs are shown for human B-cells, T-cells, heart cells, and brain cells (sample name and the corresponding ENCODE accession ID are shown in the “Sample” column) across different maximum lengths of the regions in the input files (200 bp, 500 bp, 1000 bp, 2000 bp, and original). The ‘original’ setting corresponds to the original input file; for the others, we trimmed each element in the BED file, if needed, while preserving its midpoint before applying WhichTF. (c) WhichTF ranking is robust to the number of top enriched terms it uses. The top 5 identified TFs are shown for B-cells, T-cells, heart cells, and brain cells (sample name and the corresponding ENCODE accession ID are shown in the “Sample” column) across different numbers of top enriched ontology terms (50, 75, 90, 100 [default], 110, 125, and 150). The ‘100 [default]’ setting corresponds to the default parameter configuration of WhichTF (see Methods). (XLSX)
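The two input perturbations in panels (a) and (b) can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, regions are assumed to be `(chrom, start, end, score)` tuples in half-open BED coordinates, and "subsampled based on the SCORE column" is assumed here to mean keeping the highest-scoring fraction of regions, which the caption does not state explicitly.

```python
def trim_to_max_length(start, end, max_len):
    """Trim the interval [start, end) to at most max_len bp.

    The (integer) midpoint of the original interval is kept as the
    midpoint of the trimmed one. Intervals that are already short
    enough are returned unchanged.
    """
    if end - start <= max_len:
        return start, end
    mid = (start + end) // 2
    new_start = mid - max_len // 2
    return new_start, new_start + max_len


def keep_top_fraction_by_score(regions, fraction):
    """Keep the highest-scoring fraction of regions.

    ASSUMPTION: subsampling selects the top-scoring elements; the
    source only says the SCORE column was used. The surviving
    regions are returned in their original file order.
    """
    n_keep = max(1, round(len(regions) * fraction))
    # Rank region indices by score, highest first.
    ranked = sorted(range(len(regions)),
                    key=lambda i: regions[i][3], reverse=True)
    kept = sorted(ranked[:n_keep])  # restore input order
    return [regions[i] for i in kept]
```

For example, trimming a region spanning 100 to 1100 to a maximum of 200 bp yields 500 to 700, which is still centered on position 600.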
The update summary of GREAT v4.0.4 ontologies.
Ensembl genes is a flat ontology defined from the set of genes with at least one meaningful annotation in Gene Ontology (see Methods). (XLSX)
Differential WhichTF analysis identifies differentially dominant TFs compared to immediate progenitor cells along experimentally derived human mesoderm development pathways from ATAC-seq data.
The top 5 TFs are shown with their corresponding statistical significance, expressed as negative log10 conditional p-values, -log10(CP). The results from a MEME-ChIP differential analysis are shown for comparison. The importance and PubMed ID (PMID) columns indicate whether (i) existing literature supports the identified TFs (confirmed); (ii) literature reports closely related factors, such as co-factors and functionally related family members, or the identified TFs in a related context (suggestive); or (iii) the prediction is novel (novel). Differential WhichTF and differential MEME-ChIP produce very different predictions. The abundance of confirmed WhichTF predictions (here and in previous figures) makes the suggestive and novel predictions more attractive.
WhichTF partial scores capture cell-type-specific TF functions.
For each top-predicted WhichTF TF in 4 different cellular contexts, we show its highest supporting partial scores (Annotated column). Nearly 90% of these terms are annotated to the TF gene itself based on closed-loop validation (see Methods). Here, ENC indicates an ENCODE accession ID and MP is the prefix for the MGI mammalian phenotype ontology. (XLSX)
Gene expression of the top identified TF genes from the differential WhichTF analysis applied on B and T-cell DNase-seq data.
Gene expression of the top identified differential TF genes, SPIB and RUNX3, is shown (horizontal axis) across diverse lymphoid cell types (vertical axis) for up to four healthy donors. SPIB is specifically expressed in B-cells, whereas RUNX3 has elevated expression in T-cells. (TIF)
WhichTF captures biological similarities and dissimilarities of TF-mediated transcriptional programs.
The WhichTF score vectors are projected onto a t-SNE plot for DNase-seq data tracks of 90 samples across 7 cell types, including GM12878 (number of samples, n = 7), B cells (n = 4), T cells (n = 8), Heart (n = 34), Left ventricle (n = 6), Right ventricle (n = 2), and Brain (n = 29). Reassuringly, samples from the same cell type are projected close together in the low-dimensional space. The samples analyzed in Table 1 are annotated here with arrows. The raw data used in the figure are available in S2 Data.
The rank of GATA family members in heart and erythrocyte progenitor cells.
The ranks of GATA family members in heart and erythrocyte progenitor cells from the human ENCODE dataset are shown. (XLSX)