31 research outputs found

    Statistical models for predicting cell cycle genes using Random Forest method.

    (A) The ROC curves for three classification models that use TF-only features, motif-only features, or a combination of both as predictors. (B) The relative importance (measured as MDG, Mean Decrease in Gini coefficient) of TF features in the combined model (TF+Motif). (C) The relative importance of motif features in the combined model. (D) The change in prediction accuracy (measured as AUC scores) when the most important predictors are removed from the full model one at a time. Note that the cell cycle genes in the training data come from HeLa cells, and thus we use only TF binding data from the same cell line in our model.
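The AUC score used in this caption to compare models can be illustrated with a minimal sketch. It relies on the standard identity that AUC equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. The function name `auc_score` and the toy labels and scores are hypothetical; they are not taken from the paper.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = cell cycle gene, 0 = non-cell cycle gene.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = auc_score(labels, scores)
```

Under this definition, dropping a predictor and recomputing the score on the same examples gives the kind of AUC change that panel (D) reports.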

    Schematic diagram of our analysis for predicting human cell cycle genes.

    The predictive model integrates three types of data: microarray experiments, ChIP-seq experiments, and computational TF binding motif analysis.

    Predicted cell cycle genes are more likely to interact with cell cycle partners in the protein-protein interaction network.

    (A) The average number of partners; (B) the average number of cell cycle partners; (C) the average percentage of cell cycle partners. Note that all known cell cycle genes are excluded from the predicted cell cycle gene set. The P-values for the differences in the numbers of partners or cell cycle partners between the two gene classes are calculated by the Chi-squared test.
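A Chi-squared comparison of the kind mentioned in this caption can be sketched for a 2x2 table. The caption does not say how the counts were tabulated, so the table layout (gene class by partner type), the function name `chi2_2x2`, and the counts are all assumptions made for illustration. The sketch applies no continuity correction and uses the fact that, with one degree of freedom, the upper-tail probability is `erfc(sqrt(chi2 / 2))`.

```python
from math import erfc, sqrt


def chi2_2x2(table):
    """Pearson Chi-squared statistic and P-value for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Upper tail of the Chi-squared distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: rows = predicted vs. other genes,
# columns = cell cycle partners vs. other partners.
chi2, p_value = chi2_2x2([[30, 70], [10, 90]])
```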

    Regulatory scores of TFs on genes can discriminate cell cycle (CC) versus non-cell cycle (non-CC) genes.

    (A) Distributions of regulatory scores for CMYC and E2F1 are significantly different between CC and non-CC genes (P = 2e-55 and P = 1e-50, respectively). (B) The average signals of CMYC and E2F1 show similar distributions between CC and non-CC genes (P = 0.03 and P = 0.05, respectively). (C) The t-scores for CC versus non-CC genes, calculated by comparing regulatory scores and average signals of TFs. SYDH, UTA and HAIB are the lab IDs of the datasets.
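A t-score comparing CC and non-CC genes can be sketched as follows. The caption does not specify which t-test variant was used, so the unequal-variance (Welch) form here is an assumption, and the function name `welch_t` and the score values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic for the difference in means of two samples."""
    # variance() is the sample variance (n - 1 denominator).
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical regulatory scores of one TF on CC and non-CC genes.
cc_scores = [2.0, 3.0, 4.0]
non_cc_scores = [1.0, 2.0, 3.0]
t_score = welch_t(cc_scores, non_cc_scores)
```

A larger absolute t-score for one feature type than for another would indicate that it separates the two gene classes more clearly, which is the comparison panel (C) describes.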

    Prediction of cell cycle related promoters.

    The model is applied to ∼138,000 GENCODE-annotated promoters to identify novel cell cycle genes of different types. (A) The number of cell cycle related genes identified by the model when different thresholds are used. The precision (1-FDR) is shown as the increasing grey line. (B) The percentage of different types of genes that are predicted to be cell cycle related at a threshold of 0.7 (Prob>0.7). FDR: false discovery rate.
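The relationship between the probability threshold, the number of genes called, and the precision (1-FDR) can be sketched on labeled examples. How the authors estimated the FDR is not stated in the caption, so computing precision against known labels is an assumption, as are the function name `calls_and_precision` and the toy values.

```python
def calls_and_precision(probs, labels, threshold):
    """Count predictions above a threshold and the fraction that are true positives."""
    called = [y for p, y in zip(probs, labels) if p > threshold]
    n_called = len(called)
    if n_called == 0:
        return 0, None           # precision is undefined with no calls
    precision = sum(called) / n_called
    return n_called, precision


# Hypothetical predicted probabilities and true labels (1 = cell cycle gene).
probs = [0.95, 0.85, 0.75, 0.65, 0.55, 0.30]
labels = [1, 1, 0, 1, 0, 0]

n_at_07, precision_at_07 = calls_and_precision(probs, labels, 0.7)
fdr_at_07 = 1.0 - precision_at_07
```

Sweeping the threshold over a grid and recording both values at each step would produce the two curves that panel (A) describes.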

    Enrichment analyses of cohort-common DEGs.

    (A) Hierarchical clustering of gene expression of cohort-common and cohort-specific DEGs in both Asian and Caucasian RNA-seq studies. (B) Pathways enriched in the 118 cohort-common DEGs. (C) Upstream regulators and their target networks enriched in the 118 cohort-common DEGs.

    Cohort-common and cohort-specific detections of DEGs.

    (A) Q-Q plot of residuals of one randomly selected gene from 4418 genes having significant differential tumor-normal log ratios of gene expression between populations. (B) Comparison of detections from Asian and Caucasian RNA-seq studies. (C) Comparison of detections from Asian and Caucasian microarray studies. (D) Comparison of discovery rates from population-common and population-specific analyses. (E) Venn diagram of the top 300 DEGs from Asian and Caucasian RNA-seq studies. (F) Venn diagram of the top 300 DEGs from all RNA-seq and microarray studies.

    Experimentally-Derived Fibroblast Gene Signatures Identify Molecular Pathways Associated with Distinct Subsets of Systemic Sclerosis Patients in Three Independent Cohorts

    Genome-wide expression profiling in systemic sclerosis (SSc) has identified four ‘intrinsic’ subsets of disease (fibroproliferative, inflammatory, limited, and normal-like), each of which shows deregulation of distinct signaling pathways; however, the full set of pathways contributing to this differential gene expression has not been fully elucidated. Here we examine experimentally derived gene expression signatures in dermal fibroblasts for thirteen different signaling pathways implicated in SSc pathogenesis. These data show distinct and overlapping sets of genes induced by each pathway, allowing for a better understanding of the molecular relationship between profibrotic and immune signaling networks. Pathway-specific gene signatures were analyzed across a compendium of microarray datasets consisting of skin biopsies from three independent cohorts representing 80 SSc patients, 4 morphea, and 26 controls. IFNα signaling showed a strong association with early disease, while TGFβ signaling spanned the fibroproliferative and inflammatory subsets, was associated with worse MRSS, and was higher in lesional than non-lesional skin. The fibroproliferative subset was most strongly associated with PDGF signaling, while the inflammatory subset demonstrated strong activation of innate immune pathways including TLR signaling upstream of NF-κB. The limited and normal-like subsets did not show associations with fibrotic and inflammatory mediators such as TGFβ and TNFα. The normal-like subset showed high expression of genes associated with lipid signaling, which was absent in the inflammatory and limited subsets. Together, these data suggest a model by which IFNα is involved in early disease pathology, and disease severity is associated with active TGFβ signaling.

    Hierarchical clustering recreates intrinsic subsets.

    Hierarchical clustering of the ComBat-merged MPH dataset recreates clear normal-like, fibroproliferative, inflammatory, and limited subsets. Clustering was performed on 2316 probes covering 2189 genes at an FDR of 0.65%, chosen based upon their consistent expression within an individual patient, along with high variance between patients. The array tree is color coded to indicate new intrinsic subset designations (yellow = limited, green = normal-like, purple = inflammatory, red = fibroproliferative, and black = unassigned). Below the array tree, hash marks are used to indicate the original subset designation (TOP: green = normal-like, red = fibroproliferative, purple = inflammatory, yellow = limited, black = unassigned), the dataset of origin (MIDDLE: blue = Milano, green = Pendergrass, red = Hinchcliff), and the clinical diagnosis (BOTTOM: green = normal, red = diffuse scleroderma, yellow = limited scleroderma, black = morphea or eosinophilic fasciitis). Black bars indicate genes that clustered together hierarchically, with the most highly represented GO terms listed alongside each cluster.