Predicting gene ontology from a global meta-analysis of 1-color microarray experiments
BACKGROUND: Global meta-analysis (GMA) of microarray data to identify genes with highly similar co-expression profiles is emerging as an accurate method to predict gene function and phenotype, even in the absence of published data on the gene(s) being analyzed. With a third of human genes still uncharacterized, this approach is a promising way to direct experiments and rapidly understand the biological roles of genes. To predict function for a gene of interest, GMA relies on a guilt-by-association approach to identify sets of genes with known functions that are consistently co-expressed with it across different experimental conditions, suggesting coordinated regulation for a specific biological purpose. Our goal here is to define how sample, dataset size and ranking parameters affect prediction performance. RESULTS: 13,000 human 1-color microarrays were downloaded from GEO for GMA. Prediction performance was benchmarked by calculating the distance within the Gene Ontology (GO) tree between predicted function and annotated function for sets of 100 randomly selected genes. We find the number of new predicted functions rises as more datasets are added, but begins to saturate at a sample size of approximately 2,000 experiments. For the gene set used to predict function, we find precision to be higher with smaller set sizes, yet with correspondingly poor recall; as set size is increased, recall and F-measure also tend to increase, but at the cost of precision. CONCLUSIONS: Of the 20,813 genes expressed in 50 or more experiments, at least one predicted GO category was found for 72.5% of them. Of the 5,720 genes without GO annotation, 4,189 had at least one predicted ontology using the top 40 co-expressed genes for prediction analysis. For the remaining 1,531 genes without GO predictions or annotations, ~17% (257 genes) had sufficient co-expression data yet no statistically significantly overrepresented ontologies, suggesting their regulation may be more complex
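The trade-off between precision, recall and F-measure mentioned in this abstract can be illustrated with a minimal sketch. It assumes a simplified exact-match comparison between predicted and annotated GO terms, whereas the abstract benchmarks by distance within the GO tree; the function name `precision_recall_f` and the term labels are hypothetical and used for illustration only.

```python
def precision_recall_f(predicted, annotated):
    """Score a set of predicted terms against a set of annotated terms.

    Assumption: a prediction counts as correct only on an exact match,
    which is a simplification of a tree-distance benchmark.
    """
    predicted, annotated = set(predicted), set(annotated)
    true_positives = len(predicted & annotated)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(annotated) if annotated else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Placeholder labels, not real GO identifiers.
small_set = precision_recall_f({"GO:A"}, {"GO:A", "GO:B", "GO:E"})
large_set = precision_recall_f({"GO:A", "GO:B", "GO:C", "GO:D"},
                               {"GO:A", "GO:B", "GO:E"})
print(small_set, large_set)
```

In this toy case the smaller prediction set has higher precision and lower recall than the larger one, which mirrors the pattern the abstract reports.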
Systematic classification of non-coding RNAs by epigenomic similarity
BACKGROUND: Even though only 1.5% of the human genome is translated into proteins, recent reports indicate that most of it is transcribed into non-coding RNAs (ncRNAs), which are becoming the subject of increased scientific interest. We hypothesized that examining how different classes of ncRNAs co-localized with annotated epigenomic elements could help understand the functions, regulatory mechanisms, and relationships among ncRNA families. RESULTS: We examined 15 different ncRNA classes for statistically significant genomic co-localizations with cell type-specific chromatin segmentation states, transcription factor binding sites (TFBSs), and histone modification marks using GenomeRunner (http://www.genomerunner.org). P-values were obtained using a Chi-square test and corrected for multiple testing using the Benjamini-Hochberg procedure. We clustered and visualized the ncRNA classes by the strength of their statistical enrichments and depletions. We found piwi-interacting RNAs (piRNAs) to be depleted in regions containing activating histone modification marks, such as H3K4 mono-, di- and trimethylation, H3K27 acetylation, as well as certain TFBSs. piRNAs were further depleted in active promoters, weak transcription, and transcription elongation regions, and enriched in repressed and heterochromatic regions. Conversely, transfer RNAs (tRNAs) were depleted in heterochromatin regions and strongly enriched in regions containing activating H3K4 di- and trimethylation marks, H2az histone variant, and a variety of TFBSs. Interestingly, regions containing CTCF insulator protein binding sites were associated with tRNAs. tRNAs were also enriched in the active, weak and poised promoters and, surprisingly, in regions with repetitive/copy number variations. 
CONCLUSIONS: Searching for statistically significant associations between ncRNA classes and epigenomic elements permits detection of potential functional and/or regulatory relationships among ncRNA classes, and suggests cell type-specific biological roles of ncRNAs
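The Benjamini-Hochberg correction named in this abstract can be sketched as follows. This is a generic textbook version of the procedure, not GenomeRunner's own code; the function name `benjamini_hochberg` and the example p-values are assumptions for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values, e.g. one per ncRNA class vs. epigenomic feature test.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values non-decreasing in rank order.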
Proceedings of the 2015 MidSouth Computational Biology and Bioinformatics Society (MCBIOS) Conference
Proceedings of the 2012 MidSouth Computational Biology and Bioinformatics Society (MCBIOS) Conference
PDGFRα signaling drives adipose tissue fibrosis by targeting progenitor cell plasticity
Fibrosis is a common disease process in which profibrotic cells disturb organ function by secreting disorganized extracellular matrix (ECM). Adipose tissue fibrosis occurs during obesity and is associated with metabolic dysfunction, but how profibrotic cells originate is still being elucidated. Here, we use a developmental model to investigate perivascular cells in white adipose tissue (WAT) and their potential to cause organ fibrosis. We show that a Nestin-Cre transgene targets perivascular cells (adventitial cells and pericyte-like cells) in WAT, and Nestin-GFP specifically labels pericyte-like cells. Activation of PDGFRα signaling in perivascular cells causes them to transition into ECM-synthesizing profibrotic cells. Before this transition occurs, PDGFRα signaling up-regulates mTOR signaling and ribosome biogenesis pathways and perturbs the expression of a network of epigenetically imprinted genes that have been implicated in cell growth and tissue homeostasis. Isolated Nestin-GFP+ cells differentiate into adipocytes ex vivo and form WAT when transplanted into recipient mice. However, PDGFRα signaling opposes adipogenesis and generates profibrotic cells instead, which leads to fibrotic WAT in transplant experiments. These results identify perivascular cells as fibro/adipogenic progenitors in WAT and show that PDGFRα targets progenitor cell plasticity as a profibrotic mechanism
From microarray to biology: an integrated experimental, statistical and in silico analysis of how the extracellular matrix modulates the phenotype of cancer cells
A statistically robust and biologically based approach for analysis of microarray data is described that integrates independent biological knowledge and data with a global F-test for finding genes of interest that minimizes the need for replicates when used for hypothesis generation. First, each microarray is normalized to its noise level around zero. The microarray dataset is then globally adjusted by robust linear regression. Second, genes of interest that capture significant responses to experimental conditions are selected by finding those that express significantly higher variance than those expressing only technical variability. Clustering expression data and identifying expression-independent properties of genes of interest including upstream transcriptional regulatory elements (TREs), ontologies and networks or pathways organizes the data into a biologically meaningful system. We demonstrate that when the number of genes of interest is inconveniently large, identifying a subset of "beacon genes" representing the largest changes will identify pathways or networks altered by biological manipulation. The entire dataset is then used to complete the picture outlined by the "beacon genes." This allows construction of a structured model of a system that can generate biologically testable hypotheses. We illustrate this approach by comparing cells cultured on plastic or an extracellular matrix which organizes a dataset of over 2,000 genes of interest from a genome wide scan of transcription. The resulting model was confirmed by comparing the predicted pattern of TREs with experimental determination of active transcription factors
Proceedings of the 2014 MidSouth Computational Biology and Bioinformatics Society (MCBIOS) Conference
A Comprehensive and Universal Method for Assessing the Performance of Differential Gene Expression Analyses
The number of methods for pre-processing and analysis of gene expression data continues to increase, often making it difficult to select the most appropriate approach. We present a simple procedure for comparative estimation of a variety of methods for microarray data pre-processing and analysis. Our approach is based on the use of real microarray data in which controlled fold changes are introduced into 20% of the data to provide a metric for comparison with the unmodified data. The data modifications can be easily applied to raw data measured with any technological platform and retains all the complex structures and statistical characteristics of the real-world data. The power of the method is illustrated by its application to the quantitative comparison of different methods of normalization and analysis of microarray data. Our results demonstrate that the method of controlled modifications of real experimental data provides a simple tool for assessing the performance of data preprocessing and analysis methods
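The idea of introducing controlled fold changes into a fraction of real data can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' procedure: the function name `spike_in`, the single fixed fold change of 2.0, the random selection of genes, and the toy expression values are all hypothetical.

```python
import random


def spike_in(expression, fraction=0.2, fold_change=2.0, seed=0):
    """Multiply a random fraction of values by a known fold change.

    Returns the modified values and the set of modified indices, which
    serves as ground truth when scoring an analysis method.
    """
    rng = random.Random(seed)  # seeded so the modification is reproducible
    n_modified = round(len(expression) * fraction)
    modified_idx = set(rng.sample(range(len(expression)), n_modified))
    modified = [
        value * fold_change if i in modified_idx else value
        for i, value in enumerate(expression)
    ]
    return modified, modified_idx


# Toy expression values for ten genes (made up for illustration).
original = [120.0, 85.0, 300.0, 42.0, 510.0, 77.0, 95.0, 230.0, 18.0, 64.0]
modified, truth = spike_in(original)
print(sorted(truth))
```

A differential expression method can then be scored by how many of the indices in `truth` it recovers when the modified data are compared with the unmodified data.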
mirCoX: a database of miRNA-mRNA expression correlations derived from RNA-seq meta-analysis
BACKGROUND: Experimentally validated co-expression correlations between miRNAs and genes are a valuable resource to corroborate observations about miRNA/mRNA changes after experimental perturbations, as well as compare miRNA target predictions with empirical observations. For example, when a given miRNA is transcribed, true targets of that miRNA should tend to have lower expression levels relative to when the miRNA is not expressed. METHODS: We processed publicly available human RNA-seq experiments obtained from NCBI's Sequence Read Archive (SRA) to identify miRNA-mRNA co-expression trends and summarized them in terms of their Pearson's Correlation Coefficient (PCC) and significance. RESULTS: We found that sequence-derived parameters from TargetScan and miRanda were predictive of co-expression, and that TargetScan- and miRanda-derived gene-miRNA pairs tend to have anti-correlated expression patterns in RNA-seq data compared to controls. We provide this data for download and as a web application available at http://wrenlab.org/mirCoX/. CONCLUSION: This database of empirically established miRNA-mRNA transcriptional correlations will help to corroborate experimental observations and could be used to help refine and validate miRNA target predictions
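The Pearson's Correlation Coefficient used to summarize miRNA-mRNA co-expression can be computed as in the following sketch. The function name `pearson_correlation` and the expression values are assumptions for illustration and are not taken from mirCoX.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / math.sqrt(var_x * var_y)


# Made-up expression levels across four samples: as the miRNA rises,
# the hypothetical target mRNA falls, giving a negative correlation.
mirna_levels = [1.0, 2.0, 3.0, 4.0]
mrna_levels = [8.0, 6.0, 4.0, 2.0]
print(pearson_correlation(mirna_levels, mrna_levels))
```

A coefficient near -1 is the pattern the abstract describes as expected for true targets, whose expression tends to drop when the miRNA is expressed.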
Systems biology approach for mapping the response of human urothelial cells to infection by Enterococcus faecalis
BACKGROUND: To better understand the response of urinary epithelial (urothelial) cells to Enterococcus faecalis, a uropathogen that exhibits resistance to multiple antibiotics, a genome-wide scan of gene expression was obtained as a time series from urothelial cells growing as a layered 3-dimensional culture similar to normal urothelium. We herein describe a novel means of analysis that is based on deconvolution of gene variability into technical and biological components. RESULTS: Analysis of the expression of 21,521 genes from 30 minutes to 10 hours post infection showed that 9,553 genes were expressed 3 standard deviations (SD) above the system zero-point noise in at least 1 time point. The asymmetric distribution of relative variances of the expressed genes was deconvoluted into technical variation (with a 6.5% relative SD) and biological variation components (>3 SD above the mode technical variability). These 1,409 hypervariable (HV) genes encapsulated the effect of infection on gene expression. Pathway analysis of the HV genes revealed an orchestrated response to infection in which early events included initiation of immune response, cytoskeletal rearrangement and cell signaling, followed at the end by apoptosis and shutting down of cell metabolism. The number of poorly annotated genes in the earliest time points suggests that heretofore unknown processes are likely also involved. CONCLUSION: Enterococcus infection produced an orchestrated response by the host cells involving several pathways and transcription factors that potentially drive these pathways. The early time points potentially identify novel targets for enhancing the host response. These approaches combine rigorous statistical principles with a biological context and are readily applied by biologists