Annotation Enrichment Analysis: An Alternative Method for Evaluating the Functional Properties of Gene Sets
Gene annotation databases (compendiums maintained by the scientific community
that describe the biological functions performed by individual genes) are
commonly used to evaluate the functional properties of experimentally derived
gene sets. Overlap statistics, such as Fisher's Exact Test (FET), are often
employed to assess these associations, but don't account for non-uniformity in
the number of genes annotated to individual functions or the number of
functions associated with individual genes. We find FET is strongly biased
toward over-estimating overlap significance if a gene set has an unusually high
number of annotations. To correct for these biases, we develop Annotation
Enrichment Analysis (AEA), which properly accounts for the non-uniformity of
annotations. We show that AEA is able to identify biologically meaningful
functional enrichments that are obscured by numerous false-positive enrichment
scores in FET, and we therefore suggest it be used to more accurately assess
the biological properties of gene sets.
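The overlap statistic this abstract critiques can be sketched in a few lines. The following is a minimal illustration of a one-sided Fisher's Exact Test (the hypergeometric upper tail) for the overlap between a gene set and the genes annotated to one function. It is not the AEA method, whose details are not given in this listing; the function name, parameter names, and example numbers are all hypothetical. Note that the test treats every gene as equally likely to be drawn, which is the uniformity assumption the abstract says is violated in practice.

```python
from math import comb


def overlap_p_value(n_universe, n_annotated, n_gene_set, n_overlap):
    """One-sided Fisher's exact (hypergeometric upper-tail) p-value.

    Probability of seeing at least `n_overlap` annotated genes when
    `n_gene_set` genes are drawn uniformly at random, without replacement,
    from a universe of `n_universe` genes of which `n_annotated` carry
    the annotation. All names here are illustrative assumptions.
    """
    max_overlap = min(n_annotated, n_gene_set)
    total = comb(n_universe, n_gene_set)
    # Sum the hypergeometric probabilities for overlaps >= the observed one.
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_gene_set - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 genes, 200 annotated to a function,
# a 100-gene experimental set, 5 of which carry the annotation.
p = overlap_p_value(20000, 200, 100, 5)
print(f"p-value: {p:.3g}")
```

Because the null model draws genes uniformly, a gene set rich in heavily annotated genes will tend to overlap many functions by chance, which is consistent with the bias the abstract reports.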
Progress and challenges in the computational prediction of gene function using networks: 2012-2013 update
In an opinion published in 2012, we reviewed and discussed our studies of how gene network-based guilt-by-association (GBA) is impacted by confounds related to gene multifunctionality. We found such confounds account for a significant part of the GBA signal, and as a result meaningfully evaluating and applying computationally guided GBA is more challenging than generally appreciated. We proposed that effort currently spent on incrementally improving algorithms would be better spent in identifying the features of data that do yield novel functional insights. We also suggested that part of the problem is the reliance by computational biologists on gold standard annotations such as the Gene Ontology. In the year since, there has been continued heavy activity in GBA-based research, including work that contributes to our understanding of the issues we raised. Here we review some of the most relevant recent work, including studies that point to new areas of progress and challenges.
The Impact of Multifunctional Genes on "Guilt by Association" Analysis
Many previous studies have shown that by using variants of “guilt-by-association”, gene function predictions can be made with very high statistical confidence. In these studies, it is assumed that the “associations” in the data (e.g., protein interaction partners) of a gene are necessary in establishing “guilt”. In this paper we show that multifunctionality, rather than association, is a primary driver of gene function prediction. We first show that knowledge of the degree of multifunctionality alone can produce astonishingly strong performance when used as a predictor of gene function. We then demonstrate how multifunctionality is encoded in gene interaction data (such as protein interactions and coexpression networks) and how this can feed forward into gene function prediction algorithms. We find that high-quality gene function predictions can be made using data that possesses no information on which gene interacts with which. By examining a wide range of networks from mouse, human and yeast, as well as multiple prediction methods and evaluation metrics, we provide evidence that this problem is pervasive and does not reflect the failings of any particular algorithm or data type. We propose computational controls that allow more meaningful estimates of gene function prediction performance. We suggest that this source of bias due to multifunctionality is important to control for, with widespread implications for the interpretation of genomics studies.
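The idea that multifunctionality alone can predict gene function can be illustrated with a toy sketch. This is not the paper's procedure, which is not specified in this listing. It assumes, for illustration only, that a gene's multifunctionality is simply its number of annotations, and it scores that single ranking against each function with an AUROC. The gene names, annotations, and helper functions are hypothetical, and the count includes the function being evaluated, which is a deliberate simplification.

```python
def multifunctionality_scores(annotations):
    """Score each gene by how many functions it is annotated to (assumed proxy)."""
    return {gene: len(funcs) for gene, funcs in annotations.items()}


def auroc(scores, positives):
    """AUROC of `scores` for separating `positives` from all other genes.

    Computed by direct pairwise comparison; ties count as half a win.
    """
    pos = [s for g, s in scores.items() if g in positives]
    neg = [s for g, s in scores.items() if g not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy annotations: gene -> set of function labels.
annotations = {
    "g1": {"A", "B", "C"},
    "g2": {"A", "B"},
    "g3": {"B"},
    "g4": {"C"},
}

scores = multifunctionality_scores(annotations)
for function in ("A", "B", "C"):
    members = {g for g, funcs in annotations.items() if function in funcs}
    print(function, auroc(scores, members))
```

The same gene ranking is reused for every function and no interaction data enter the calculation, so any AUROC above 0.5 here reflects annotation counts rather than which gene interacts with which.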
The Spermatophore in Glossina morsitans morsitans: Insights into Male Contributions to Reproduction.
Male Seminal Fluid Proteins (SFPs) transferred during copulation modulate female reproductive physiology and behavior, impacting sperm storage/use, ovulation, oviposition, and remating receptivity. These capabilities make them ideal targets for developing novel methods of insect disease vector control. Little is known about the nature of SFPs in the viviparous tsetse flies (Diptera: Glossinidae), vectors of Human and Animal African trypanosomiasis. In tsetse, male ejaculate is assembled into a capsule-like spermatophore structure visible post-copulation in the female uterus. We applied high-throughput approaches to uncover the composition of the spermatophore in Glossina morsitans morsitans. We found that both male accessory glands and testes contribute to its formation. The male accessory glands produce a small number of abundant novel proteins with as yet unknown functions, in addition to enzyme inhibitors and peptidase regulators. The testes contribute sperm in addition to a diverse array of less abundant proteins associated with binding, oxidoreductase/transferase activities, cytoskeletal and lipid/carbohydrate transporter functions. Proteins encoded by female-biased genes are also found in the spermatophore. About half of the proteins display sequence conservation relative to other Diptera, and low similarity to SFPs from other studied species, possibly reflecting both their fast evolutionary pace and the divergent nature of tsetse's viviparous biology.
HNRNPK maintains epidermal progenitor function through transcription of proliferation genes and degrading differentiation promoting mRNAs.
Maintenance of high-turnover tissues such as the epidermis requires a balance between stem cell proliferation and differentiation. The molecular mechanisms governing this process are an area of investigation. Here we show that HNRNPK, a multifunctional protein, is necessary to prevent premature differentiation and to sustain the proliferative capacity of epidermal stem and progenitor cells. To prevent premature differentiation of progenitor cells, HNRNPK is necessary for DDX6 to bind a subset of mRNAs that code for transcription factors that promote differentiation. Upon binding, these mRNAs, such as GRHL3, KLF4, and ZNF750, are degraded through the mRNA degradation pathway, which prevents premature differentiation. To sustain the proliferative capacity of the epidermis, HNRNPK is necessary for RNA Polymerase II binding to proliferation/self-renewal genes such as MYC, CYR61, FGFBP1, EGFR, and cyclins to promote their expression. Our study establishes a prominent role for HNRNPK in maintaining adult tissue self-renewal through both transcriptional and post-transcriptional mechanisms.
Gene expression profiling of connective tissue growth factor (CTGF) stimulated primary human tenon fibroblasts reveals an inflammatory and wound healing response in vitro
Purpose:
The biologic relevance of human connective tissue growth factor (hCTGF) for primary human tenon fibroblasts (HTFs) was investigated by RNA expression profiling using Affymetrix(TM) oligonucleotide array technology to identify genes that are regulated by hCTGF.
Methods:
Recombinant hCTGF was expressed in HEK293T cells and purified by affinity and gel chromatography. Specificity and biologic activity of hCTGF were confirmed by biosensor interaction analysis and proliferation assays. For RNA expression profiling, HTFs were stimulated with hCTGF for 48 h and analyzed using Affymetrix(TM) oligonucleotide array technology. Results were validated by real-time RT-PCR.
Results:
hCTGF induces various groups of genes responsible for a wound healing and inflammatory response in HTFs. A new subset of CTGF inducible inflammatory genes was discovered (e.g., chemokine [C-X-C motif] ligand 1 [CXCL1], chemokine [C-X-C motif] ligand 6 [CXCL6], interleukin 6 [IL6], and interleukin 8 [IL8]). We also identified genes that can transmit the known biologic functions initiated by CTGF such as proliferation and extracellular matrix remodelling. Of special interest is a group of genes, e.g., osteoglycin (OGN) and osteomodulin (OMD), which are known to play a key role in osteoblast biology.
Conclusions:
This study specifies the important role of hCTGF for primary tenon fibroblast function. The RNA expression profile yields new insights into the relevance of hCTGF in influencing biologic processes like wound healing, inflammation, proliferation, and extracellular matrix remodelling in vitro via transcriptional regulation of specific genes. The results suggest that CTGF potentially acts as a modulating factor in the inflammatory and wound healing response in fibroblasts of the human eye.
Application of transcriptomics for predicting protein interaction networks, drug targets and drug candidates
Protein interaction pathways and networks are critically required for a vast range of biological processes. Improved discovery of candidate druggable proteins within specific cell, tissue and disease contexts will aid development of new treatments. Predicting protein interaction networks from gene expression data can provide valuable insights into normal and disease biology. For example, the resulting protein networks can be used to identify potentially druggable targets and drug candidates for testing in cell and animal disease models. The advent of whole-transcriptome expression profiling techniques, which catalogue protein-coding genes expressed within cells and tissues, has enabled development of individual algorithms for particular tasks. Examples include: (i) gene ontology algorithms that predict gene/protein subsets involved in related cell processes; (ii) algorithms that predict intracellular protein interaction pathways; and (iii) algorithms that correlate druggable protein targets with known drugs and/or drug candidates. This review examines approaches, advantages and disadvantages of existing gene expression, gene ontology, and protein network prediction algorithms. Using this framework, we examine current efforts to combine these algorithms into pipelines to enable identification of druggable targets, and associated known drugs, using gene expression datasets. In doing so, new opportunities are identified for development of powerful algorithm pipelines, suitable for wide use by non-bioinformaticians, that can predict protein interaction networks, druggable proteins, and related drugs from user gene expression datasets.