
    Annotation Enrichment Analysis: An Alternative Method for Evaluating the Functional Properties of Gene Sets

    Gene annotation databases (compendiums maintained by the scientific community that describe the biological functions performed by individual genes) are commonly used to evaluate the functional properties of experimentally derived gene sets. Overlap statistics, such as Fisher's Exact Test (FET), are often employed to assess these associations, but they do not account for non-uniformity in the number of genes annotated to individual functions or in the number of functions associated with individual genes. We find that FET is strongly biased toward overestimating overlap significance if a gene set has an unusually high number of annotations. To correct for these biases, we develop Annotation Enrichment Analysis (AEA), which properly accounts for the non-uniformity of annotations. We show that AEA is able to identify biologically meaningful functional enrichments that are obscured by numerous false-positive enrichment scores in FET, and we therefore suggest it be used to assess the biological properties of gene sets more accurately.
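For reference, the conventional one-sided FET overlap test that the abstract critiques can be sketched as below. This is not AEA, whose details are not given here; the function name `fet_overlap_pvalue` and its parameters are illustrative assumptions. The sketch makes the key assumption visible: the gene set is treated as a uniform random draw from the universe, so every gene is equally likely to carry any annotation, which is exactly the uniformity the abstract says does not hold.

```python
from math import comb


def fet_overlap_pvalue(n_universe: int, n_annotated: int, n_set: int, overlap: int) -> float:
    """One-sided (over-representation) Fisher's Exact Test p-value.

    Returns P(X >= overlap), where X is hypergeometric: the number of
    annotated genes that fall in a gene set of size n_set drawn uniformly
    at random from n_universe genes, n_annotated of which carry the
    annotation. Names and signature are illustrative, not a library API.
    """
    if not (0 <= n_annotated <= n_universe and 0 <= n_set <= n_universe):
        raise ValueError("annotation and set sizes must lie within the universe")
    if not 0 <= overlap <= min(n_annotated, n_set):
        raise ValueError("overlap must lie between 0 and min(n_annotated, n_set)")
    # Sum the hypergeometric upper tail with exact integer arithmetic,
    # then divide once; math.comb returns 0 when asked for too many items.
    upper = min(n_annotated, n_set)
    tail = sum(
        comb(n_annotated, i) * comb(n_universe - n_annotated, n_set - i)
        for i in range(overlap, upper + 1)
    )
    return tail / comb(n_universe, n_set)


if __name__ == "__main__":
    # Hypothetical numbers: 20,000 genes, a function with 200 genes,
    # a 100-gene set, and 5 genes in the overlap (about 1 expected by chance).
    print(fet_overlap_pvalue(20000, 200, 100, 5))
```

Because the test only sees the four counts, a gene set made up of heavily annotated genes will tend to overlap many functions at once, and each of those overlaps is scored as if it were surprising.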

    Progress and challenges in the computational prediction of gene function using networks: 2012-2013 update

    In an opinion published in 2012, we reviewed and discussed our studies of how gene network-based guilt-by-association (GBA) is affected by confounds related to gene multifunctionality. We found that such confounds account for a significant part of the GBA signal and that, as a result, meaningfully evaluating and applying computationally guided GBA is more challenging than generally appreciated. We proposed that effort currently spent on incrementally improving algorithms would be better spent identifying the features of data that do yield novel functional insights. We also suggested that part of the problem is the reliance of computational biologists on gold-standard annotations such as the Gene Ontology. In the year since, there has been continued heavy activity in GBA-based research, including work that contributes to our understanding of the issues we raised. Here we review some of the most relevant recent work, as well as work that points to new areas of progress and challenges.

    The Impact of Multifunctional Genes on "Guilt by Association" Analysis

    Many previous studies have shown that, by using variants of “guilt-by-association”, gene function predictions can be made with very high statistical confidence. These studies assume that a gene's “associations” in the data (e.g., its protein interaction partners) are necessary for establishing “guilt”. In this paper we show that multifunctionality, rather than association, is a primary driver of gene function prediction. We first show that knowledge of a gene's degree of multifunctionality alone can produce astonishingly strong performance when used as a predictor of gene function. We then demonstrate how multifunctionality is encoded in gene interaction data (such as protein interaction and coexpression networks) and how this can feed forward into gene function prediction algorithms. We find that high-quality gene function predictions can be made using data that contain no information about which gene interacts with which. By examining a wide range of networks from mouse, human and yeast, as well as multiple prediction methods and evaluation metrics, we provide evidence that this problem is pervasive and does not reflect the failings of any particular algorithm or data type. We propose computational controls that can be used to obtain more meaningful estimates of gene function prediction performance. We suggest that this source of bias due to multifunctionality is important to control for, with widespread implications for the interpretation of genomics studies.
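The claim that multifunctionality alone can act as a gene function predictor can be illustrated with a minimal sketch. It assumes a hypothetical simplification: a gene's multifunctionality is just its number of annotations (excluding the term being predicted), and performance is measured by AUROC in its Mann-Whitney form. The function names and the toy annotation table are invented for illustration and are not the authors' scoring method.

```python
from typing import Dict, Set


def multifunctionality_scores(annotations: Dict[str, Set[str]], target_term: str) -> Dict[str, float]:
    """Score each gene by how many functions it carries, ignoring target_term.

    Hypothetical simplification: a plain annotation count stands in for a
    multifunctionality score. Excluding the target term keeps the score
    from containing the very label being predicted.
    """
    return {gene: float(len(terms - {target_term})) for gene, terms in annotations.items()}


def auroc(scores: Dict[str, float], positives: Set[str]) -> float:
    """AUROC as the fraction of (positive, negative) gene pairs in which the
    positive gene scores higher; ties count as half a win."""
    pos = [s for g, s in scores.items() if g in positives]
    neg = [s for g, s in scores.items() if g not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy annotation table (gene -> function terms), invented for illustration.
    annotations = {
        "g1": {"A", "B", "C", "D"},
        "g2": {"A", "B", "C"},
        "g3": {"B", "C"},
        "g4": {"C"},
        "g5": {"D"},
        "g6": set(),
    }
    target = "A"
    positives = {g for g, terms in annotations.items() if target in terms}
    scores = multifunctionality_scores(annotations, target)
    print(f"AUROC for {target}: {auroc(scores, positives):.4f}")  # AUROC for A: 0.9375
```

No interaction data enter this score, yet it ranks the genes annotated to `A` near the top, because heavily annotated genes are more likely to carry any given term. A baseline like this is one way to check whether a network-based predictor adds anything beyond multifunctionality.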

    The Spermatophore in Glossina morsitans morsitans: Insights into Male Contributions to Reproduction.

    Male seminal fluid proteins (SFPs) transferred during copulation modulate female reproductive physiology and behavior, impacting sperm storage/use, ovulation, oviposition, and remating receptivity. These capabilities make them ideal targets for developing novel methods of insect disease vector control. Little is known about the nature of SFPs in the viviparous tsetse flies (Diptera: Glossinidae), vectors of human and animal African trypanosomiasis. In tsetse, the male ejaculate is assembled into a capsule-like spermatophore that is visible in the female uterus after copulation. We applied high-throughput approaches to uncover the composition of the spermatophore in Glossina morsitans morsitans. We found that both the male accessory glands and the testes contribute to its formation. The male accessory glands produce a small number of abundant novel proteins of as yet unknown function, in addition to enzyme inhibitors and peptidase regulators. The testes contribute sperm in addition to a diverse array of less abundant proteins associated with binding, oxidoreductase/transferase activities, and cytoskeletal and lipid/carbohydrate transporter functions. Proteins encoded by female-biased genes are also found in the spermatophore. About half of the proteins display sequence conservation relative to other Diptera but low similarity to SFPs from other studied species, possibly reflecting both their fast evolutionary pace and the divergent nature of tsetse's viviparous biology.

    Gene expression profiling of connective tissue growth factor (CTGF) stimulated primary human tenon fibroblasts reveals an inflammatory and wound healing response in vitro

    Purpose: The biologic relevance of human connective tissue growth factor (hCTGF) for primary human tenon fibroblasts (HTFs) was investigated by RNA expression profiling using Affymetrix™ oligonucleotide array technology to identify genes that are regulated by hCTGF. Methods: Recombinant hCTGF was expressed in HEK293T cells and purified by affinity and gel chromatography. The specificity and biologic activity of hCTGF were confirmed by biosensor interaction analysis and proliferation assays. For RNA expression profiling, HTFs were stimulated with hCTGF for 48 h and analyzed using Affymetrix™ oligonucleotide array technology. Results were validated by real-time RT-PCR. Results: hCTGF induces various groups of genes responsible for a wound healing and inflammatory response in HTFs. A new subset of CTGF-inducible inflammatory genes was discovered (e.g., chemokine [C-X-C motif] ligand 1 [CXCL1], chemokine [C-X-C motif] ligand 6 [CXCL6], interleukin 6 [IL6], and interleukin 8 [IL8]). We also identified genes that can transmit the known biologic functions initiated by CTGF, such as proliferation and extracellular matrix remodelling. Of special interest is a group of genes, e.g., osteoglycin (OGN) and osteomodulin (OMD), which are known to play a key role in osteoblast biology. Conclusions: This study specifies the important role of hCTGF in primary tenon fibroblast function. The RNA expression profile yields new insights into the relevance of hCTGF in influencing biologic processes such as wound healing, inflammation, proliferation, and extracellular matrix remodelling in vitro via transcriptional regulation of specific genes. The results suggest that CTGF potentially acts as a modulating factor in the inflammatory and wound healing response in fibroblasts of the human eye.

    Application of transcriptomics for predicting protein interaction networks, drug targets and drug candidates

    Protein interaction pathways and networks are critically required for a vast range of biological processes. Improved discovery of candidate druggable proteins within specific cell, tissue and disease contexts will aid the development of new treatments. Predicting protein interaction networks from gene expression data can provide valuable insights into normal and disease biology. For example, the resulting protein networks can be used to identify potentially druggable targets and drug candidates for testing in cell and animal disease models. The advent of whole-transcriptome expression profiling techniques, which catalogue the protein-coding genes expressed within cells and tissues, has enabled the development of individual algorithms for particular tasks, for example: (i) gene ontology algorithms that predict gene/protein subsets involved in related cell processes; (ii) algorithms that predict intracellular protein interaction pathways; and (iii) algorithms that correlate druggable protein targets with known drugs and/or drug candidates. This review examines the approaches, advantages and disadvantages of existing gene expression, gene ontology, and protein network prediction algorithms. Using this framework, we examine current efforts to combine these algorithms into pipelines that enable identification of druggable targets, and associated known drugs, from gene expression datasets. In doing so, we identify new opportunities for developing powerful algorithm pipelines, suitable for wide use by non-bioinformaticians, that can predict protein interaction networks, druggable proteins, and related drugs from user gene expression datasets.
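As a rough illustration of the first step in such a pipeline, deriving a gene network from expression data, the sketch below builds a coexpression network by linking gene pairs whose expression profiles have an absolute Pearson correlation at or above a threshold. This is an assumed, deliberately simple stand-in: coexpression is only a proxy for protein interaction, the threshold value is arbitrary, and the function names are hypothetical rather than taken from any of the reviewed algorithms.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Optional, Tuple


def pearson(x: List[float], y: List[float]) -> Optional[float]:
    """Pearson correlation of two expression profiles; None if either is constant."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined for a zero-variance profile
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(sxx * syy)


def coexpression_edges(
    expr: Dict[str, List[float]], threshold: float
) -> List[Tuple[str, str, float]]:
    """Return (gene1, gene2, r) for every gene pair with |r| >= threshold.

    expr maps a gene name to its expression values across the same samples.
    """
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if r is not None and abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


if __name__ == "__main__":
    # Invented four-sample profiles for four hypothetical genes.
    expr = {
        "a": [1.0, 2.0, 3.0, 4.0],
        "b": [2.0, 4.0, 6.0, 8.0],
        "c": [4.0, 3.0, 2.0, 1.0],
        "d": [1.0, 3.0, 2.0, 4.0],
    }
    for g1, g2, r in coexpression_edges(expr, threshold=0.9):
        print(g1, g2, round(r, 3))
```

Keeping negative correlations (via the absolute value) is a design choice; a pipeline aimed at physical interactions might keep only positive correlations or use a different association measure entirely.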