10,138 research outputs found
The Impact of Multifunctional Genes on "Guilt by Association" Analysis
Many previous studies have shown that by using variants of “guilt-by-association”, gene function predictions can be made with very high statistical confidence. In these studies, it is assumed that the “associations” in the data (e.g., protein interaction partners) of a gene are necessary in establishing “guilt”. In this paper we show that multifunctionality, rather than association, is a primary driver of gene function prediction. We first show that knowledge of the degree of multifunctionality alone can produce astonishingly strong performance when used as a predictor of gene function. We then demonstrate how multifunctionality is encoded in gene interaction data (such as protein interactions and coexpression networks) and how this can feed forward into gene function prediction algorithms. We find that high-quality gene function predictions can be made using data that contain no information on which gene interacts with which. By examining a wide range of networks from mouse, human and yeast, as well as multiple prediction methods and evaluation metrics, we provide evidence that this problem is pervasive and does not reflect the failings of any particular algorithm or data type. We propose computational controls that give a more meaningful baseline when estimating gene function prediction performance. We suggest that this source of bias due to multifunctionality is important to control for, with widespread implications for the interpretation of genomics studies.
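The claim that multifunctionality alone can act as a predictor can be illustrated with a minimal sketch. It assumes a toy annotation table (`TOY`, hypothetical), scores each gene by a plain count of its annotated functions, and evaluates that single ranking against one function at a time with a rank-based AUROC. The plain count and the in-sample evaluation are simplifying assumptions made here, not necessarily the scoring or cross-validation scheme used in the paper.

```python
def auroc(scores, labels):
    """Rank-based AUROC: the probability that a positive outscores a negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def multifunctionality_scores(annotations):
    """Score each gene by how many functions it is annotated with (assumed measure)."""
    return {gene: len(terms) for gene, terms in annotations.items()}


def multifunctionality_auroc(annotations, term):
    """Evaluate the one fixed multifunctionality ranking as a predictor of `term`."""
    genes = sorted(annotations)
    mf = multifunctionality_scores(annotations)
    scores = [mf[g] for g in genes]
    labels = [term in annotations[g] for g in genes]
    return auroc(scores, labels)


# Hypothetical toy annotations: gene -> set of function terms.
TOY = {
    "geneA": {"f1", "f2", "f3"},
    "geneB": {"f1", "f2"},
    "geneC": {"f1"},
    "geneD": {"f3"},
    "geneE": set(),
}

if __name__ == "__main__":
    for term in ("f1", "f2", "f3"):
        print(term, round(multifunctionality_auroc(TOY, term), 3))
```

Note that the ranking uses no network edges at all: the same ordering of genes is reused for every function, which is the sense in which such a predictor carries no information on which gene interacts with which.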
Recommended from our members
Single-Cell Transcriptomes Reveal a Complex Cellular Landscape in the Middle Ear and Differential Capacities for Acute Response to Infection.
Single-cell transcriptomics was used to profile cells of the normal murine middle ear. Clustering analysis of 6770 transcriptomes identified 17 cell clusters corresponding to distinct cell types: five epithelial, three stromal, three lymphocyte, two monocyte, two endothelial, one pericyte and one melanocyte cluster. Within some clusters, cell subtypes were identified. While many corresponded to cell types known from prior studies, several novel types or subtypes were noted. The results indicate unexpected cellular diversity within the resting middle ear mucosa. The resolution of uncomplicated acute otitis media is too rapid for cognate immunity to play a major role. Thus, innate immunity is likely responsible for normal recovery from middle ear infection. The need for rapid response to pathogens suggests that innate immune genes may be constitutively expressed by middle ear cells. We therefore assessed expression of innate immune genes across all cell types to evaluate the potential for rapid responses to middle ear infection. Resident monocytes/macrophages expressed the most such genes, including pathogen receptors, cytokines, chemokines and chemokine receptors. Other cell types displayed distinct innate immune gene profiles. Epithelial cells preferentially expressed pathogen receptors, bactericidal peptides and mucins. Stromal and endothelial cells expressed pathogen receptors. Pericytes expressed pro-inflammatory cytokines. Lymphocytes expressed chemokine receptors and antimicrobials. The results suggest that tissue monocytes, including macrophages, are the master regulators of the immediate middle ear response to infection, but that virtually all cell types act in concert to mount a defense against pathogens.
The biological embedding of early-life socioeconomic status and family adversity in children's genome-wide DNA methylation.
Aim:
To examine variation in child DNA methylation to assess its potential as a pathway for effects of childhood social adversity on health across the life course.
Materials & methods:
In a diverse, prospective community sample of 178 kindergarten children, associations between three types of social experience and DNA methylation within buccal epithelial cells later in childhood were examined.
Results:
Family income, parental education and family psychosocial adversity each associated with increased or decreased DNA methylation (488, 354 and 102 sites, respectively) within a unique set of genomic CpG sites. Gene ontology analyses pointed to genes serving immune and developmental regulation functions.
Conclusion:
Findings provided support for DNA methylation as a biomarker linking early-life social experiences with later life health in humans.
Transcriptional Response to Acute Thermal Exposure in Juvenile Chinook Salmon Determined by RNAseq.
Thermal exposure is a serious and growing challenge facing fish species worldwide. Chinook salmon (Oncorhynchus tshawytscha) living in the southern portion of their native range are particularly likely to encounter warmer water due to a confluence of factors. River alterations have increased the likelihood that juveniles will be exposed to warm water temperatures during their freshwater life stage, which can negatively impact survival, growth, and development and pose a threat to dwindling salmon populations. To better understand how acute thermal exposure affects the biology of salmon, we performed a transcriptional analysis of gill tissue from Chinook salmon juveniles reared at 12 °C and exposed acutely to water temperatures ranging from ideal to potentially lethal (12 °C to 25 °C). Reverse-transcribed RNA libraries were sequenced on the Illumina HiSeq2000 platform and a de novo reference transcriptome was created. Differentially expressed transcripts were annotated using Blast2GO and relevant gene clusters were identified. In addition to a high degree of downregulation of a wide range of genes, we found upregulation of genes involved in protein folding/rescue, protein degradation, cell death, oxidative stress, metabolism, inflammation/immunity, transcription/translation, ion transport, cell cycle/growth, cell signaling, cellular trafficking, and structure/cytoskeleton. These results demonstrate the complex multi-modal cellular response to thermal stress in juvenile salmon.
“Guilt by Association” Is the Exception Rather Than the Rule in Gene Networks
Gene networks are commonly interpreted as encoding functional information in their connections. An extensively validated principle called guilt by association states that genes which are associated or interacting are more likely to share function. Guilt by association provides the central top-down principle for analyzing gene networks in functional terms or assessing their quality in encoding functional information. In this work, we show that functional information within gene networks is typically concentrated in only a very few interactions whose properties cannot be reliably related to the rest of the network. In effect, the apparent encoding of function within networks has been largely driven by outliers whose behaviour cannot even be generalized to individual genes, let alone to the network at large. While experimentalist-driven analysis of interactions may use prior expert knowledge to focus on the small fraction of critically important data, large-scale computational analyses have typically assumed that high-performance cross-validation in a network is due to a generalizable encoding of function. Because we find that gene function is not systemically encoded in networks, but dependent on specific and critical interactions, we conclude it is necessary to focus on the details of how networks encode function and what information computational analyses use to extract functional meaning. We explore a number of consequences of this and find that network structure itself provides clues as to which connections are critical and that systemic properties, such as scale-free-like behaviour, do not map onto the functional connectivity within networks
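For readers unfamiliar with how guilt by association is usually turned into a prediction, the principle can be sketched as basic neighbor voting: each gene is scored by the fraction of its network neighbors already annotated with the function of interest. The edge list `EDGES` and the annotated set `KNOWN` below are hypothetical toy data, and neighbor voting is used here only as a generic illustration, not as the specific analysis carried out in this work.

```python
def neighbor_voting(edges, annotated):
    """Score each gene by the fraction of its neighbors carrying the function.

    edges: iterable of (gene, gene) pairs, treated as undirected.
    annotated: set of genes already known to have the function.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    # Every gene in `neighbors` has at least one neighbor, so no division by zero.
    return {g: len(ns & annotated) / len(ns) for g, ns in neighbors.items()}


# Hypothetical toy network and annotation set.
EDGES = [("g1", "g2"), ("g1", "g3"), ("g2", "g3"), ("g3", "g4"), ("g4", "g5")]
KNOWN = {"g1", "g2"}

scores = neighbor_voting(EDGES, KNOWN)

if __name__ == "__main__":
    for gene in sorted(scores, key=scores.get, reverse=True):
        print(gene, round(scores[gene], 3))
```

In this toy network the score of `g3` rests entirely on its two edges to annotated genes; removing those edges would drop it to zero, which gives a small-scale picture of performance depending on a few critical interactions.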
Phenotypic and functional characterization of corneal endothelial cells during in vitro expansion.
The advent of cell culture-based methods for the establishment and expansion of human corneal endothelial cells (CEnC) has provided a source of transplantable corneal endothelium, with a significant potential to challenge the one donor-one recipient paradigm. However, concerns over cell identity remain, and a comprehensive characterization of the cultured CEnC across serial passages has not been performed. To this end, we compared two established CEnC culture methods by assessing the transcriptomic changes that occur during in vitro expansion. In confluent monolayers, low mitogenic culture conditions preserved corneal endothelial cell state identity better than culture in high mitogenic conditions. Expansion by continuous passaging induced replicative cell senescence. Transcriptomic analysis of the senescent phenotype identified a cell senescence signature distinct for CEnC. We identified activation of both classic and new cell signaling pathways that may be targeted to prevent senescence, a significant barrier to realizing the potential clinical utility of in vitro expansion
Bioinformatics and Moonlighting Proteins
Multitasking, or moonlighting, is the capability of some proteins to execute two or more biochemical functions. Moonlighting proteins are usually discovered experimentally by serendipity. It would therefore be helpful if bioinformatics could predict this multifunctionality, especially given the large number of sequences produced by genome projects. In the present work, we analyse and describe several approaches that use sequences, structures, interactomics and current bioinformatics algorithms and programs to address this problem. These approaches include: a) remote homology searches using Psi-Blast, b) detection of functional motifs and domains, c) analysis of data from protein-protein interaction (PPI) databases, d) matching the query protein sequence to 3D databases (e.g., with algorithms such as PISITE), and e) mutation correlation analysis between amino acids with algorithms such as MISTIC. Programs designed to identify functional motifs/domains mainly detect the canonical function and usually fail to detect the moonlighting one; Pfam and ProDom are the best of these methods. Remote homology search by Psi-Blast combined with data from interactomics databases (PPIs) has the best performance. Structural information and mutation correlation analysis can help map the functional sites. Mutation correlation analysis can be used only in very specific situations (it requires multiply aligned sequences from a protein family), but it can suggest how the evolutionary acquisition of the second function took place. The multitasking protein database MultitaskProtDB (http://wallace.uab.es/multitask/), previously published by our group, was used as a benchmark for all of the analyses.
Insights into the regulation of intrinsically disordered proteins in the human proteome by analyzing sequence and gene expression data
Background:
Disordered proteins need to be expressed to carry out specified functions; however, their accumulation in the cell can potentially cause major problems through protein misfolding and aggregation. Gene expression levels, mRNA decay rates, microRNA (miRNA) targeting and ubiquitination have critical roles in the degradation and disposal of human proteins and transcripts. Here, we describe a study examining these features to gain insights into the regulation of disordered proteins.
Results:
In comparison with ordered proteins, disordered proteins have a greater proportion of predicted ubiquitination sites. The transcripts encoding disordered proteins also have higher proportions of predicted miRNA target sites and higher mRNA decay rates, both of which are indicative of the observed lower gene expression levels. The results suggest that the disordered proteins and their transcripts are present in the cell at low levels and/or for a short time before being targeted for disposal. Surprisingly, we find that for a significant proportion of highly disordered proteins, all four of these trends are reversed. Predicted estimates for miRNA targets, ubiquitination and mRNA decay rate are low in the highly disordered proteins that are constitutively and/or highly expressed.
Conclusions:
Mechanisms are in place to protect the cell from these potentially dangerous proteins. The evidence suggests that the enrichment of signals for miRNA targeting and ubiquitination may help prevent the accumulation of disordered proteins in the cell. Our data also provide evidence for a mechanism by which a significant proportion of highly disordered proteins (with high expression levels) can escape rapid degradation to allow them to successfully carry out their function