8,097 research outputs found

    Mixed membership stochastic blockmodels

    Full text link
    Observations consisting of measurements on relationships for pairs of objects arise in many settings, such as protein interaction and gene regulatory networks, collections of author-recipient email, and social networks. Analyzing such data with probabilisic models can be delicate because the simple exchangeability assumptions underlying many boilerplate models no longer hold. In this paper, we describe a latent variable model of such data called the mixed membership stochastic blockmodel. This model extends blockmodels for relational data to ones which capture mixed membership latent relational structure, thus providing an object-specific low-dimensional representation. We develop a general variational inference algorithm for fast approximate posterior inference. We explore applications to social and protein interaction networks.Comment: 46 pages, 14 figures, 3 table

    Global Functional Atlas of \u3cem\u3eEscherichia coli\u3c/em\u3e Encompassing Previously Uncharacterized Proteins

    Get PDF
    One-third of the 4,225 protein-coding genes of Escherichia coli K-12 remain functionally unannotated (orphans). Many map to distant clades such as Archaea, suggesting involvement in basic prokaryotic traits, whereas others appear restricted to E. coli, including pathogenic strains. To elucidate the orphans’ biological roles, we performed an extensive proteomic survey using affinity-tagged E. coli strains and generated comprehensive genomic context inferences to derive a high-confidence compendium for virtually the entire proteome consisting of 5,993 putative physical interactions and 74,776 putative functional associations, most of which are novel. Clustering of the respective probabilistic networks revealed putative orphan membership in discrete multiprotein complexes and functional modules together with annotated gene products, whereas a machine-learning strategy based on network integration implicated the orphans in specific biological processes. We provide additional experimental evidence supporting orphan participation in protein synthesis, amino acid metabolism, biofilm formation, motility, and assembly of the bacterial cell envelope. This resource provides a “systems-wide” functional blueprint of a model microbe, with insights into the biological and evolutionary significance of previously uncharacterized proteins

    MorphDB : prioritizing genes for specialized metabolism pathways and gene ontology categories in plants

    Get PDF
    Recent times have seen an enormous growth of "omics" data, of which high-throughput gene expression data are arguably the most important from a functional perspective. Despite huge improvements in computational techniques for the functional classification of gene sequences, common similarity-based methods often fall short of providing full and reliable functional information. Recently, the combination of comparative genomics with approaches in functional genomics has received considerable interest for gene function analysis, leveraging both gene expression based guilt-by-association methods and annotation efforts in closely related model organisms. Besides the identification of missing genes in pathways, these methods also typically enable the discovery of biological regulators (i.e., transcription factors or signaling genes). A previously built guilt-by-association method is MORPH, which was proven to be an efficient algorithm that performs particularly well in identifying and prioritizing missing genes in plant metabolic pathways. Here, we present MorphDB, a resource where MORPH-based candidate genes for large-scale functional annotations (Gene Ontology, MapMan bins) are integrated across multiple plant species. Besides a gene centric query utility, we present a comparative network approach that enables researchers to efficiently browse MORPH predictions across functional gene sets and species, facilitating efficient gene discovery and candidate gene prioritization. MorphDB is available at http://bioinformatics.psb.ugent.be/webtools/morphdb/morphDB/index/. We also provide a toolkit, named "MORPH bulk" (https://github.com/arzwa/morph-bulk), for running MORPH in bulk mode on novel data sets, enabling researchers to apply MORPH to their own species of interest

    A Combined Approach for Genome Wide Protein Function Annotation/Prediction

    Get PDF
    Background Today large scale genome sequencing technologies are uncovering an increasing amount of new genes and proteins, which remain uncharacterized. Experimental procedures for protein function prediction are low throughput by nature and thus can't be used to keep up with the rate at which new proteins are discovered. On the other hand, proteins are the prominent stakeholders in almost all biological processes, and therefore the need to precisely know their functions for a better understanding of the underlying biological mechanism is inevitable. The challenge of annotating uncharacterized proteins in functional genomics and biology in general motivates the use of computational techniques well orchestrated to accurately predict their functions. Methods We propose a computational flow for the functional annotation of a protein able to assign the most probable functions to a protein by aggregating heterogeneous information. Considered information include: protein motifs, protein sequence similarity, and protein homology data gathered from interacting proteins, combined with data from highly similar non-interacting proteins (hereinafter called Similactors). Moreover, to increase the predictive power of our model we also compute and integrate term specific relationships among functional terms based on Gene Ontology (GO). Results We tested our method on Saccharomyces Cerevisiae and Homo sapiens species proteins. The aggregation of different structural and functional evidence with GO relationships outperforms, in terms of precision and accuracy of prediction than the other methods reported in literature. The predicted precision and accuracy is 100% for more than half of the input set for both species; overall, we obtained 85.38% precision and 81.95% accuracy for Homo sapiens and 79.73% precision and 80.06% accuracy for Saccharomyces Cerevisiae species protein

    Potential genomic biomarkers of obesity and its comorbidities for phthalates and bisphenol A mixture: In silico toxicogenomic approach

    Get PDF
    This in silico toxicogenomic study aims to explore the relationship between phthalates and bisphenol A (BPA) co-exposure and obesity, as well as its comorbid conditions, in order to construct a possible set of genomic biomarkers. The Comparative Toxicogenomics Database (CTD; http://ctd.mdibl.org) was used as the main data mining tool, along with GeneMania (https://genemania.org), ToppGene Suite (https://toppgene.cchmc.org) and DisGeNET (http://www. disgenet.org). Among the phthalates, bis(2-ethylhexyl) phthalate (DEHP) and dibutyl phthalate (DBP) were chosen as the most frequently curated phthalates in CTD, which also share similar mechanisms of toxicity. DEHP, DBP and BPA interacted with 84, 90 and 194 obesity-related genes/proteins, involved in 67, 65 and 116 pathways, respectively. Among these, 53 genes/proteins and 42 pathways were common to all three substances. 31 genes/proteins had matching interactions for all three investigated substances, while more than half of these genes/proteins (56.49%) were in co-expression. 7 of the common genes/proteins (6 relevant to humans: CCL2, IL6, LPL, PPARG, SERPINE1, and TNF) were identified in all the investigated obesity comorbidities, while PPARG and LPL were most closely linked to obesity. These genes/proteins could serve as a target for further in vitro and in vivo studies of molecular mechanisms of DEHP, DBP and BPA mixture obesogenic properties. Analysis reported here should be applicable to any mixture of environmental chemicals and any disease present in CTD

    Automated data integration for developmental biological research

    Get PDF
    In an era exploding with genome-scale data, a major challenge for developmental biologists is how to extract significant clues from these publicly available data to benefit our studies of individual genes, and how to use them to improve our understanding of development at a systems level. Several studies have successfully demonstrated new approaches to classic developmental questions by computationally integrating various genome-wide data sets. Such computational approaches have shown great potential for facilitating research: instead of testing 20,000 genes, researchers might test 200 to the same effect. We discuss the nature and state of this art as it applies to developmental research

    Semantic integration to identify overlapping functional modules in protein interaction networks

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>The systematic analysis of protein-protein interactions can enable a better understanding of cellular organization, processes and functions. Functional modules can be identified from the protein interaction networks derived from experimental data sets. However, these analyses are challenging because of the presence of unreliable interactions and the complex connectivity of the network. The integration of protein-protein interactions with the data from other sources can be leveraged for improving the effectiveness of functional module detection algorithms.</p> <p>Results</p> <p>We have developed novel metrics, called semantic similarity and semantic interactivity, which use Gene Ontology (GO) annotations to measure the reliability of protein-protein interactions. The protein interaction networks can be converted into a weighted graph representation by assigning the reliability values to each interaction as a weight. We presented a flow-based modularization algorithm to efficiently identify overlapping modules in the weighted interaction networks. The experimental results show that the semantic similarity and semantic interactivity of interacting pairs were positively correlated with functional co-occurrence. The effectiveness of the algorithm for identifying modules was evaluated using functional categories from the MIPS database. We demonstrated that our algorithm had higher accuracy compared to other competing approaches.</p> <p>Conclusion</p> <p>The integration of protein interaction networks with GO annotation data and the capability of detecting overlapping modules substantially improve the accuracy of module identification.</p

    T-ALL and thymocytes : a message of noncoding RNAs

    Get PDF
    In the last decade, the role for noncoding RNAs in disease was clearly established, starting with microRNAs and later expanded towards long noncoding RNAs. This was also the case for T cell acute lymphoblastic leukemia, which is a malignant blood disorder arising from oncogenic events during normal T cell development in the thymus. By studying the transcriptomic profile of protein-coding genes, several oncogenic events leading to T cell acute lymphoblastic leukemia (T-ALL) could be identified. In recent years, it became apparent that several of these oncogenes function via microRNAs and long noncoding RNAs. In this review, we give a detailed overview of the studies that describe the noncoding RNAome in T-ALL oncogenesis and normal T cell development

    Dopamine perturbation of gene co-expression networks reveals differential response in schizophrenia for translational machinery.

    Get PDF
    The dopaminergic hypothesis of schizophrenia (SZ) postulates that positive symptoms of SZ, in particular psychosis, are due to disturbed neurotransmission via the dopamine (DA) receptor D2 (DRD2). However, DA is a reactive molecule that yields various oxidative species, and thus has important non-receptor-mediated effects, with empirical evidence of cellular toxicity and neurodegeneration. Here we examine non-receptor-mediated effects of DA on gene co-expression networks and its potential role in SZ pathology. Transcriptomic profiles were measured by RNA-seq in B-cell transformed lymphoblastoid cell lines from 514 SZ cases and 690 controls, both before and after exposure to DA ex vivo (100 μM). Gene co-expression modules were identified using Weighted Gene Co-expression Network Analysis for both baseline and DA-stimulated conditions, with each module characterized for biological function and tested for association with SZ status and SNPs from a genome-wide panel. We identified seven co-expression modules under baseline, of which six were preserved in DA-stimulated data. One module shows significantly increased association with SZ after DA perturbation (baseline: P = 0.023; DA-stimulated: P = 7.8 × 10-5; ΔAIC = -10.5) and is highly enriched for genes related to ribosomal proteins and translation (FDR = 4 × 10-141), mitochondrial oxidative phosphorylation, and neurodegeneration. SNP association testing revealed tentative QTLs underlying module co-expression, notably at FASTKD2 (top P = 2.8 × 10-6), a gene involved in mitochondrial translation. These results substantiate the role of translational machinery in SZ pathogenesis, providing insights into a possible dopaminergic mechanism disrupting mitochondrial function, and demonstrates the utility of disease-relevant functional perturbation in the study of complex genetic etiologies
    corecore