50,432 research outputs found

    Enhancing the functional content of protein interaction networks

    Protein interaction networks are a promising type of data for studying complex biological systems. However, despite the rich information embedded in these networks, they face important data quality challenges of noise and incompleteness that adversely affect the results obtained from their analysis. Here, we explore the use of the concept of common neighborhood similarity (CNS), which is a form of local structure in networks, to address these issues. Although several CNS measures have been proposed in the literature, an understanding of their relative efficacies for the analysis of interaction networks has been lacking. We follow the framework of graph transformation to convert the given interaction network into a transformed network corresponding to each of the CNS measures evaluated. The effectiveness of each measure is then estimated by comparing the quality of protein function predictions obtained from its corresponding transformed network with those from the original network. Using a large set of S. cerevisiae interactions, and a set of 136 GO terms, we find that several of the transformed networks produce more accurate predictions than those obtained from the original network. In particular, the HC.cont measure proposed here performs particularly well for this task. Further investigation reveals that the two major factors contributing to this improvement are the abilities of CNS measures, especially HC.cont, to prune out noisy edges and introduce new links between functionally related proteins.
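
The graph transformation idea in this abstract can be illustrated with a minimal sketch. It assumes the Jaccard index as a stand-in CNS measure, because the abstract does not define HC.cont; the function name `jaccard_cns_transform` and the `threshold` parameter are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def jaccard_cns_transform(edges, threshold=0.2):
    """Build a transformed network from common-neighbor similarity.

    Assumption: Jaccard similarity stands in for the CNS measures
    discussed in the abstract; it is NOT the HC.cont measure.
    """
    # Collect the neighbor set of every protein in the original network.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    transformed = {}
    # Score every pair of proteins, whether or not they interact originally.
    for u, v in combinations(sorted(neighbors), 2):
        shared = neighbors[u] & neighbors[v]
        union = neighbors[u] | neighbors[v]
        score = len(shared) / len(union)
        # Keep only pairs whose neighborhoods overlap strongly enough.
        if score >= threshold:
            transformed[(u, v)] = score
    return transformed


if __name__ == "__main__":
    toy_edges = [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("A", "E")]
    for pair, score in jaccard_cns_transform(toy_edges, threshold=0.6).items():
        print(pair, round(score, 3))
```

On this toy input, the original edge A-E is dropped because A and E share no neighbors, while A-B and C-D are added because they share most of their neighbors. This mirrors, in a simplified way, the pruning and link-introduction effects the abstract describes.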

    Predicting protein functions with message passing algorithms

    Motivation: In the last few years, interest in biology has been shifting towards the problem of optimal information extraction from the huge amount of data generated via large-scale and high-throughput techniques. One of the most relevant issues has recently become that of correctly and reliably predicting the functions of observed but still functionally undetermined proteins, starting from information coming from the network of co-observed proteins of known function. Method: The method proposed in this article is based on a message passing algorithm known as Belief Propagation, which takes as input the network of physical protein interactions and a catalog of known protein functions, and returns, for each unclassified protein, the probability of having a chosen function. The implementation of the algorithm allows for fast online analysis and can be easily generalized to more complex graph topologies, taking into account hypergraphs, i.e. complexes of more than two interacting proteins. Comment: 12 pages, 9 eps figures, 1 additional html table

    Global protein function prediction in protein-protein interaction networks

    The determination of protein functions is one of the most challenging problems of the post-genomic era. The sequencing of entire genomes and the possibility of accessing gene co-expression patterns have moved attention from the study of single proteins or small complexes to that of the entire proteome. In this context, the search for reliable methods for protein function assignment is of utmost importance. Previous approaches to deduce the unknown function of a class of proteins have exploited sequence similarities or clustering of co-regulated genes, phylogenetic profiles, protein-protein interactions, and protein complexes. We propose to assign functional classes to proteins from their network of physical interactions, by minimizing the number of interacting proteins with different categories. The function assignment is made on a global scale and depends on the entire connectivity pattern of the protein network. Multiple functional assignments are made possible as a consequence of the existence of multiple equivalent solutions. The method is applied to the yeast Saccharomyces cerevisiae protein-protein interaction network. Robustness is tested in the presence of a high percentage of unclassified proteins and under deletion/insertion of interactions. Comment: 5 pages, 2 figures, 2 supplementary tables
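
The objective described in this abstract, minimizing the number of interacting protein pairs that carry different functional categories, can be sketched as follows. The abstract does not say which optimizer is used, so this sketch swaps in a simple greedy local search; the names `count_conflicts` and `assign_functions` and the toy data are hypothetical.

```python
from collections import Counter


def count_conflicts(edges, labels):
    """Count interactions whose two labeled endpoints disagree in function."""
    return sum(
        1
        for u, v in edges
        if u in labels and v in labels and labels[u] != labels[v]
    )


def assign_functions(edges, known, max_sweeps=100):
    """Greedy local search (an assumed stand-in optimizer, not the
    authors' procedure): each unclassified protein repeatedly takes the
    function most common among its labeled neighbors, which locally
    reduces the number of conflicting interactions."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, []).append(v)
        neighbors.setdefault(v, []).append(u)

    labels = dict(known)  # known annotations are never overwritten
    unknown = sorted(n for n in neighbors if n not in known)

    for _ in range(max_sweeps):
        changed = False
        for node in unknown:
            votes = Counter(labels[n] for n in neighbors[node] if n in labels)
            if not votes:
                continue  # no labeled neighbor yet; retry on a later sweep
            # Deterministic tie-break: highest count, then alphabetical.
            best = min(votes, key=lambda f: (-votes[f], f))
            if labels.get(node) != best:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


if __name__ == "__main__":
    toy_edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"),
                 ("P4", "P5"), ("P2", "P4")]
    toy_known = {"P1": "transport", "P5": "repair"}
    result = assign_functions(toy_edges, toy_known)
    print(result, count_conflicts(toy_edges, result))
```

A greedy search of this kind finds only one local minimum. The abstract notes that multiple equivalent solutions exist and are what allow multiple functional assignments, so a fuller treatment would need to explore several minima rather than stop at the first.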

    Methods for protein complex prediction and their contributions towards understanding the organization, function and dynamics of complexes

    Complexes of physically interacting proteins constitute fundamental functional units responsible for driving biological processes within cells. A faithful reconstruction of the entire set of complexes is therefore essential to understand the functional organization of cells. In this review, we discuss the key contributions of computational methods developed to date (approximately between 2003 and 2015) for identifying complexes from the network of interacting proteins (PPI network). We evaluate in depth the performance of these methods on PPI datasets from yeast, and highlight the challenges they face, in particular the detection of sparse and small complexes or subcomplexes and the discerning of overlapping complexes. We describe methods for integrating diverse information, including expression profiles and 3D structures of proteins, with PPI networks to understand the dynamics of complex formation, for instance, the time-based assembly of complex subunits and the formation of fuzzy complexes from intrinsically disordered proteins. Finally, we discuss methods for identifying dysfunctional complexes in human diseases, an application that is proving invaluable for understanding disease mechanisms and discovering novel therapeutic targets. We hope this review aptly commemorates a decade of research on computational prediction of complexes and constitutes a valuable reference for further advancements in this exciting area. Comment: 1 Table

    Sampling of conformational ensemble for virtual screening using molecular dynamics simulations and normal mode analysis

    Aim: Molecular dynamics simulations and normal mode analysis are well-established approaches to generate receptor conformational ensembles (RCEs) for ligand docking and virtual screening. Here, we report new fast molecular dynamics-based and normal mode analysis-based protocols combined with conformational pocket classifications to efficiently generate RCEs. Materials & methods: We assessed our protocols on two well-characterized protein targets: dihydrofolate reductase, which shows local active-site flexibility, and CDK2, which shows large collective movements. The performance of the RCEs was validated by distinguishing known ligands of dihydrofolate reductase and CDK2 among a dataset of diverse chemical decoys. Results & discussion: Our results show that different simulation protocols can be efficient for the generation of RCEs, depending on the kind of protein flexibility.

    ProLanGO: Protein Function Prediction Using Neural Machine Translation Based on a Recurrent Neural Network

    With the development of next-generation sequencing techniques, it is fast and cheap to determine protein sequences but relatively slow and expensive to extract useful information from them because of the limitations of traditional biological experimental techniques. Protein function prediction has been a long-standing challenge in filling the gap between the huge number of protein sequences and their known functions. In this paper, we propose a novel method that converts the protein function prediction problem into a language translation problem, from the newly proposed protein sequence language "ProLan" to the protein function language "GOLan", and build a neural machine translation model based on recurrent neural networks to translate "ProLan" into "GOLan". We blindly tested our method by participating in the third Critical Assessment of Function Annotation (CAFA 3) in 2016, and also evaluated its performance on selected proteins whose functions were released after the CAFA competition. The good performance on the training and testing datasets demonstrates that our newly proposed method is a promising direction for protein function prediction. In summary, we propose for the first time a method that converts the protein function prediction problem into a language translation problem and applies a neural machine translation model to it. Comment: 13 pages, 5 figures

    Global Functional Atlas of Escherichia coli Encompassing Previously Uncharacterized Proteins

    One-third of the 4,225 protein-coding genes of Escherichia coli K-12 remain functionally unannotated (orphans). Many map to distant clades such as Archaea, suggesting involvement in basic prokaryotic traits, whereas others appear restricted to E. coli, including pathogenic strains. To elucidate the orphans’ biological roles, we performed an extensive proteomic survey using affinity-tagged E. coli strains and generated comprehensive genomic context inferences to derive a high-confidence compendium for virtually the entire proteome consisting of 5,993 putative physical interactions and 74,776 putative functional associations, most of which are novel. Clustering of the respective probabilistic networks revealed putative orphan membership in discrete multiprotein complexes and functional modules together with annotated gene products, whereas a machine-learning strategy based on network integration implicated the orphans in specific biological processes. We provide additional experimental evidence supporting orphan participation in protein synthesis, amino acid metabolism, biofilm formation, motility, and assembly of the bacterial cell envelope. This resource provides a “systems-wide” functional blueprint of a model microbe, with insights into the biological and evolutionary significance of previously uncharacterized proteins