2,736 research outputs found

    Methods for protein complex prediction and their contributions towards understanding the organization, function and dynamics of complexes

    Complexes of physically interacting proteins constitute fundamental functional units responsible for driving biological processes within cells. A faithful reconstruction of the entire set of complexes is therefore essential to understand the functional organization of cells. In this review, we discuss the key contributions of computational methods developed to date (approximately between 2003 and 2015) for identifying complexes from the network of interacting proteins (PPI network). We evaluate in depth the performance of these methods on PPI datasets from yeast, and highlight the challenges they face, in particular detecting sparse and small complexes or sub-complexes and discerning overlapping complexes. We describe methods for integrating diverse information, including expression profiles and 3D structures of proteins, with PPI networks to understand the dynamics of complex formation, for instance the time-based assembly of complex subunits and the formation of fuzzy complexes from intrinsically disordered proteins. Finally, we discuss methods for identifying dysfunctional complexes in human diseases, an application that is proving invaluable for understanding disease mechanisms and discovering novel therapeutic targets. We hope this review aptly commemorates a decade of research on computational prediction of complexes and constitutes a valuable reference for further advancements in this exciting area. Comment: 1 Tabl

    In silico identification of essential proteins in Corynebacterium pseudotuberculosis based on protein-protein interaction networks

    Background: Corynebacterium pseudotuberculosis (Cp) is a gram-positive bacterium that is classified into equi and ovis serovars. The serovar ovis is the etiological agent of caseous lymphadenitis, a chronic infection affecting sheep and goats, causing economic losses due to carcass condemnation and decreased production of meat, wool, and milk. Current diagnosis and treatment protocols are not fully effective and thus require further research into Cp pathogenesis. Results: Here, we mapped known protein-protein interactions (PPI) from various species to nine Cp strains to reconstruct parts of the potential Cp interactome and to identify potentially essential proteins serving as putative drug targets. On average, we predict 16,669 interactions for each of the nine strains (with 15,495 interactions shared among all strains). An in silico sanity check suggests that the potential networks were not formed by spurious interactions but have a strong biological bias. With the inferred Cp networks we identify 181 essential proteins, among which 41 are non-host homologous. Conclusions: The list of candidate interactions of the Cp strains lays the basis for developing novel hypotheses and designing corresponding wet-lab studies. The non-host homologous essential proteins are attractive targets for therapeutic and diagnostic purposes. They allow for searching for small-molecule inhibitors of binding interactions, enabling modern drug discovery. Overall, the predicted Cp PPI networks form a valuable and versatile tool for researchers interested in Corynebacterium pseudotuberculosis.

    Structural Prediction of Protein–Protein Interactions by Docking: Application to Biomedical Problems

    A huge amount of genetic information is available thanks to recent advances in sequencing technologies and greater computational capabilities, but the interpretation of such genetic data at the phenotypic level remains elusive. One of the reasons is that proteins do not act alone, but interact specifically with other proteins and biomolecules, forming intricate interaction networks that are essential for the majority of cell processes and pathological conditions. Thus, characterizing such interaction networks is an important step in understanding how information flows from gene to phenotype. Indeed, structural characterization of protein–protein interactions at atomic resolution has many applications in biomedicine, from diagnosis and vaccine design to drug discovery. However, despite advances in experimental structure determination, the number of interactions for which structural data are available is still very small. In this context, a complementary approach is computational modeling of protein interactions by docking, which is usually composed of two major phases: (i) sampling of the possible binding modes between the interacting molecules and (ii) scoring for the identification of the correct orientations. In addition, prediction of interface and hot-spot residues is very useful to guide and interpret mutagenesis experiments, as well as to understand functional and mechanistic aspects of the interaction. Computational docking is already being applied to specific biomedical problems within the context of personalized medicine, for instance helping to interpret pathological mutations involved in protein–protein interactions, or providing modeled structural data for drug discovery targeting protein–protein interactions.
    Funding: Spanish Ministry of Economy grant number BIO2016-79960-R; D.B.B. is supported by a predoctoral fellowship from CONACyT; M.R. is supported by an FPI fellowship from the Severo Ochoa program. We are grateful to the Joint BSC-CRG-IRB Programme in Computational Biology. Peer Reviewed. Postprint (author's final draft)

    Machine learning for the prediction of protein-protein interactions

    The prediction of protein-protein interactions (PPI) has recently emerged as an important problem in the fields of bioinformatics and systems biology, because most essential cellular processes are mediated by these kinds of interactions. In this thesis we focus on the prediction of co-complex interactions, where the objective is to identify and characterize protein pairs which are members of the same protein complex. Although high-throughput methods for the direct identification of PPI have been developed in recent years, it has been demonstrated that the data obtained by these methods are often incomplete and suffer from high false-positive and false-negative rates. To deal with this technology-driven problem, several machine learning techniques have been employed in the past to improve the accuracy and reliability of predicted interacting protein pairs, demonstrating that the combined use of direct and indirect biological insights can improve the quality of predictive PPI models. This task has commonly been viewed as a binary classification problem. However, the nature of the data creates two major problems. Firstly, the classes are imbalanced, because the number of positive examples (pairs of proteins which really interact) is much smaller than the number of negative ones. Secondly, the selection of negative examples is based on unreliable assumptions which could introduce bias into the classification results. The first part of this dissertation addresses these drawbacks by exploring the use of one-class classification (OCC) methods for the prediction of PPI. OCC methods use examples of just one class to generate a predictive model, which is consequently independent of the kind of negative examples selected; additionally, these approaches are known to cope with imbalanced class problems. We designed and carried out a performance evaluation study of several OCC methods for this task.
    We also undertook a comparative performance evaluation with several conventional learning techniques. Furthermore, we pay attention to a new potential drawback which appears to affect the performance of PPI prediction. This is associated with the composition of the positive gold standard set, which contains a high proportion of examples associated with interactions of ribosomal proteins. We demonstrate that this situation indeed biases the classification task, resulting in an over-optimistic performance result. The prediction of non-ribosomal PPI is a much more difficult task. We investigate some strategies to improve the performance of this subtask, integrating new kinds of data as well as combining diverse classification models generated from different sets of data. In this thesis, we also undertook a preliminary validation study of the new PPI predicted by using OCC methods. To achieve this, we focus on three main aspects: the search for biological evidence in the literature that supports the new predictions; the analysis of the properties of the predicted PPI networks; and the identification of highly interconnected groups of proteins which can be associated with new protein complexes. Finally, this thesis explores a slightly different area, related to the prediction of PPI types. This is associated with the classification of PPI structures (complexes) contained in the Protein Data Bank (PDB) database according to their function and binding affinity. Considering the relatively small number of crystallized protein complexes available, it is not possible at the moment to link these results with the ones obtained previously for the prediction of PPI complexes. However, this could become possible in the near future when more PPI structures are available.

    Triangle network motifs predict complexes by complementing high-error interactomes with structural information

    Background: Many high-throughput studies produce protein-protein interaction networks (PPINs) with many errors and missing information. Even for genome-wide approaches, there is often a low overlap between PPINs produced by different studies. Second-level neighbors separated by two protein-protein interactions (PPIs) were previously used for predicting protein function and finding complexes in high-error PPINs. We retrieve second-level neighbors in PPINs, and complement these with structural domain-domain interactions (SDDIs) representing binding evidence on proteins, forming PPI-SDDI-PPI triangles. Results: We find low overlap between PPINs, SDDIs and known complexes, all well below 10%. We evaluate the overlap of PPI-SDDI-PPI triangles with known complexes from the Munich Information Center for Protein Sequences (MIPS). PPI-SDDI-PPI triangles have ~20 times higher overlap with MIPS complexes than second-level neighbors in PPINs without SDDIs. The biological interpretation of triangles is that an SDDI causes two proteins to be observed with common interaction partners in high-throughput experiments. The relatively few SDDIs overlapping with PPINs are part of highly connected SDDI components, and are more likely to be detected in experimental studies. We demonstrate the utility of PPI-SDDI-PPI triangles by reconstructing myosin-actin processes in the nucleus, cytoplasm, and cytoskeleton, which were not obvious in the original PPIN. Using other complementary datatypes in place of SDDIs to form triangles, such as PubMed co-occurrences or threading information, results in a similar ability to find protein complexes. Conclusion: Given high-error PPINs with missing information, triangles of mixed datatypes are a promising direction for finding protein complexes. Integrating PPINs with SDDIs improves finding complexes. Structural SDDIs partially explain the high functional similarity of second-level neighbors in PPINs. We estimate that relatively little structural information would be sufficient for finding complexes involving most of the proteins and interactions in a typical PPIN.
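
    As an illustration of the PPI-SDDI-PPI triangle idea in the abstract above, here is a minimal Python sketch. It assumes a triangle is a pair of second-level neighbors (two proteins that each have a PPI with a shared partner) that is additionally linked by an SDDI. The function name `find_triangles`, the edge-list inputs and the protein labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def find_triangles(ppis, sddis):
    """Return sorted (a, shared_partner, c) triples.

    a and c are second-level neighbors in the PPI network (both interact
    with shared_partner) and are also linked by an SDDI.
    Assumption: both inputs are iterables of undirected (u, v) pairs.
    """
    # Build an undirected adjacency map from the PPI edge list.
    neighbors = defaultdict(set)
    for u, v in ppis:
        if u != v:  # ignore self-interactions
            neighbors[u].add(v)
            neighbors[v].add(u)

    # Store SDDIs as unordered pairs so (a, c) and (c, a) match.
    sddi_pairs = {frozenset((u, v)) for u, v in sddis if u != v}

    triangles = set()
    for shared, partners in neighbors.items():
        # Every pair of partners of `shared` is a second-level neighbor pair.
        for a, c in combinations(sorted(partners), 2):
            if frozenset((a, c)) in sddi_pairs:
                triangles.add((a, shared, c))
    return sorted(triangles)


# Toy data with made-up protein labels.
toy_ppis = [("A", "B"), ("B", "C"), ("A", "D")]
toy_sddis = [("C", "A")]
print(find_triangles(toy_ppis, toy_sddis))
```

    In the toy data, A and C both interact with B and share an SDDI, so they form one triangle. D is a second-level neighbor of B but has no SDDI, so it contributes none.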

    Protein complex prediction via verifying and reconstructing the topology of domain-domain interactions

    Background: High-throughput methods for detecting protein-protein interactions enable us to obtain large interaction networks, and also allow us to computationally identify the associations of proteins as protein complexes. Although there are methods to extract protein complexes as sets of proteins from interaction networks, the extracted complexes may include false positives because these methods do not account for the structural limitations of the proteins and thus do not check that the proteins in the extracted complex can simultaneously bind to each other. In addition, there have been few searches for deeper insights into protein complexes, such as the topology of the protein-protein interactions or the domain-domain interactions that mediate them. Results: Here, we introduce a combinatorial approach for the prediction of protein complexes that focuses not only on determining the member proteins of complexes but also on the DDI/PPI organization of the complexes. Our method analyzes complex candidates predicted by existing methods. It searches for optimal combinations of domain-domain interactions in the candidates, based on the assumption that the proteins in a candidate can form a true protein complex if each of the domains is used by a single protein interaction. This optimization problem was formulated mathematically and solved using binary integer linear programming. Using publicly available sets of yeast protein-protein interactions and domain-domain interactions, we succeeded in extracting protein complex candidates with an accuracy that is twice the average accuracy of the existing methods MCL, MCODE, or clustering coefficient. Although tuning the parameters of each algorithm slightly improved its precision, our method showed better precision for most values of the parameters. Conclusions: Our combinatorial approach can provide better accuracy for the prediction of protein complexes and also enables the identification of both the direct PPIs and the DDIs that mediate them in complexes.

    ProCbA: Protein Function Prediction based on Clique Analysis

    Protein function prediction based on protein-protein interactions (PPI) is one of the most important challenges of the post-genomic era. Because determining protein function by experimental techniques can be costly, function prediction has become an important challenge for computational biology and bioinformatics. Some researchers use graph-based (or network-based) methods on PPI networks to annotate unannotated proteins. The aim of this study is to increase the accuracy of protein function prediction using two proposed methods: Protein Function Prediction based on Clique Analysis (ProCbA) and Protein Function Prediction on Neighborhood Counting using functional aggregation (ProNC-FA). Both ProCbA and ProNC-FA can predict the functions of unknown proteins. In addition, in ProNC-FA, which does not include a new algorithm, we attempt to address the incomplete and noisy nature of PPI data in order to achieve a network with complete functional aggregation. The experimental results on MIPS data and the 17 different datasets described validate the encouraging performance and the strength of both ProCbA and ProNC-FA on function prediction. Analysis of the experimental results demonstrates that both ProCbA and ProNC-FA are generally able to outperform all the other methods.

    Global Network Alignment

    Motivation: High-throughput methods for detecting molecular interactions have led to a plethora of biological network data with much more yet to come, stimulating the development of techniques for biological network alignment. Analogous to sequence alignment, efficient and reliable network alignment methods will improve our understanding of biological systems. Network alignment is computationally hard. Hence, devising efficient network alignment heuristics is currently one of the foremost challenges in computational biology. Results: We present a superior heuristic network alignment algorithm, called Matching-based GRAph ALigner (M-GRAAL), which can process and integrate any number and type of similarity measures between network nodes (e.g., proteins), including, but not limited to, any topological network similarity measure, sequence similarity, functional similarity, and structural similarity. This is efficient in resolving ties in similarity measures and in finding a combination of similarity measures yielding the largest biologically sound alignments. When used to align protein-protein interaction (PPI) networks of various species, M-GRAAL exposes the largest known functional and contiguous regions of network similarity. Hence, we use M-GRAAL's alignments to predict functions of unannotated proteins in yeast, human, and the bacteria C. jejuni and E. coli. Furthermore, using M-GRAAL to compare PPI networks of different herpes viruses, we reconstruct their phylogenetic relationship, and our phylogenetic tree is the same as the sequence-based one.

    Selecting Negative Samples for PPI Prediction Using Hierarchical Clustering Methodology

    Protein-protein interactions (PPIs) play a crucial role in cellular processes. In the present work, a new approach is proposed to construct a PPI predictor by training a support vector machine model, using a parallel filter-wrapper feature selection algorithm based on mutual information and an iterative, hierarchical clustering procedure to select a relevant negative training set. By means of a selected suboptimal set of features, the constructed support vector machine model is able to classify PPIs with high accuracy in any positive and negative datasets.