4 research outputs found

    Prediction of enzyme function by combining sequence similarity and protein interactions

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>A number of studies have used protein interaction data alone for protein function prediction. Here, we introduce a computational approach for annotation of enzymes, based on the observation that similar protein sequences are more likely to perform the same function if they share similar interacting partners.</p> <p>Results</p> <p>The method has been tested against the PSI-BLAST program using a set of 3,890 protein sequences from which interaction data was available. For protein sequences that align with at least 40% sequence identity to a known enzyme, the specificity of our method in predicting the first three EC digits increased from 80% to 90% at 80% coverage when compared to PSI-BLAST.</p> <p>Conclusion</p> <p>Our method can also be used in proteins for which homologous sequences with known interacting partners can be detected. Thus, our method could increase 10% the specificity of genome-wide enzyme predictions based on sequence matching by PSI-BLAST alone.</p

    Triangle network motifs predict complexes by complementing high-error interactomes with structural information

    Get PDF
    BackgroundA lot of high-throughput studies produce protein-protein interaction networks (PPINs) with many errors and missing information. Even for genome-wide approaches, there is often a low overlap between PPINs produced by different studies. Second-level neighbors separated by two protein-protein interactions (PPIs) were previously used for predicting protein function and finding complexes in high-error PPINs. We retrieve second level neighbors in PPINs, and complement these with structural domain-domain interactions (SDDIs) representing binding evidence on proteins, forming PPI-SDDI-PPI triangles.ResultsWe find low overlap between PPINs, SDDIs and known complexes, all well below 10%. We evaluate the overlap of PPI-SDDI-PPI triangles with known complexes from Munich Information center for Protein Sequences (MIPS). PPI-SDDI-PPI triangles have ~20 times higher overlap with MIPS complexes than using second-level neighbors in PPINs without SDDIs. The biological interpretation for triangles is that a SDDI causes two proteins to be observed with common interaction partners in high-throughput experiments. The relatively few SDDIs overlapping with PPINs are part of highly connected SDDI components, and are more likely to be detected in experimental studies. We demonstrate the utility of PPI-SDDI-PPI triangles by reconstructing myosin-actin processes in the nucleus, cytoplasm, and cytoskeleton, which were not obvious in the original PPIN. Using other complementary datatypes in place of SDDIs to form triangles, such as PubMed co-occurrences or threading information, results in a similar ability to find protein complexes.ConclusionGiven high-error PPINs with missing information, triangles of mixed datatypes are a promising direction for finding protein complexes. Integrating PPINs with SDDIs improves finding complexes. Structural SDDIs partially explain the high functional similarity of second-level neighbors in PPINs. We estimate that relatively little structural information would be sufficient for finding complexes involving most of the proteins and interactions in a typical PPIN

    Protein-protein interactions and metabolic pathways reconstruction of Caenorhabditis elegans

    Get PDF
    Metabolic networks are the collections of all cellular activities taking place in a living cell and all the relationships among biological elements of the cell including genes, proteins, enzymes, metabolites, and reactions. They provide a better understanding of cellular mechanisms and phenotypic characteristics of the studied organism. In order to reconstruct a metabolic network, interactions among genes and their molecular attributes along with their functions must be known. Using this information, proteins are distributed among pathways as sub-networks of a greater metabolic network. Proteins which carry out various steps of a biological process operate in same pathway.The metabolic network of Caenorhabditis elegans was reconstructed based on current genomic information obtained from the KEGG database, and commonly found in SWISS-PROT and WormBase. Assuming proteins operating in a pathway are interacting proteins, currently available protein-protein interaction map of the studied organism was assembled. This map contains all known protein-protein interactions collected from various sources up to the time. Topology of the reconstructed network was briefly studied and the role of key enzymes in the interconnectivity of the network was analysed. The analysis showed that the shortest metabolic paths represent the most probable routes taken by the organism where endogenous sources of nutrient are available to the organism. Nonetheless, there are alternate paths to allow the organism to survive under extraneous variations. Signature content information of proteins was utilized to reveal protein interactions upon a notion that when two proteins share signature(s) in their primary structures, the two proteins are more likely to interact. The signature content of proteins was used to measure the extent of similarity between pairs of proteins based on binary similarity score. Pairs of proteins with a binary similarity score greater than a threshold corresponding to confidence level 95% were predicted as interacting proteins. The reliability of predicted pairs was statistically analyzed. The sensitivity and specificity analysis showed that the proposed approach outperformed maximum likelihood estimation (MLE) approach with a 22% increase in area under curve of receiving operator characteristic (ROC) when they were applied to the same datasets. When proteins containing one and two known signatures were removed from the protein dataset, the area under curve (AUC) increased from 0.549 to 0.584 and 0.655, respectively. Increase in the AUC indicates that proteins with one or two known signatures do not provide sufficient information to predict robust protein-protein interactions. Moreover, it demonstrates that when proteins with more known signatures are used in signature profiling methods the overlap with experimental findings will increase resulting in higher true positive rate and eventually greater AUC. Despite the accuracy of protein-protein interaction methods proposed here and elsewhere, they often predict true positive interactions along with numerous false positive interactions. A global algorithm was also proposed to reduce the number of false positive predicted protein interacting pairs. This algorithm relies on gene ontology (GO) annotations of proteins involved in predicted interactions. A dataset of experimentally confirmed protein pair interactions and their GO annotations was used as a training set to train keywords which were able to recover both their source interactions (training set) and predicted interactions in other datasets (test sets). These keywords along with the cellular component annotation of proteins were employed to set a pair of rules that were to be satisfied by any predicted pair of interacting proteins. When this algorithm was applied to four predicted datasets obtained using phylogenetic profiles, gene expression patterns, chance co-occurrence distribution coefficient, and maximum likelihood estimation for S. cerevisiae and C. elegans, the improvement in true positive fractions of the datasets was observed in a magnitude of 2-fold to 10-fold depending on the computational method used to create the dataset and the available information on the organism of interest. The predicted protein-protein interactions were incorporated into the prior reconstructed metabolic network of C. elegans, resulting in 1024 new interactions among 94 metabolic pathways. In each of 1024 new interactions one unknown protein was interacting with a known partner found in the reconstructed metabolic network. Unknown proteins were characterized based on the involvement of their known partners. Based on the binary similarity scores, the function of an uncharacterized protein in an interacting pair was defined according to its known counterpart whose function was already specified. With the incorporation of new predicted interactions to the metabolic network, an expanded version of that network was resulted with 27% increase in the number of known proteins involved in metabolism. Connectivity of proteins in protein-protein interaction map changed from 42 to 34 due to the increase in the number of characterized proteins in the network
    corecore