7,327 research outputs found

Predicting protein-protein interactions in unbalanced data using the primary structure of proteins

Author: Chang Darby Tien-Hao
Chou Lih-Ching
Yu Chi-Yuan
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Elucidating protein-protein interactions (PPIs) is essential to constructing protein interaction networks and facilitating our understanding of the general principles of biological systems. Previous studies have revealed that interacting protein pairs can be predicted from their primary structure. Most of these approaches have achieved satisfactory performance on datasets comprising equal numbers of interacting and non-interacting protein pairs. However, this ratio is highly unbalanced in nature, and these techniques have not been comprehensively evaluated with respect to the effect of the large number of non-interacting pairs in realistic datasets. Moreover, since highly unbalanced distributions usually lead to large datasets, more efficient predictors are desired when handling such challenging tasks.

Results: This study presents a method for PPI prediction based only on sequence information, and it makes three contributions. First, we propose a probability-based mechanism for transforming protein sequences into feature vectors. Second, the proposed predictor is designed with an efficient classification algorithm, where efficiency is essential for handling highly unbalanced datasets. Third, the proposed PPI predictor is assessed with several unbalanced datasets with different positive-to-negative ratios (from 1:1 to 1:15). This analysis provides solid evidence that the degree of dataset imbalance is important to PPI predictors.

Conclusions: Dealing with data imbalance is a key issue in PPI prediction, since there are far fewer interacting protein pairs than non-interacting ones. This article provides a comprehensive study of this issue and develops a practical tool that achieves both good prediction performance and efficiency using only protein sequence information.
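As an illustration of the evaluation setup described in this abstract, the following minimal sketch builds datasets at several positive-to-negative ratios by subsampling non-interacting pairs. It is not the authors' code: the function name `subsample_negatives`, the toy protein identifiers, and the choice of uniform random sampling are all assumptions made for the example.

```python
import random


def subsample_negatives(positives, negatives, neg_per_pos, seed=0):
    """Build a dataset with `neg_per_pos` negative pairs per positive pair.

    Hypothetical helper: negatives are drawn uniformly at random without
    replacement, which is an assumption of this sketch.
    """
    n_neg = len(positives) * neg_per_pos
    if n_neg > len(negatives):
        raise ValueError("not enough negative pairs for the requested ratio")
    rng = random.Random(seed)
    sampled = rng.sample(negatives, n_neg)
    pairs = list(positives) + sampled
    labels = [1] * len(positives) + [0] * n_neg
    return pairs, labels


# Toy data with made-up identifiers (not taken from the paper).
positives = [(f"P{i}", f"Q{i}") for i in range(4)]
negatives = [(f"P{i}", f"N{j}") for i in range(4) for j in range(15)]

for ratio in (1, 3, 7, 15):
    pairs, labels = subsample_negatives(positives, negatives, ratio)
    print(f"1:{ratio} -> {labels.count(1)} positives, {labels.count(0)} negatives")
```

A predictor would then be trained and scored separately on each of these datasets to see how performance changes as the imbalance grows.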

Crossref

Directory of Open Access Journals

PubMed Central

Short Co-occurring Polypeptide Regions Can Predict Global Protein Interaction Maps

Author: A Ben-Hur
C Stark
CY Yu
D Betel
E Andres Leon
E Jain
EA Winzeler
H Jeong
H Yu
J Shen
M Jessulat
N Zaki
R Nussinov
S Martin
S Pitre
T Yoko-o
TSK Prasad
V Neduva
Y Guo
Y Park
Publication venue: Nature Publishing Group
Publication date: 19/04/2012

A goal of the post-genomics era has been to elucidate a detailed global map of protein-protein interactions (PPIs) within a cell. Here, we show that the presence of co-occurring short polypeptide sequences between interacting protein partners appears to be conserved across different organisms. We present an algorithm to automatically generate PPI prediction method parameters for various organisms and illustrate that global PPIs can be predicted from previously reported PPIs within the same or a different organism using protein primary sequences. The PPI prediction code is further accelerated through the use of parallel multi-core programming, which improves its usability for large-scale or proteome-wide PPI prediction. We predict and analyze hundreds of novel human PPIs, experimentally confirm protein functions and, importantly, predict the first genome-wide PPI maps for S. pombe (∼9,000 PPIs) and C. elegans (∼37,500 PPIs).
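The idea of co-occurring short polypeptide regions can be sketched roughly as follows. This is a simplified illustration, not the published algorithm: it uses exact substring matching, and the function names, the window length `k=3`, and the toy sequences are all assumptions chosen for the example.

```python
def windows(seq, k):
    """Return the set of all length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cooccurrence_score(a, b, known_pairs, k=3):
    """Count known interacting pairs (x, y) that support the query (a, b).

    A known pair supports the query when `a` shares at least one window
    with one partner and `b` shares at least one window with the other.
    Exact matching is an assumption of this sketch.
    """
    wa, wb = windows(a, k), windows(b, k)
    score = 0
    for x, y in known_pairs:
        wx, wy = windows(x, k), windows(y, k)
        if (wa & wx and wb & wy) or (wa & wy and wb & wx):
            score += 1
    return score


# Made-up sequences for illustration only.
known = [("MKTAYIAK", "GSHMLEDP"), ("AAAAAA", "CCCCCC")]
print(cooccurrence_score("QQKTAYQQ", "WWLEDPWW", known))
print(cooccurrence_score("QQQQQ", "WWWWW", known))
```

A higher score means more previously reported interactions contain matching short regions on both partners. Each query pair is scored independently of the others, so a computation of this kind can be spread across multiple cores.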

Crossref

Carleton University's Institutional Repository

PubMed Central

Integrating protein-protein interaction networks with phenotypes reveals signs of interactions

Author: Hu Yanhui
Mohr Stephanie E.
Neumüller Ralph A.
Perrimon Norbert
Roesel Charles
Samsonova Anastasia A.
Vinayagam Arunachalam
Yilmazel Bahar
Zirin Jonathan
Publication venue: Springer Science and Business Media LLC
Publication date: 13/08/2014

A major objective of systems biology is to organize molecular interactions as networks and to characterize information flow within networks. We describe a computational framework to integrate protein-protein interaction (PPI) networks and genetic screens to predict the "signs" of interactions (i.e., activation/inhibition relationships). We constructed a Drosophila melanogaster signed PPI network, consisting of 6,125 signed PPIs connecting 3,352 proteins, that can be used to identify positive and negative regulators of signaling pathways and protein complexes. We identified an unexpected role for the metabolic enzymes Enolase and Aldo-keto reductase as positive and negative regulators of proteolysis, respectively. Characterization of the activation/inhibition relationships between physically interacting proteins within signaling pathways will impact our understanding of many biological functions, including signal transduction and mechanisms of disease.

Harvard University - DASH

Toward a multilevel representation of protein molecules: comparative approaches to the aggregation/folding propensity problem

Author: Giuliani Alessandro
Livi Lorenzo
Rizzi Antonello
Publication venue: Elsevier BV
Publication date: 29/04/2015

This paper builds upon the fundamental work of Niwa et al. [34], which provides the unique possibility to analyze the relative aggregation/folding propensity of the elements of the entire Escherichia coli (E. coli) proteome in a cell-free standardized microenvironment. The hardness of the problem comes from the superposition of the driving forces of intra- and inter-molecule interactions, and it is mirrored by evidence of a shift from folding to aggregation phenotypes caused by single-point mutations [10]. Here we apply several state-of-the-art classification methods from the field of structural pattern recognition, with the aim of comparing different representations of the same proteins gathered from the Niwa et al. database; such representations include sequences and labeled (contact) graphs enriched with chemico-physical attributes. Through this comparison, we are also able to identify some interesting general properties of proteins. Notably, (i) we suggest a threshold of around 250 residues discriminating "easily foldable" from "hardly foldable" molecules, consistent with other independent experiments, and (ii) we highlight the relevance of contact graph spectra for folding behavior discrimination and characterization of the E. coli solubility data. The soundness of the experimental results presented in this paper is supported by the statistically relevant relationships discovered between the chemico-physical description of proteins and the developed cost matrix of substitution used in the various discrimination systems.

arXiv.org e-Print Archive

Archivio della ricerca- Università di Roma La Sapienza

Computational Approaches to Predict Protein Interaction

Author: Tien-Hao Chang
Publication venue: IntechOpen
Publication date: 30/03/2012

IntechOpen

iPDA: integrated protein disorder analyzer

Author: Chen Chien-Yu
Hsu Chen-Ming
Su Chung-Tsai
Publication venue: Oxford University Press
Publication date: 01/01/2007

This article presents a web server, iPDA, which aims at identifying the disordered regions of a query protein. Automatic prediction of disordered regions from protein sequences is an important problem in the study of structural biology. The proposed classifier, DisPSSMP2, differs from several existing disorder predictors in its use of position-specific scoring matrices with respect to physicochemical properties (PSSMP), where the physicochemical properties adopted here specifically take the disorder propensity of amino acids into account. The web server iPDA integrates DisPSSMP2 with several other sequence predictors in order to investigate the functional role of the detected disordered region. The predicted information includes sequence conservation, secondary structure, sequence complexity and hydrophobic clusters. According to the proportion of the secondary structure elements predicted, iPDA dynamically adjusts the cutoff threshold for determining protein disorder. Furthermore, a pattern mining package for detecting sequence conservation is embedded in iPDA for discovering potential binding regions of the query protein, which helps uncover the relationship between protein function and its primary sequence. The web service is available at http://biominer.bime.ntu.edu.tw/ipda and mirrored at http://biominer.cse.yzu.edu.tw/ipda

CiteSeerX

PubMed Central

National Taiwan University Repository