6,365 research outputs found

    Exploring the potential of 3D Zernike descriptors and SVM for protein\u2013protein interface prediction

    Get PDF
    Abstract Background The correct determination of protein–protein interaction interfaces is important for understanding disease mechanisms and for rational drug design. To date, several computational methods for the prediction of protein interfaces have been developed, but the interface prediction problem is still not fully understood. Experimental evidence suggests that the location of binding sites is imprinted in the protein structure, but there are major differences among the interfaces of the various protein types: the characterising properties can vary a lot depending on the interaction type and function. The selection of an optimal set of features characterising the protein interface and the development of an effective method to represent and capture the complex protein recognition patterns are of paramount importance for this task. Results In this work we investigate the potential of a novel local surface descriptor based on 3D Zernike moments for the interface prediction task. Descriptors invariant to roto-translations are extracted from circular patches of the protein surface enriched with physico-chemical properties from the HQI8 amino acid index set, and are used as samples for a binary classification problem. Support Vector Machines are used as a classifier to distinguish interface local surface patches from non-interface ones. The proposed method was validated on 16 classes of proteins extracted from the Protein–Protein Docking Benchmark 5.0 and compared to other state-of-the-art protein interface predictors (SPPIDER, PrISE and NPS-HomPPI). Conclusions The 3D Zernike descriptors are able to capture the similarity among patterns of physico-chemical and biochemical properties mapped on the protein surface arising from the various spatial arrangements of the underlying residues, and their usage can be easily extended to other sets of amino acid properties. The results suggest that the choice of a proper set of features characterising the protein interface is crucial for the interface prediction task, and that optimality strongly depends on the class of proteins whose interface we want to characterise. We postulate that different protein classes should be treated separately and that it is necessary to identify an optimal set of features for each protein class

    Kernel-based machine learning protocol for predicting DNA-binding proteins

    Get PDF
    DNA-binding proteins (DNA-BPs) play a pivotal role in various intra- and extra-cellular activities ranging from DNA replication to gene expression control. Attempts have been made to identify DNA-BPs based on their sequence and structural information with moderate accuracy. Here we develop a machine learning protocol for the prediction of DNA-BPs where the classifier is Support Vector Machines (SVMs). Information used for classification is derived from characteristics that include surface and overall composition, overall charge and positive potential patches on the protein surface. In total 121 DNA-BPs and 238 non-binding proteins are used to build and evaluate the protocol. In self-consistency, accuracy value of 100% has been achieved. For cross-validation (CV) optimization over entire dataset, we report an accuracy of 90%. Using leave 1-pair holdout evaluation, the accuracy of 86.3% has been achieved. When we restrict the dataset to less than 20% sequence identity amongst the proteins, the holdout accuracy is achieved at 85.8%. Furthermore, seven DNA-BPs with unbounded structures are all correctly predicted. The current performances are better than results published previously. The higher accuracy value achieved here originates from two factors: the ability of the SVM to handle features that demonstrate a wide range of discriminatory power and, a different definition of the positive patch. Since our protocol does not lean on sequence or structural homology, it can be used to identify or predict proteins with DNA-binding function(s) regardless of their homology to the known ones

    Prediction of protein-protein interaction types using association rule based classification

    Get PDF
    This article has been made available through the Brunel Open Access Publishing Fund - Copyright @ 2009 Park et alBackground: Protein-protein interactions (PPI) can be classified according to their characteristics into, for example obligate or transient interactions. The identification and characterization of these PPI types may help in the functional annotation of new protein complexes and in the prediction of protein interaction partners by knowledge driven approaches. Results: This work addresses pattern discovery of the interaction sites for four different interaction types to characterize and uses them for the prediction of PPI types employing Association Rule Based Classification (ARBC) which includes association rule generation and posterior classification. We incorporated domain information from protein complexes in SCOP proteins and identified 354 domain-interaction sites. 14 interface properties were calculated from amino acid and secondary structure composition and then used to generate a set of association rules characterizing these domain-interaction sites employing the APRIORI algorithm. Our results regarding the classification of PPI types based on a set of discovered association rules shows that the discriminative ability of association rules can significantly impact on the prediction power of classification models. We also showed that the accuracy of the classification can be improved through the use of structural domain information and also the use of secondary structure content. Conclusion: The advantage of our approach is that we can extract biologically significant information from the interpretation of the discovered association rules in terms of understandability and interpretability of rules. A web application based on our method can be found at http://bioinfo.ssu.ac.kr/~shpark/picasso/SHP was supported by the Korea Research Foundation Grant funded by the Korean Government(KRF-2005-214-E00050). JAR has been supported by the Programme Alβan, the European Union Programme of High level Scholarships for Latin America, scholarship E04D034854CL. SK was supported by Soongsil University Research Fund

    Local pre-processing for node classification in networks : application in protein-protein interaction

    Get PDF
    Network modelling provides an increasingly popular conceptualisation in a wide range of domains, including the analysis of protein structure. Typical approaches to analysis model parameter values at nodes within the network. The spherical locality around a node provides a microenvironment that can be used to characterise an area of a network rather than a particular point within it. Microenvironments that centre on the nodes in a protein chain can be used to quantify parameters that are related to protein functionality. They also permit particular patterns of such parameters in node-centred microenvironments to be used to locate sites of particular interest. This paper evaluates an approach to index generation that seeks to rapidly construct microenvironment data. The results show that index generation performs best when the radius of microenvironments matches the granularity of the index. Results are presented to show that such microenvironments improve the utility of protein chain parameters in classifying the structural characteristics of nodes using both support vector machines and neural networks

    PiRaNhA: A server for the computational prediction of RNA-binding residues in protein sequences

    Get PDF
    The PiRaNhA web server is a publicly available online resource that automatically predicts the location of RNA-binding residues (RBRs) in protein sequences. The goal of functional annotation of sequences in the field of RNA binding is to provide predictions of high accuracy that require only small numbers of targeted mutations for verification. The PiRaNhA server uses a support vector machine (SVM), with position-specific scoring matrices, residue interface propensity, predicted residue accessibility and residue hydrophobicity as features. The server allows the submission of up to 10 protein sequences, and the predictions for each sequence are provided on a web page and via email. The prediction results are provided in sequence format with predicted RBRs highlighted, in text format with the SVM threshold score indicated and as a graph which enables users to quickly identify those residues above any specific SVM threshold. The graph effectively enables the increase or decrease of the false positive rate. When tested on a non-redundant data set of 42 protein sequences not used in training, the PiRaNhA server achieved an accuracy of 85%, specificity of 90% and a Matthews correlation coefficient of 0.41 and outperformed other publicly available servers. The PiRaNhA prediction server is freely available at http://www.bioinformatics.sussex.ac.uk/PIRANHA. © The Author(s) 2010. Published by Oxford University Press

    PRETICTIVE BIOINFORMATIC METHODS FOR ANALYZING GENES AND PROTEINS

    Get PDF
    Since large amounts of biological data are generated using various high-throughput technologies, efficient computational methods are important for understanding the biological meanings behind the complex data. Machine learning is particularly appealing for biological knowledge discovery. Tissue-specific gene expression and protein sumoylation play essential roles in the cell and are implicated in many human diseases. Protein destabilization is a common mechanism by which mutations cause human diseases. In this study, machine learning approaches were developed for predicting human tissue-specific genes, protein sumoylation sites and protein stability changes upon single amino acid substitutions. Relevant biological features were selected for input vector encoding, and machine learning algorithms, including Random Forests and Support Vector Machines, were used for classifier construction. The results suggest that the approaches give rise to more accurate predictions than previous studies and can provide valuable information for further experimental studies. Moreover, seeSUMO and MuStab web servers were developed to make the classifiers accessible to the biological research community. Structure-based methods can be used to predict the effects of amino acid substitutions on protein function and stability. The nonsynonymous Single Nucleotide Polymorphisms (nsSNPs) located at the protein binding interface have dramatic effects on protein-protein interactions. To model the effects, the nsSNPs at the interfaces of 264 protein-protein complexes were mapped on the protein structures using homology-based methods. The results suggest that disease-causing nsSNPs tend to destabilize the electrostatic component of the binding energy and nsSNPs at conserved positions have significant effects on binding energy changes. The structure-based approach was developed to quantitatively assess the effects of amino acid substitutions on protein stability and protein-protein interaction. It was shown that the structure-based analysis could help elucidate the mechanisms by which mutations cause human genetic disorders. These new bioinformatic methods can be used to analyze some interesting genes and proteins for human genetic research and improve our understanding of their molecular mechanisms underlying human diseases

    Exploiting structural and topological information to improve prediction of RNA-protein binding sites

    Get PDF
    The breast and ovarian cancer susceptibility gene BRCA1 encodes a multifunctional tumor suppressor protein BRCA1, which is involved in regulating cellular processes such as cell cycle, transcription, DNA repair, DNA damage response and chromatin remodeling. BRCA1 protein, located primarily in cell nuclei, interacts with multiple proteins and various DNA targets. It has been demonstrated that BRCA1 protein binds to damaged DNA and plays a role in the transcriptional regulation of downstream target genes. As a key protein in the repair of DNA double-strand breaks, the BRCA1-DNA binding properties, however, have not been reported in detail

    Exploiting residue-level and profile-level interface propensities for usage in binding sites prediction of proteins

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Recognition of binding sites in proteins is a direct computational approach to the characterization of proteins in terms of biological and biochemical function. Residue preferences have been widely used in many studies but the results are often not satisfactory. Although different amino acid compositions among the interaction sites of different complexes have been observed, such differences have not been integrated into the prediction process. Furthermore, the evolution information has not been exploited to achieve a more powerful propensity.</p> <p>Result</p> <p>In this study, the residue interface propensities of four kinds of complexes (homo-permanent complexes, homo-transient complexes, hetero-permanent complexes and hetero-transient complexes) are investigated. These propensities, combined with sequence profiles and accessible surface areas, are inputted to the support vector machine for the prediction of protein binding sites. Such propensities are further improved by taking evolutional information into consideration, which results in a class of novel propensities at the profile level, i.e. the binary profiles interface propensities. Experiment is performed on the 1139 non-redundant protein chains. Although different residue interface propensities among different complexes are observed, the improvement of the classifier with residue interface propensities can be negligible in comparison with that without propensities. The binary profile interface propensities can significantly improve the performance of binding sites prediction by about ten percent in term of both precision and recall.</p> <p>Conclusion</p> <p>Although there are minor differences among the four kinds of complexes, the residue interface propensities cannot provide efficient discrimination for the complicated interfaces of proteins. The binary profile interface propensities can significantly improve the performance of binding sites prediction of protein, which indicates that the propensities at the profile level are more accurate than those at the residue level.</p

    Knowledge-based energy functions for computational studies of proteins

    Full text link
    This chapter discusses theoretical framework and methods for developing knowledge-based potential functions essential for protein structure prediction, protein-protein interaction, and protein sequence design. We discuss in some details about the Miyazawa-Jernigan contact statistical potential, distance-dependent statistical potentials, as well as geometric statistical potentials. We also describe a geometric model for developing both linear and non-linear potential functions by optimization. Applications of knowledge-based potential functions in protein-decoy discrimination, in protein-protein interactions, and in protein design are then described. Several issues of knowledge-based potential functions are finally discussed.Comment: 57 pages, 6 figures. To be published in a book by Springe
    corecore