14 research outputs found

    Comment on ‘protein–protein binding affinity prediction from amino acid sequence’

    Predicting the strength of interactions between globular proteins is a central and important topic in structural bioinformatics (Moal et al., 2013). The amino acid sequence represents the chemical bonding in a protein which, along with the solvent, dictates how it folds into an ensemble of thermally accessible states. In turn, structure specifies the strength and identity of its binding partners, by establishing the specific arrangements of intermolecular interactions and the intramolecular strain required to achieve them. IHM received funding from the People Programme (Marie Curie Actions) of the European Union’s Seventh Framework Programme (FP7/2007-2013) under REA grant agreement PIEF-GA-2012-327899. Peer reviewed. Postprint (published version).

    Protein binding affinity prediction using support vector regression and interfacial features

    In understanding biology at the molecular level, the analysis of protein interactions and protein binding affinity is a challenge and an important problem in computational and structural biology. Experimental measurement of binding affinity in the wet lab is expensive and time-consuming. Therefore, machine learning approaches are widely used to predict protein interactions and binding affinities by learning from specific properties of existing complexes. In this work, we propose a computational model to predict binding affinities and interactions based on sequence, structural and interface features of the interacting proteins that are robust to binding-associated conformational changes. We modeled the prediction of binding affinity as both a classification and a regression problem, using least-squares and support vector regression models with structure and sequence features of proteins. Specifically, we used the number and composition of interacting residues at the protein complex interface, together with sequence features. We evaluated the performance of our prediction models using the Affinity Benchmark Dataset version 2.0, which contains a diverse set of both bound and unbound protein complex structures with known binding affinities. We evaluated regression performance with root mean square error (RMSE) as well as Spearman's and Pearson's correlation coefficients using a leave-one-out cross-validation protocol. We evaluated classification results with AUC-ROC and AUC-PR. Our results show that Support Vector Regression performs significantly better than other models, with a Spearman correlation coefficient of 0.58, a Pearson correlation coefficient of 0.55 and an RMSE of 2.41 using 3-mer and sequence features. It is interesting to note that simple 3-mer features and the properties of the interface of a protein complex are predictive of its binding affinity. These features, together with support vector regression, achieve higher accuracy than existing sequence-based methods.
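
    As a rough illustration of the ingredients named in this abstract, the sketch below counts overlapping 3-mers in a sequence and computes RMSE, Pearson and Spearman scores in plain Python. It is not the authors' code: the function names are hypothetical, the regression model itself is omitted, and tie handling by average ranks is an assumption.

```python
from collections import Counter
import math


def kmer_counts(sequence, k=3):
    """Count overlapping k-mers (3-mers by default) in a protein sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))
```

    In a leave-one-out protocol, each complex would be predicted by a model trained on all the others, and these three scores would then be computed once over the pooled predictions.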

    ISLAND: in-silico proteins binding affinity prediction using sequence information

    Background: Determining binding affinity in protein-protein interactions is important in the discovery and design of novel therapeutics and mutagenesis studies. Determination of binding affinity of proteins in the formation of protein complexes requires sophisticated, expensive and time-consuming experimentation which can be replaced with computational methods. Most computational prediction techniques require protein structures that limit their applicability to protein complexes with known structures. In this work, we explore sequence-based protein binding affinity prediction using machine learning. Method: We have used protein sequence information instead of protein structures along with machine learning techniques to accurately predict the protein binding affinity. Results: We present our findings that the true generalization performance of even the state-of-the-art sequence-only predictor is far from satisfactory and that the development of machine learning methods for binding affinity prediction with improved generalization performance is still an open problem. We have also proposed a sequence-based novel protein binding affinity predictor called ISLAND which gives better accuracy than existing methods over the same validation set as well as on external independent test dataset. A cloud-based webserver implementation of ISLAND and its python code are available at https://sites.google.com/view/wajidarshad/software. Conclusion: This paper highlights the fact that the true generalization performance of even the state-of-the-art sequence-only predictor of binding affinity is far from satisfactory and that the development of effective and practical methods in this domain is still an open problem

    In silico Prediction and Validations of Domains Involved in Gossypium hirsutum SnRK1 Protein Interaction With Cotton Leaf Curl Multan Betasatellite Encoded βC1

    Cotton leaf curl disease (CLCuD) caused by viruses of genus Begomovirus is a major constraint to cotton (Gossypium hirsutum) production in many cotton-growing regions of the world. Symptoms of the disease are caused by Cotton leaf curl Multan betasatellite (CLCuMB) that encodes a pathogenicity determinant protein, βC1. Here, we report the identification of interacting regions in βC1 protein by using computational approaches including sequence recognition, and binding site and interface prediction methods. We show the domain-level interactions based on the structural analysis of G. hirsutum SnRK1 protein and its domains with CLCuMB-βC1. To verify and validate the in silico predictions, three different experimental approaches, yeast two hybrid, bimolecular fluorescence complementation and pull down assay were used. Our results showed that ubiquitin-associated domain (UBA) and autoinhibitory sequence (AIS) domains of G. hirsutum-encoded SnRK1 are involved in CLCuMB-βC1 interaction. This is the first comprehensive investigation that combined in silico interaction prediction followed by experimental validation of interaction between CLCuMB-βC1 and a host protein. We demonstrated that data from computational biology could provide binding site information between CLCuD-associated viruses/satellites and new hosts that lack known binding site information for protein–protein interaction studies. Implications of these findings are discussed

    High Resolution Crystal Structures Leverage Protein Binding Affinity Predictions

    Predicting protein binding affinities from structural data has remained elusive, a difficulty owing to the variety of protein binding modes. Using the structure-affinity-benchmark (SAB, 144 cases with bound/unbound crystal structures and experimental affinity measurements), prediction has been undertaken either by fitting a model using a handful of pre-defined variables, or by training a complex model from a large pool of parameters (typically hundreds). The former route unnecessarily restricts the model space, while the latter is prone to overfitting. We design models in a third tier, using twelve variables describing enthalpic and entropic variations upon binding, and a model selection procedure identifying the best sparse model built from a subset of these variables. Using these models, we report three main results. First, we present models yielding a marked improvement of affinity predictions. For the whole dataset, we present a model predicting Kd within one and two orders of magnitude for 48% and 79% of cases, respectively. These statistics jump to 62% and 89% respectively, for the subset of the SAB consisting of high resolution structures. Second, we show that these performances owe to a new parameter encoding interface morphology and packing properties of interface atoms. Third, we argue that interface flexibility and prediction hardness do not correlate, and that for flexible cases, a performance matching that of the whole SAB can be achieved. Overall, our work suggests that the affinity prediction problem could be partly solved using databases of high resolution complexes whose affinity is known.
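
    The "within one and two orders of magnitude" statistic quoted in this abstract can be read as the share of cases where the predicted and measured Kd differ by at most a factor of 10 or 100. The sketch below assumes that reading (absolute difference of log10 values); the function name and the example values are hypothetical and are not taken from the paper.

```python
import math


def fraction_within_orders(kd_true, kd_pred, orders=1.0):
    """Share of cases whose predicted Kd lies within `orders` orders of
    magnitude of the measured Kd, i.e. |log10(pred) - log10(true)| <= orders."""
    hits = sum(
        1
        for t, p in zip(kd_true, kd_pred)
        if abs(math.log10(p) - math.log10(t)) <= orders
    )
    return hits / len(kd_true)


# Made-up Kd values in molar units, for illustration only.
measured = [1e-9, 1e-6, 1e-3, 1e-7, 1e-8]
predicted = [5e-9, 3e-5, 1e-3, 5e-9, 1e-3]

within_one = fraction_within_orders(measured, predicted, orders=1)
within_two = fraction_within_orders(measured, predicted, orders=2)
```

    Working in log10 space makes the criterion symmetric, so an over-prediction and an under-prediction by the same factor count equally.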

    Role of the ANREP protein in the function of PerA in enteropathogenic Escherichia coli

    "EPEC is one of the main etiological agents of infectious diarrhea in children under 5 years of age in underdeveloped countries. The regulation of virulence genes in EPEC is fundamental to the development of the disease it causes, because it ensures successful colonization of the host. PerA is an essential protein in the regulation of EPEC virulence, because it activates its own expression and that of the BFP fimbria. At the same time, PerA has an indirect positive effect on the expression of the virulence genes encoded in the LEE island. Recently, a new family of anti-virulence regulators called ANR, which counteract the function of members of the AraC/XylS family, was identified in pathogenic bacteria. In this regard, our research group has shown that a member of this family, called ANREP, also represses genes regulated by PerA and genes located in the LEE island. However, the mechanism this protein uses to repress them is still unknown. Therefore, studying the role of ANREP as an anti-virulence protein that could modulate the function of the transcriptional activator PerA is of great interest."