132 research outputs found

    Predicting residue contacts using pragmatic correlated mutations method: reducing the false positives

    BACKGROUND: Predicting residue contacts using primary amino acid sequence alone is an important task that can guide 3D structure modeling and can verify the quality of the predicted 3D structures. The correlated mutations (CM) method serves as the most promising approach and it has been used to predict amino acid pairs that are distant in the primary sequence but form contacts in the native 3D structure of homologous proteins. RESULTS: Here we report a new implementation of the CM method with an added set of selection rules (filters). The parameters of the algorithm were optimized against fifteen high-resolution crystal structures with an optimization criterion that maximized the confidence of the predictions. The optimization resulted in a true positive ratio (TPR) of 0.08 for the CM without filters and a TPR of 0.14 for the CM with filters. The protocol was further benchmarked against 65 high-resolution structures that were not included in the optimization test. The benchmarking resulted in a TPR of 0.07 for the CM without filters and a TPR of 0.09 for the CM with filters. CONCLUSION: Thus, the inclusion of selection rules resulted in an overall improvement of 30%. In addition, the pair-wise comparison of TPR for each protein without and with filters resulted in an average improvement of 1.7. The methodology was implemented into a web server that is freely available to the public. The purpose of this implementation is to provide the 3D structure predictors with a tool that can help with ranking alternative models by satisfying the largest number of predicted contacts, and that can also provide a confidence score for contacts in cases where the structure is known.
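
    The reported figures can be checked with simple arithmetic. The sketch below assumes that the "overall improvement of 30%" refers to the relative gain in benchmark TPR (0.07 to 0.09); the abstract does not say so explicitly, and the function name is illustrative only.

```python
def relative_improvement(baseline, improved):
    """Fractional gain of `improved` over `baseline`."""
    return (improved - baseline) / baseline


# TPR values quoted in the abstract (without filters, with filters).
benchmark_gain = relative_improvement(0.07, 0.09)      # 65 held-out structures
optimization_gain = relative_improvement(0.08, 0.14)   # 15 optimization structures

print(f"benchmark gain:    {benchmark_gain:.1%}")
print(f"optimization gain: {optimization_gain:.1%}")
```

    Under this reading the benchmark gain is roughly 29%, consistent with the rounded 30% in the abstract, while the gain on the optimization set is considerably larger.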

    Pairwise maximum entropy models for studying large biological systems: when they can and when they can't work

    One of the most critical problems we face in the study of biological systems is building accurate statistical descriptions of them. This problem has been particularly challenging because biological systems typically contain large numbers of interacting elements, which precludes the use of standard brute force approaches. Recently, though, several groups have reported that there may be an alternate strategy. The reports show that reliable statistical models can be built without knowledge of all the interactions in a system; instead, pairwise interactions can suffice. These findings, however, are based on the analysis of small subsystems. Here we ask whether the observations will generalize to systems of realistic size, that is, whether pairwise models will provide reliable descriptions of true biological systems. Our results show that, in most cases, they will not. The reason is that there is a crossover in the predictive power of pairwise models: If the size of the subsystem is below the crossover point, then the results have no predictive power for large systems. If the size is above the crossover point, the results do have predictive power. This work thus provides a general framework for determining the extent to which pairwise models can be used to predict the behavior of whole biological systems. Applied to neural data, the size of most systems studied so far is below the crossover point
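
    To make the idea of a pairwise model concrete, here is a minimal sketch of a pairwise maximum entropy distribution over binary variables, computed by brute-force enumeration. The parameter names (`fields`, `couplings`) and the toy values are assumptions for illustration; they are not taken from the paper.

```python
from itertools import product
from math import exp


def pairwise_maxent_distribution(fields, couplings):
    """Return P(state) proportional to exp(sum_i h_i x_i + sum_{i<j} J_ij x_i x_j).

    `fields` is a list of h_i; `couplings[i][j]` holds J_ij for i < j.
    Every one of the 2**n binary states is enumerated explicitly.
    """
    n = len(fields)
    weights = {}
    for state in product((0, 1), repeat=n):
        log_weight = sum(fields[i] * state[i] for i in range(n))
        log_weight += sum(
            couplings[i][j] * state[i] * state[j]
            for i in range(n)
            for j in range(i + 1, n)
        )
        weights[state] = exp(log_weight)
    partition = sum(weights.values())
    return {state: w / partition for state, w in weights.items()}


# Hypothetical three-element system with two positive couplings.
fields = [0.0, 0.0, 0.0]
couplings = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
]
dist = pairwise_maxent_distribution(fields, couplings)
print(len(dist), "states enumerated")
```

    The enumeration visits 2**n states, which is why this brute-force approach is feasible only for small subsystems and why the question of extrapolating to large systems arises.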

    Disentangling Direct from Indirect Co-Evolution of Residues in Protein Alignments

    Predicting protein structure from primary sequence is one of the ultimate challenges in computational biology. Given the large amount of available sequence data, the analysis of co-evolution, i.e., statistical dependency, between columns in multiple alignments of protein domain sequences remains one of the most promising avenues for predicting residues that are contacting in the structure. A key impediment to this approach is that strong statistical dependencies are also observed for many residue pairs that are distal in the structure. Using a comprehensive analysis of protein domains with available three-dimensional structures we show that co-evolving contacts very commonly form chains that percolate through the protein structure, inducing indirect statistical dependencies between many distal pairs of residues. We characterize the distributions of length and spatial distance traveled by these co-evolving contact chains and show that they explain a large fraction of observed statistical dependencies between structurally distal pairs. We adapt a recently developed Bayesian network model into a rigorous procedure for disentangling direct from indirect statistical dependencies, and we demonstrate that this method not only successfully accomplishes this task, but also allows contacts with weak statistical dependency to be detected. To illustrate how additional information can be incorporated into our method, we incorporate a phylogenetic correction, and we develop an informative prior that takes into account that the probability for a pair of residues to contact depends strongly on their primary-sequence distance and the amount of conservation that the corresponding columns in the multiple alignment exhibit. We show that our model including these extensions dramatically improves the accuracy of contact prediction from multiple sequence alignments
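
    Statistical dependency between two alignment columns is often quantified with mutual information. The sketch below uses that measure on a toy alignment as an assumption for illustration; it is not the Bayesian network procedure the abstract describes, and it makes no attempt to separate direct from indirect dependencies.

```python
from collections import Counter
from math import log2


def column(alignment, index):
    """Extract one column from a list of equal-length aligned sequences."""
    return [sequence[index] for sequence in alignment]


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    counts_a = Counter(col_a)
    counts_b = Counter(col_b)
    counts_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in counts_ab.items():
        p_ab = count / n
        p_a = counts_a[a] / n
        p_b = counts_b[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical alignment: columns 0 and 1 co-vary, column 2 is fully conserved.
toy_alignment = ["AKL", "AKL", "GEL", "GEL"]
mi_coupled = mutual_information(column(toy_alignment, 0), column(toy_alignment, 1))
mi_conserved = mutual_information(column(toy_alignment, 0), column(toy_alignment, 2))
print(mi_coupled, mi_conserved)
```

    A raw score like this is high for any pair of columns that co-vary, whether or not the residues touch, which is the impediment the abstract sets out to address.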

    Change in Allosteric Network Affects Binding Affinities of PDZ Domains: Analysis through Perturbation Response Scanning

    The allosteric mechanism plays a key role in cellular functions of several PDZ domain proteins (PDZs) and is directly linked to pharmaceutical applications; however, it is a challenge to elaborate the nature and extent of these allosteric interactions. One solution to this problem is to explore the dynamics of PDZs, which may provide insights about how intramolecular communication occurs within a single domain. Here, we develop an advancement of perturbation response scanning (PRS) that couples elastic network models with linear response theory (LRT) to predict key residues in allosteric transitions of the two most studied PDZs (PSD-95 PDZ3 domain and hPTP1E PDZ2 domain). With PRS, we first identify the residues that give the highest mean square fluctuation response upon perturbing the binding sites. Strikingly, we observe that the residues with the highest mean square fluctuation response agree with experimentally determined residues involved in allosteric transitions. Second, we construct the allosteric pathways by linking the residues giving the same directional response upon perturbation of the binding sites. The predicted intramolecular communication pathways reveal that PSD-95 and hPTP1E have different pathways through the dynamic coupling of different residue pairs. Moreover, our analysis provides a molecular understanding of experimentally observed hidden allostery of PSD-95. We show that removing the distal third alpha helix from the binding site alters the allosteric pathway and decreases the binding affinity. Overall, these results indicate that (i) dynamics plays a key role in allosteric regulations of PDZs, (ii) the local changes in the residue interactions can lead to significant changes in the dynamics of allosteric regulations, and (iii) this might be the mechanism that each PDZ uses to tailor the regulation of its binding specificity.

    Networks of High Mutual Information Define the Structural Proximity of Catalytic Sites: Implications for Catalytic Residue Identification

    Identification of catalytic residues (CR) is essential for the characterization of enzyme function. CR are, in general, conserved and located in the functional site of a protein in order to attain their function. However, many non-catalytic residues are highly conserved and not all CR are conserved throughout a given protein family, making identification of CR a challenging task. Here, we put forward the hypothesis that CR carry a particular signature defined by networks of close proximity residues with high mutual information (MI), and that this signature can be applied to distinguish functional from other non-functional conserved residues. Using a data set of 434 Pfam families included in the catalytic site atlas (CSA) database, we tested this hypothesis and demonstrated that MI can complement amino acid conservation scores to detect CR. The Kullback-Leibler (KL) conservation measurement was shown to significantly outperform both the Shannon entropy and maximal frequency measurements. Residues in the proximity of catalytic sites were shown to be rich in shared MI. A structural proximity MI average score (termed pMI) was demonstrated to be a strong predictor for CR, thus confirming the proposed hypothesis. A structural proximity conservation average score (termed pC) was also calculated and demonstrated to carry distinct information from pMI. A catalytic likeliness score (Cls), combining the KL, pC and pMI measures, was shown to lead to significantly improved prediction accuracy. At a specificity of 0.90, the Cls method was found to have a sensitivity of 0.816. In summary, we demonstrate that networks of residues with high MI provide a distinct signature of CR and propose that such a signature should be present in other classes of functional residues where the requirement to maintain a particular function places limitations on the diversification of the structural environment along the course of evolution.
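
    The conservation measures named above can be sketched for a single alignment column as follows. The uniform background distribution and the toy columns are assumptions for illustration; the abstract does not state which background frequencies the authors used.

```python
from collections import Counter
from math import log2

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumed background: every amino acid equally likely (a simplification).
UNIFORM_BACKGROUND = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def frequencies(col):
    """Observed residue frequencies in one alignment column."""
    n = len(col)
    return {aa: count / n for aa, count in Counter(col).items()}


def shannon_entropy(col):
    """Shannon entropy in bits; lower means more conserved."""
    return -sum(p * log2(p) for p in frequencies(col).values())


def kl_conservation(col, background=UNIFORM_BACKGROUND):
    """Kullback-Leibler divergence from the background; higher means more conserved."""
    return sum(p * log2(p / background[aa]) for aa, p in frequencies(col).items())


def maximal_frequency(col):
    """Frequency of the most common residue in the column."""
    return max(frequencies(col).values())


# Hypothetical columns: one invariant, one with eight different residues.
conserved_column = list("HHHHHHHH")
variable_column = list("HDSEKRTA")
for name, col in (("conserved", conserved_column), ("variable", variable_column)):
    print(name, shannon_entropy(col), kl_conservation(col), maximal_frequency(col))
```

    With a uniform background the KL score reduces to log2(20) minus the Shannon entropy, so the two measures rank columns identically; any difference between them can only appear once a non-uniform background is supplied.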

    A Combinatorial Approach to Detect Coevolved Amino Acid Networks in Protein Families of Variable Divergence

    Communication between distant sites often defines the biological role of a protein: amino acid long-range interactions are as important in binding specificity, allosteric regulation and conformational change as residues directly contacting the substrate. Maintaining the functional and structural coupling of long-range interacting residues requires coevolution of these residues. Networks of interaction between coevolved residues can be reconstructed, and from the networks, one can possibly derive insights into functional mechanisms for the protein family. We propose a combinatorial method for mapping conserved networks of amino acid interactions in a protein which is based on the analysis of a set of aligned sequences, the associated distance tree and the combinatorics of its subtrees. The degree of coevolution of all pairs of coevolved residues is identified numerically, and networks are reconstructed with a dedicated clustering algorithm. The method drops the constraints on high sequence divergence limiting the range of applicability of the statistical approaches previously proposed. We apply the method to four protein families where we show an accurate detection of functional networks and the possibility to treat sets of protein sequences of variable divergence.

    Perturbation-Response Scanning Reveals Ligand Entry-Exit Mechanisms of Ferric Binding Protein

    We study apo and holo forms of the bacterial ferric binding protein (FBP) which exhibits the so-called ferric transport dilemma: it takes up iron from the host with remarkable affinity, yet releases it with ease in the cytoplasm for subsequent use. The observations fit the “conformational selection” model whereby the existence of a weakly populated, higher energy conformation that is stabilized in the presence of the ligand is proposed. We introduce a new tool that we term perturbation-response scanning (PRS) for the analysis of remote control strategies utilized. The approach relies on the systematic use of computational perturbation/response techniques based on linear response theory, by sequentially applying directed forces on single residues along the chain and recording the resulting relative changes in the residue coordinates. We further obtain closed-form expressions for the magnitude and the directionality of the response. Using PRS, we study the ligand release mechanisms of FBP and support the findings by molecular dynamics simulations. We find that the residue-by-residue displacements between the apo and the holo forms, as determined from the X-ray structures, are faithfully reproduced by perturbations applied on the majority of the residues of the apo form. However, once the stabilizing ligand (Fe) is integrated into the system in holo FBP, perturbing only a few select residues successfully reproduces the experimental displacements. Thus, iron uptake by FBP is a favored process in the fluctuating environment of the protein, whereas iron release is controlled by mechanisms including chelation and allostery. The directional analysis that we implement in the PRS methodology implicates the latter mechanism by leading to a few distant, charged, and exposed loop residues. Upon perturbing these, irrespective of the direction of the operating forces, we find that the cap residues involved in iron release are made to operate coherently, facilitating release of the ion.

    Evolutionarily Conserved Linkage between Enzyme Fold, Flexibility, and Catalysis

    Proteins are intrinsically flexible molecules. The role of internal motions in a protein's designated function is widely debated. The role of protein structure in enzyme catalysis is well established, and conservation of structural features provides vital clues to their role in function. Recently, it has been proposed that the protein function may involve multiple conformations: the observed deviations are not random thermodynamic fluctuations; rather, flexibility may be closely linked to protein function, including enzyme catalysis. We hypothesize that the argument of conservation of important structural features can also be extended to identification of protein flexibility in interconnection with enzyme function. Three classes of enzymes (prolyl-peptidyl isomerase, oxidoreductase, and nuclease) that catalyze diverse chemical reactions have been examined using detailed computational modeling. For each class, the identification and characterization of the internal protein motions coupled to the chemical step in enzyme mechanisms in multiple species show identical enzyme conformational fluctuations. In addition to the active-site residues, motions of protein surface loop regions (>10 Å away) are observed to be identical across species, and networks of conserved interactions/residues connect these highly flexible surface regions to the active-site residues that make direct contact with substrates. More interestingly, examination of reaction-coupled motions in non-homologous enzyme systems (with no structural or sequence similarity) that catalyze the same biochemical reaction shows motions that induce remarkably similar changes in the enzyme–substrate interactions during catalysis. The results indicate that the reaction-coupled flexibility is a conserved aspect of the enzyme molecular architecture. Protein motions in distal areas of homologous and non-homologous enzyme systems mediate similar changes in the active-site enzyme–substrate interactions, thereby impacting the mechanism of catalyzed chemistry. These results have implications for understanding the mechanism of allostery, and for protein engineering and drug design.

    Identifying allosteric fluctuation transitions between different protein conformational states as applied to Cyclin Dependent Kinase 2

    BACKGROUND: The mechanisms underlying protein function and associated conformational change are dominated by a series of local entropy fluctuations affecting the global structure yet are mediated by only a few key residues. Transitional Dynamic Analysis (TDA) is a new method to detect these changes in local protein flexibility between different conformations arising from, for example, ligand binding. Additionally, Positional Impact Vertex for Entropy Transfer (PIVET) uses TDA to identify important residue contact changes that have a large impact on global fluctuation. We demonstrate the utility of these methods for Cyclin-dependent kinase 2 (CDK2), a system with crystal structures of this protein in multiple functionally relevant conformations and experimental data revealing the importance of local fluctuation changes for protein function. RESULTS: TDA and PIVET successfully identified select residues that are responsible for conformation-specific regional fluctuation in the activation cycle of CDK2. The detected local changes in protein flexibility have been experimentally confirmed to be essential for the regulation and function of the kinase. The methodologies also highlighted possible errors in previous molecular dynamics simulations that need to be resolved in order to understand this key player in cell cycle regulation. Finally, the use of entropy compensation as a possible allosteric mechanism for protein function is reported for CDK2. CONCLUSION: The methodologies embodied in TDA and PIVET provide a quick approach to identify local fluctuation changes important for protein function and the residue contacts that contribute to these changes. Further, these approaches can be used to check for possible errors in protein dynamics simulations and have the potential to facilitate a better understanding of the contribution of entropy to protein allostery and function.