
    Accuracy of Protein-Protein Binding Sites in High-Throughput Template-Based Modeling

    The accuracy of protein structures, particularly their binding sites, is essential for the success of modeling protein complexes. Computationally inexpensive methodology is required for genome-wide modeling of such structures. To systematically evaluate the potential accuracy of high-throughput binding-site modeling, a statistical analysis of target-template sequence alignments was performed for a representative set of protein complexes. For most of the complexes, alignments containing all residues of the interface were found. Full interface alignments were obtained even for poor alignments, in which a relatively small part of the target sequence (as low as 40%) aligned to the template sequence with low overall identity (<30%). Although such poor overall alignments might be considered inadequate for modeling whole proteins, the alignment of the interfaces was strong enough for docking. In the set of homology models built on these alignments, one third of those ranked 1 by a simple sequence identity criterion had RMSD < 5 Å, an accuracy suitable for low-resolution template-free docking. Such models corresponded to multi-domain target proteins, whereas for single-domain proteins the best models had 5 Å < RMSD < 10 Å, an accuracy suitable for less sensitive structure-alignment methods. Overall, ∼50% of complexes with interfaces modeled by high-throughput techniques had accuracy suitable for meaningful docking experiments. This percentage will grow with the increasing availability of co-crystallized protein-protein complexes.
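    The two checks described above (whether an alignment covers every interface residue, and which RMSD class a model falls into) can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: it assumes the model and reference coordinates are already superimposed and matched atom-for-atom (e.g., interface C-alpha atoms), and the function names are hypothetical.

```python
import math
from typing import Iterable, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def interface_fully_aligned(aligned_positions: Iterable[int], interface_positions: Iterable[int]) -> bool:
    """True if every target interface residue position appears among the aligned positions."""
    return set(interface_positions) <= set(aligned_positions)


def rmsd(model: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """RMSD (in Å) between two already-superimposed, atom-matched coordinate lists."""
    if not model or len(model) != len(reference):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (mx - rx) ** 2 + (my - ry) ** 2 + (mz - rz) ** 2
        for (mx, my, mz), (rx, ry, rz) in zip(model, reference)
    )
    return math.sqrt(squared / len(model))


def docking_use(rmsd_value: float) -> str:
    """Bin an RMSD using the 5 Å and 10 Å thresholds quoted in the abstract (labels are hypothetical)."""
    if rmsd_value < 5.0:
        return "template-free docking"
    if rmsd_value < 10.0:
        return "structure-alignment docking"
    return "too inaccurate"
```

    For instance, a model that differs from the reference by a uniform 3 Å translation has an RMSD of 3 Å and falls into the first class.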

    Protein Docking by the Interface Structure Similarity: How Much Structure Is Needed?

    The increasing availability of co-crystallized protein-protein complexes provides an opportunity to use template-based modeling for protein-protein docking. Structure alignment techniques are useful in detecting remote target-template similarities. The size of the structure involved in the alignment is important for successful modeling. This paper describes a systematic large-scale study to find the optimal definition and size of the interfaces for structure alignment-based docking applications. The results showed that structural areas corresponding to cutoff values <12 Å across the interface inadequately represent the structural details of the interfaces. As the cutoff increased beyond 12 Å, the success rate for the benchmark set of 99 protein complexes did not increase significantly for higher-accuracy models and decreased for lower-accuracy models. The 12 Å cutoff was optimal in our interface alignment-based docking, and is a likely best choice for large-scale (e.g., genome-wide) applications to protein interaction networks. The results provide guidelines for docking approaches, including high-throughput applications to modeled structures.
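    The abstract does not spell out how the cutoff selects the interface. One common definition, assumed here rather than taken from the paper, keeps every residue of one partner that has any atom within the cutoff distance of any atom of the other partner. The sketch below uses hypothetical residue identifiers and a brute-force pairwise check for clarity; a production pipeline would more likely use a spatial index.

```python
import math
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float, float]


def interface_residues(
    receptor: Dict[str, List[Coord]],
    ligand: Dict[str, List[Coord]],
    cutoff: float = 12.0,
) -> Tuple[Set[str], Set[str]]:
    """Residue IDs of each partner with at least one atom within `cutoff` Å of the other partner."""
    rec_hits: Set[str] = set()
    lig_hits: Set[str] = set()
    for rec_id, rec_atoms in receptor.items():
        for lig_id, lig_atoms in ligand.items():
            # Any atom pair closer than the cutoff puts both residues in the interface.
            if any(math.dist(a, b) <= cutoff for a in rec_atoms for b in lig_atoms):
                rec_hits.add(rec_id)
                lig_hits.add(lig_id)
    return rec_hits, lig_hits
```

    Raising `cutoff` enlarges the structural area passed to the structure alignment step, which is the trade-off the study explores.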

    Classification and Exploration of 3D Protein Domain Interactions Using Kbdock

    Comparing and classifying protein domain interactions according to their three-dimensional (3D) structures can help to understand protein structure-function and evolutionary relationships. Additionally, structural knowledge of existing domain–domain interactions can provide a useful way to find structural templates with which to model the 3D structures of unsolved protein complexes. Here we present a straightforward guide to using the “Kbdock” protein domain structure database and its associated web site for exploring and comparing protein domain–domain interactions (DDIs) and domain–peptide interactions (DPIs) at the Pfam domain family level. We also briefly explain how the Kbdock web site works, and we provide some notes and suggestions which should help to avoid some common pitfalls when working with 3D protein domain structures.

    Hydrophilicity Matching – A Potential Prerequisite for the Formation of Protein-Protein Complexes in the Cell

    A binding event between two proteins typically consists of a diffusional search of the binding partners for one another, followed by specific recognition of the compatible binding sites, resulting in the formation of the complex. However, it is unclear how binding partners find each other in the crowded, constantly fluctuating, and interaction-rich cellular environment. Here we examine the non-specific component of protein-protein interactions, that is, those physicochemical properties of the binding partners that are independent of the exact details of their binding sites but can affect their localization or their diffusional search for one another. We show that, for a large set of high-resolution experimental 3D structures of binary, transient protein complexes taken from the DOCKGROUND database, the binding partners display a surprising, statistically significant similarity in their total hydration free energies normalized by a size-dependent variable. We hypothesize that colocalization of binding partners, even within individual cellular compartments such as the cytoplasm, may be influenced by their relative hydrophilicity, potentially in response to local hydrophilic gradients.
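    A partner-similarity analysis of this kind can be sketched as a correlation between the normalized hydration free energies of the two partners across complexes, with a permutation test for significance. The abstract names only "a size-dependent variable", so normalizing by residue count is an assumption, as are the helper names; the statistic the authors actually used may differ.

```python
import math
import random
from typing import Sequence


def normalized_hydration(dg_hyd: float, n_residues: int) -> float:
    """Hydration free energy per residue (residue count is an assumed size normalizer)."""
    return dg_hyd / n_residues


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def permutation_p_value(x: Sequence[float], y: Sequence[float], n_perm: int = 1000, seed: int = 0) -> float:
    """One-sided p-value: share of random partner re-pairings with correlation >= the observed one."""
    rng = random.Random(seed)
    observed = pearson(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the true partner pairing
        if pearson(x, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

    Here `x[k]` and `y[k]` hold the normalized values for the two partners of complex `k`; shuffling `y` simulates random pairing of proteins.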

    Evidence for the adaptation of protein pH-dependence to subcellular pH

    BACKGROUND: The availability of genome sequences, and inferred protein-coding genes, has led to several proteome-wide studies of isoelectric points. Generally, isoelectric points are distributed following variations on a bimodal theme that originates from the predominant acid and base amino acid side-chain pKas. The relative populations of the peaks in such distributions may correlate with environment, either for a whole organism or for subcellular compartments. There is also a tendency for isoelectric points averaged over a subcellular location not to coincide with the local pH, which could be related to solubility. We now calculate how other pH-dependent properties, derived from 3D structure, correlate with subcellular pH. RESULTS: For proteins with known structure and subcellular annotation, the predicted pH at which a protein is most stable, averaged over a location, correlates significantly better with subcellular pH than does the isoelectric point. This observation relates to the cumulative properties of proteins, since maximal stability for individual proteins follows the bimodal isoelectric point distribution. Histidine residue location underlies the correlation, a conclusion that is tested against a background of proteins randomised with respect to this feature, for which the observed correlation drops substantially. CONCLUSION: There exists a constraint on protein pH-dependence, in relation to the local pH, that is manifested in the pKa distribution of histidine sub-proteomes. This is discussed in terms of protein stability, pH homeostasis, and fluctuations in proton concentration.
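    The location-averaged correlation described in RESULTS can be sketched as two steps: average a per-protein property over each subcellular location, then correlate those averages with the local pH. The helper names are hypothetical, and the sketch does not reproduce the randomisation control; the numbers in the usage test are toy values, not data from the study.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def location_means(proteins: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average a per-protein property (e.g. predicted pH of maximal stability) per location."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for location, value in proteins:
        groups[location].append(value)
    return {loc: mean(vals) for loc, vals in groups.items()}


def correlation_with_local_ph(proteins: Iterable[Tuple[str, float]], local_ph: Dict[str, float]) -> float:
    """Pearson correlation between location averages and local pH over shared locations."""
    means = location_means(proteins)
    locations = sorted(set(means) & set(local_ph))
    x = [means[loc] for loc in locations]
    y = [local_ph[loc] for loc in locations]
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
```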

    Improved residue contact prediction using support vector machines and a large feature set

    BACKGROUND: Predicting protein residue-residue contacts is an important 2D prediction task. It is useful for ab initio structure prediction and for understanding protein folding. In spite of steady progress over the past decade, contact prediction remains largely unsolved. RESULTS: Here we develop a new contact map predictor (SVMcon) that uses support vector machines to predict medium- and long-range contacts. SVMcon integrates profiles, secondary structure, relative solvent accessibility, contact potentials, and other useful features. On the same test data set, SVMcon's accuracy is 4% higher than that of the latest version of the CMAPpro contact map predictor. SVMcon recently participated in the seventh Critical Assessment of Techniques for Protein Structure Prediction (CASP7) experiment and was evaluated along with seven other contact map predictors. SVMcon was ranked as one of the top predictors, yielding the second-best coverage and accuracy for contacts with sequence separation >= 12 on 13 de novo domains. CONCLUSION: We describe SVMcon, a new contact map predictor that uses SVMs and a large set of informative features. SVMcon yields good performance on medium- to long-range contact predictions and can be modularly incorporated into a structure prediction pipeline.
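    The accuracy and coverage figures quoted for contacts with sequence separation >= 12 can be illustrated with a small evaluation sketch. It assumes a contact means a C-beta to C-beta distance below 8 Å (a widely used CASP convention, not stated in the abstract); accuracy is the fraction of predicted contacts that are native, and coverage is the fraction of native contacts that were predicted. It evaluates a given list of predictions and does not implement the SVM itself.

```python
import math
from typing import Iterable, Sequence, Set, Tuple

Coord = Tuple[float, float, float]
Pair = Tuple[int, int]


def true_contacts(cb_coords: Sequence[Coord], threshold: float = 8.0, min_sep: int = 12) -> Set[Pair]:
    """Native contacts (i < j) with j - i >= min_sep and C-beta distance below threshold (assumed definition)."""
    n = len(cb_coords)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + min_sep, n)
        if math.dist(cb_coords[i], cb_coords[j]) < threshold
    }


def accuracy_and_coverage(predicted: Iterable[Pair], native: Iterable[Pair], min_sep: int = 12) -> Tuple[float, float]:
    """Return (accuracy, coverage) restricted to pairs with sequence separation >= min_sep."""
    pred = {(min(i, j), max(i, j)) for i, j in predicted if abs(i - j) >= min_sep}
    nat = {(min(i, j), max(i, j)) for i, j in native if abs(i - j) >= min_sep}
    if not pred or not nat:
        return 0.0, 0.0
    correct = len(pred & nat)
    return correct / len(pred), correct / len(nat)
```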

    Correlated Mutations: A Hallmark of Phenotypic Amino Acid Substitutions

    Point mutations resulting in the substitution of a single amino acid can have severe functional consequences, but can also be completely harmless. Understanding what determines the phenotypic impact is important both for planning targeted mutation experiments in the laboratory and for analyzing naturally occurring mutations found in patients. Common wisdom suggests using the extent of evolutionary conservation of a residue or a sequence motif as an indicator of its functional importance, and thus of its vulnerability to mutation. In this work, we put forward the hypothesis that, in addition to conservation, co-evolution of residues in a protein influences the likelihood of a residue being functionally important and thus associated with disease. While the basic idea of a relation between co-evolution and functional sites has been explored before, we have conducted the first systematic and comprehensive analysis of disease-causing point mutations in humans with respect to correlated mutations. We included 14,211 distinct positions with known disease-causing point mutations in 1,153 human proteins in our analysis. Our data show that (1) correlated positions are significantly more likely to be disease-associated than expected by chance, and that (2) this signal cannot be explained by the conservation patterns of individual sequence positions. Although correlated residues have primarily been used to predict contact sites, our data agree with previous observations that (3) many such correlations do not relate to physical contacts between amino acid residues. Access to our analysis results is provided at http://webclu.bio.wzw.tum.de/~pagel/supplements/correlated-positions/
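    Finding (1), that correlated positions are disease-associated more often than expected by chance, is the kind of claim a hypergeometric enrichment test can check. The sketch below is a generic version of such a test and is not necessarily the statistic used in the paper; it also does nothing to address finding (2), the control for conservation. Function names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total: int, disease: int, correlated: int, overlap: int) -> float:
    """P(X >= overlap) if `correlated` positions were drawn at random from `total`,
    of which `disease` are disease-associated (hypergeometric upper tail)."""
    upper = min(disease, correlated)
    tail = sum(
        comb(disease, i) * comb(total - disease, correlated - i)
        for i in range(overlap, upper + 1)
    )
    return tail / comb(total, correlated)


def fold_enrichment(total: int, disease: int, correlated: int, overlap: int) -> float:
    """Observed overlap divided by the overlap expected under random selection."""
    expected = correlated * disease / total
    return overlap / expected
```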

    Protein-Protein Interaction Site Predictions with Three-Dimensional Probability Distributions of Interacting Atoms on Protein Surfaces

    Protein-protein interactions are key to many biological processes. Computational methods that predict protein-protein interaction (PPI) sites on protein surfaces are important tools for gaining insight into the biological functions of proteins and for developing therapeutics that target these sites. One general feature of PPI sites is that the core regions of the two interacting surfaces are complementary to each other, resembling the protein interior in packing density and in the physicochemical nature of their amino acid composition. In this work, we simulated this physicochemical complementarity by constructing three-dimensional probability density maps of non-covalently interacting atoms on protein surfaces. The interaction probabilities were derived from the interiors of known structures. Machine learning algorithms were applied to learn the characteristic patterns of the probability density maps specific to PPI sites. The trained PPI site predictors were cross-validated on the training cases (432 proteins) and tested on an independent dataset (142 proteins). The residue-based Matthews correlation coefficient for the independent test set was 0.423; the accuracy, precision, sensitivity, and specificity were 0.753, 0.519, 0.677, and 0.779, respectively. The benchmark results indicate that the optimized machine learning models are among the best predictors for identifying PPI sites on protein surfaces. In particular, PPI site prediction accuracy increases with the size of the PPI site and with the hydrophobicity of the interface amino acid composition, and the core interface regions are more likely to be recognized with high prediction confidence. The results indicate that physicochemical complementarity patterns on protein surfaces are important determinants of PPIs, and that a substantial portion of PPI sites can be predicted correctly with physicochemical complementarity features based on non-covalent interaction data derived from protein interiors.
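    The five residue-level metrics reported above all derive from the confusion matrix of interface versus non-interface residues. The following sketch shows the standard definitions; the counts in the usage test are made up for illustration.

```python
import math
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification metrics for residue-level interface prediction."""
    total = tp + fp + tn + fn
    # MCC denominator; defined as 0 when any marginal count is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }
```

    Because precision and MCC depend on the ratio of interface to non-interface residues while sensitivity and specificity do not, reporting all five, as the abstract does, gives a fuller picture than any single number.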

    Scoring docking conformations using predicted protein interfaces

    BACKGROUND: Since proteins function by interacting with other molecules, analysis of protein-protein interactions is essential for comprehending biological processes. Whereas understanding of atomic interactions within a complex is especially useful for drug design, limitations of experimental techniques have restricted their practical use. Despite progress in docking predictions, there is still room for improvement. In this study, we contribute to this topic by proposing T-PioDock, a framework for detecting a native-like docked complex 3D structure. T-PioDock supports the identification of near-native conformations among 3D models produced by docking software by scoring those models using binding interfaces predicted by the interface predictor Template based Protein Interface Prediction (T-PIP). RESULTS: First, an exhaustive evaluation of interface predictors demonstrates that T-PIP, whose predictions are customised to target complexity, is a state-of-the-art method. Second, a comparative study between T-PioDock and other state-of-the-art scoring methods establishes T-PioDock as the best-performing approach. Moreover, there is good correlation between T-PioDock performance and the quality of docking models, which suggests that progress in docking will lead to even better results at recognising near-native conformations. CONCLUSION: Accurate identification of near-native conformations remains a challenging task. Although template-based methods such as T-PioDock will benefit from the availability of 3D complexes, we have identified specific limitations which need to be addressed. First, docking software is still unable to produce native-like models for every target. Second, current interface predictors do not explicitly consider pairwise residue interactions between proteins and their interacting partners, which leaves ambiguity when assessing the quality of complex conformations.
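    The general idea of ranking docking models by their agreement with a predicted interface can be sketched as below. The abstract does not give T-PioDock's scoring function, so this F1-style overlap score is a hypothetical stand-in, and residue labels such as "A10" (chain plus residue number) are illustrative.

```python
from typing import Dict, List, Set, Tuple


def interface_overlap_score(model_interface: Set[str], predicted_interface: Set[str]) -> float:
    """F1-style agreement between a model's interface residues and a predicted interface (hypothetical score)."""
    shared = len(model_interface & predicted_interface)
    if shared == 0:
        return 0.0
    precision = shared / len(model_interface)
    recall = shared / len(predicted_interface)
    return 2 * precision * recall / (precision + recall)


def rank_models(models: Dict[str, Set[str]], predicted: Set[str]) -> List[Tuple[str, float]]:
    """Sort docking models by descending overlap score, breaking ties by model name."""
    scored = [(name, interface_overlap_score(iface, predicted)) for name, iface in models.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```

    Like the predictors discussed in the conclusion, this set-based score ignores which residues pair across the interface, so two models that contact the same residues in different arrangements receive the same score.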

    Computing Highly Correlated Positions Using Mutual Information and Graph Theory for G Protein-Coupled Receptors

    G protein-coupled receptors (GPCRs) are a superfamily of seven-transmembrane-spanning proteins involved in a wide array of physiological functions and are the most common targets of pharmaceuticals. This study aims to identify a cohort, or clique, of positions that share high mutual information. Using a multiple sequence alignment of the transmembrane (TM) domains, we calculated the mutual information between all inter-TM pairs of aligned positions and ranked the pairs by mutual information. A mutual information graph was constructed in which vertices correspond to TM positions and an edge is drawn between two vertices if their mutual information exceeds a threshold of statistical significance. Positions with high degree (i.e., those with significant mutual information with a large number of other positions) were found to line a well-defined inter-TM ligand binding cavity for class A as well as class C GPCRs. Although the natural ligands of class C receptors bind to their extracellular N-terminal domains, the possibility of modulating their activity through ligands that bind to their helical bundle has been reported. Such positions were not found for class B GPCRs, in agreement with the observation that there are no known ligands that bind within their TM helical bundle. All identified key positions formed a clique within the MI graph of interest. For a subset of class A receptors, we also considered the alignment of a portion of the second extracellular loop, and found that the two positions adjacent to the conserved Cys that bridges the loop with TM3 qualified as key positions. Our algorithm may be useful for localizing topologically conserved regions in other protein families.
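    The graph construction described above can be sketched in a few functions: mutual information between alignment columns, an edge whenever it exceeds a threshold, node degrees, and a clique check. This is a simplified sketch under stated assumptions: it uses plain frequency estimates, a fixed threshold in place of the paper's statistical-significance cutoff, and all column pairs rather than only inter-TM pairs.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Set


def mutual_information(col_a: Sequence[str], col_b: Sequence[str]) -> float:
    """Mutual information (in nats) between two alignment columns from raw frequencies."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    return sum(
        (c / n) * math.log((c / n) / ((pa[a] / n) * (pb[b] / n)))
        for (a, b), c in pab.items()
    )


def mi_graph(alignment: Sequence[str], threshold: float) -> Dict[int, Set[int]]:
    """Adjacency sets: columns i and j are linked when MI(i, j) exceeds `threshold` (assumed fixed)."""
    columns = list(zip(*alignment))
    graph: Dict[int, Set[int]] = {i: set() for i in range(len(columns))}
    for i, j in combinations(range(len(columns)), 2):
        if mutual_information(columns[i], columns[j]) > threshold:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def high_degree_positions(graph: Dict[int, Set[int]], min_degree: int) -> List[int]:
    """Columns linked to at least `min_degree` other columns."""
    return sorted(v for v, neighbors in graph.items() if len(neighbors) >= min_degree)


def is_clique(graph: Dict[int, Set[int]], nodes: Sequence[int]) -> bool:
    """True if every pair of the given nodes is connected."""
    return all(b in graph[a] for a, b in combinations(nodes, 2))
```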