2,465 research outputs found

    Assessing the effect of dynamics on the closed-loop protein-folding hypothesis

    The closed-loop (loop-n-lock) hypothesis of protein folding suggests that loops of about 25 residues, closed through interactions between the loop ends (locks), play an important role in protein structure. Coarse-grain elastic network simulations and examination of loop lengths in a diverse set of proteins each support a bias towards loops of close to 25 residues in length between residues of high stability. Previous studies have established a correlation between total contact distance (TCD), a metric of sequence distances between contacting residues (cf. contact order), and the log-folding rate of a protein. In a set of 43 proteins, we identify an improved correlation (r² = 0.76) when the metric is restricted to residues contacting the locks, compared to the equivalent result when all residues are considered (r² = 0.65). This provides qualified support for the hypothesis, albeit with an increased emphasis upon the importance of a much larger set of residues surrounding the locks. Evidence of a similar-sized protein core/extended nucleus (with significant overlap) was obtained from TCD calculations in which residues were successively eliminated according to their hydrophobicity and connectivity, and from molecular dynamics simulations. Our results suggest that while folding is determined by a subset of residues that can be predicted by application of the closed-loop hypothesis, the original hypothesis is too simplistic; efficient protein folding is dependent on a considerably larger subset of residues than those involved in lock formation.
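
    The TCD metric and the r² comparison described in this abstract can be illustrated with a minimal Python sketch. It assumes the commonly used definition of TCD, the sum of sequence separations |i - j| over contacting residue pairs divided by the square of the chain length; the abstract itself does not state the formula. The function names, the `near` filter (a stand-in for "residues contacting the locks") and the way contacts are supplied are all hypothetical.

```python
from typing import Iterable, Optional, Sequence, Set, Tuple


def total_contact_distance(
    contacts: Iterable[Tuple[int, int]],
    n_residues: int,
    near: Optional[Set[int]] = None,
) -> float:
    """Sum of |i - j| over contacting pairs, divided by n_residues**2.

    Assumed definition, not taken from the abstract. If `near` is given,
    only contacts with at least one partner in that set are counted
    (hypothetical stand-in for restricting the metric to lock contacts).
    """
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    total = 0
    for i, j in contacts:
        if near is not None and i not in near and j not in near:
            continue  # skip contacts that touch no residue of interest
        total += abs(i - j)
    return total / n_residues ** 2


def r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Squared Pearson correlation between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy * sxy / (sxx * syy)
```

    Under these assumptions, one would compute the TCD of each protein twice (with and without `near`) and pass each set of values, together with the log-folding rates, to `r_squared` to compare the two correlations.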

    ProteinTools : a toolkit to analyze protein structures


    Sequence Determinants of the Folding Free-Energy Landscape of beta alpha-Repeat Proteins: A Dissertation

    The most common structural platform in biology, the βα-repeat classes of proteins, are represented by the (βα)8 TIM barrel topology and the α/β/α sandwich, CheY-like topology. Previous studies on the folding mechanisms of several members of these proteins have suggested that the initial event during refolding involves the formation of a kinetically trapped species that at least partially unfolds before the native conformation can be accessed. The simple topologies of these proteins are thought to permit access to locally folded regions that may coalesce in non-native ways to form stable interactions leading to misfolded intermediates. In a pair of TIM barrel proteins, αTS and sIGPS, it has been shown that the core of the off-pathway folding intermediates is comprised of locally connected clusters of isoleucine, leucine and valine (ILV) residues. These clusters of Branched Aliphatic Side Chains (BASiC) have the unique ability to very effectively prevent the penetration of water to the underlying hydrogen bond networks. This property retards hydrogen exchange with solvent, strengthening main chain hydrogen bonds and linking tertiary and secondary structure in a cooperative network of interactions. This property would also promote the rapid formation of collapsed species during refolding. From this viewpoint, the locally connected topology and the appropriate distribution of ILV residues in the sequence can modulate the energy landscapes of TIM barrel proteins. Another sequence determinant of protein stability that can significantly alter the structure and stability of TIM barrels is the long-range main chain-side chain hydrogen bond. Three of these interactions have been shown to form the molecular underpinnings for the cooperative access to the native state in αTS. Global analysis results presented in Chapter II and Chapter III suggest that the off-pathway mechanism is common to three proteins of the CheY-like topology, namely CheY, NT-NtrC and Spo0F. 
These results are corroborated by Gō-simulations that are able to identify the minimal structure of kinetically trapped species during the refolding of CheY and Spo0F. The extent of transient, premature structure appears to correlate with the number of ILV side chains involved in a large sequence-local cluster that is formed between the central β-sheet and helices α2, α3 and α4. The failure of Gō-simulations to detect off-pathway species during the refolding of NT-NtrC may reflect the smaller number of ILV side chains in its corresponding hydrophobic cluster. In Chapter IV, comparison of the location of large ILV clusters with the hydrogen exchange protected regions in 19 proteins suggests that clusters of BASiC residues are the primary determinants of the stability cores of globular proteins. Although the location of the ILV clusters is sufficient to determine a majority of the protected amides in a protein structure, the extent of protection is overpredicted by the ILV cluster method. The survey of 71 TIM barrel proteins presented in Chapter V suggests that a specific type of long-range main chain-side chain hydrogen bond, termed the “βα hairpin clamp”, is a common feature in the βα-repeat proteins. The location and sequence patterns observed demonstrate an evolutionary signature of the βαβ modules that are the building blocks of several βα-repeat protein families. In summary, the work presented in this thesis recognizes the role of sequence in modulating the folding free energy landscapes of proteins. The formation of off-pathway folding intermediates in three CheY-like proteins and the differences in the proposed extent of structure formed in off-pathway intermediates of these three proteins suggest that both topology and sequence play important and concerted roles in the folding of proteins. 
Locally connected ILV clusters can lead to off-pathway traps, whereas the formation of the productive folding path requires the development of long-range native-like topological features to form the native state. The ability of ILV clusters to link secondary and tertiary structure formation enables them to be at the core of this cooperative folding process. Very good correlations between the locations of ILV clusters and both strong protection against exchange and the positions of folding nuclei for a variety of proteins reported in the literature support the generality of the BASiC hypothesis. Finally, the discovery of a novel pattern of H-bond interactions in the TIM barrel architecture, between the amide hydrogen of a core ILV residue with a polar side chain, bracketing βαβ modules, suggests a means for establishing cooperativity between different types of side chain interactions towards formation of the native structure. See Additional Files for copies of the source code for the global analysis program and the cluster analysis program.

    Atomic hydration potentials using a Monte Carlo Reference State (MCRS) for protein solvation modeling

    BACKGROUND: Accurate description of protein interaction with aqueous solvent is crucial for modeling of protein folding, protein-protein interaction, and drug design. Efforts to build a working description of solvation, both by continuous models and by molecular dynamics, yield controversial results. Specifically constructed knowledge-based potentials appear to be promising for accounting for the solvation at the molecular level, yet have not been used for this purpose. RESULTS: We developed original knowledge-based potentials to study protein hydration at the level of atom contacts. The potentials were obtained using a new Monte Carlo reference state (MCRS), which simulates the expected probability density of atom-atom contacts via exhaustive sampling of structure space with random probes. Using the MCRS allowed us to calculate the expected atom contact densities with high resolution over a broad distance range including very short distances. Knowledge-based potentials for hydration of protein atoms of different types were obtained based on frequencies of their contacts at different distances with protein-bound water molecules, in a non-redundant training database of 1776 proteins with known 3D structures. Protein hydration sites were predicted in a test set of 12 proteins with experimentally determined water locations. The MCRS greatly improves prediction of water locations over existing methods. In addition, the contribution of the energy of macromolecular solvation to the total folding free energy was estimated, and tested in fold recognition experiments. The correct folds were preferred over all the misfolded decoys for the majority of proteins from the improved Rosetta decoy set based on the structure hydration energy alone. CONCLUSION: MCRS atomic hydration potentials provide a detailed distance-dependent description of hydropathies of individual protein atoms. 
This allows placement of water molecules on the surface of proteins and in protein interfaces with much higher precision. The potentials provide a means to estimate the total solvation energy for a protein structure, in many cases achieving a successful fold recognition. Possible applications of atomic hydration potentials to structure verification, protein folding and stability, and protein-protein interactions are discussed.
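
    As a rough illustration of how a distance-dependent knowledge-based potential can be derived from observed and reference contact counts, the sketch below applies the inverse Boltzmann relation, E(d) = -ln(f_obs(d) / f_exp(d)) in units of kT. This relation is a common choice for such potentials and is an assumption here; the abstract does not give the formula the authors used. The function name and the binned-count inputs are hypothetical, and the expected counts merely stand in for what a reference state such as the MCRS would supply.

```python
import math
from typing import List, Optional, Sequence


def contact_potential(
    observed: Sequence[float],
    expected: Sequence[float],
) -> List[Optional[float]]:
    """Per-distance-bin energies in kT via the inverse Boltzmann relation.

    observed: contact counts per distance bin from known structures.
    expected: contact counts per bin from a reference state (assumed input).
    Bins in which either count is zero are returned as None (undefined).
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    obs_total, exp_total = sum(observed), sum(expected)
    if obs_total <= 0 or exp_total <= 0:
        raise ValueError("both histograms need a positive total count")
    energies: List[Optional[float]] = []
    for obs, exp in zip(observed, expected):
        if obs <= 0 or exp <= 0:
            energies.append(None)  # no statistics for this bin
            continue
        f_obs = obs / obs_total  # observed frequency in this bin
        f_exp = exp / exp_total  # reference-state frequency in this bin
        energies.append(-math.log(f_obs / f_exp))
    return energies
```

    Negative values mark distances at which contacts occur more often than the reference state predicts, positive values mark disfavoured distances.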

    Development of computational approaches for structural classification, analysis and prediction of molecular recognition regions in proteins

    The vast and growing volume of 3D protein structural data stored in the PDB contains abundant information about macromolecular complexes, and hence, data about protein interfaces. Non-covalent contacts between amino acids are the basis of protein interactions, and they are responsible for binding affinity and specificity in biological processes. In addition, water networks in protein interfaces can also complement direct interactions contributing significantly to molecular recognition, although their exact role is still not well understood. It is estimated that protein complexes in the PDB are substantially underrepresented due to their crystallization difficulties. Methods for automatic classification and description of the protein complexes are essential to study protein interfaces, and to propose putative binding regions. Due to this strong need, several protein-protein interaction databases have been developed. However, most of them do not take into account either protein-peptide complexes, solvent information or a proper classification of the binding regions, which are fundamental components to provide an accurate description of protein interfaces. In the first stage of my thesis, I developed the SCOWLP platform, a database and web application that structurally classifies protein binding regions at family level and accurately defines protein interfaces at atomic detail. The analysis of the results showed that protein-peptide complexes are substantially represented in the PDB, and are the only source of interacting information for several families. By clustering the family binding regions, I could identify 9,334 binding regions and 79,803 protein interfaces in the PDB. Interestingly, I observed that 65% of protein families interact with other molecules through more than one region and in 22% of the cases the same region recognizes different protein families. 
The database and web application are open to the research community (www.scowlp.org) and can tremendously facilitate high-throughput comparative analysis of protein binding regions, as well as individual analysis of protein interfaces. SCOWLP and the other databases collect and classify the protein binding regions at family level, where sequence and structure homology exist. Interestingly, it has been observed that many protein families also present structural resemblances within each other, mostly across folds. Likewise, structurally similar interacting motifs (binding regions) have been identified among proteins with different folds and functions. For these reasons, I decided to explore the possibility of inferring protein binding regions independently of their fold classification. Thus, I performed the first systematic analysis of binding region conservation within all protein families that are structurally similar, calculated using non-sequential structural alignment methods. My results indicate there is substantial molecular recognition information that could be potentially inferred among proteins beyond family level. I obtained a 6- to 8-fold enrichment of binding regions, and identified putative binding regions for 728 protein families that lack binding information. Within the results, I found protein complexes from different folds that present similar interfaces, confirming the predictive usage of the methodology. The data obtained with my approach may complement the SCOWLP family binding regions suggesting alternative binding regions, and can be used to assist protein-protein docking experiments and facilitate rational ligand design. In the last part of my thesis, I used the interacting information contained in the SCOWLP database to help understand the role that water plays in protein interactions in terms of affinity and specificity. 
I carried out one of the first high-throughput analyses of solvent in protein interfaces for a curated dataset of transient and obligate protein complexes. Surprisingly, the results highlight the abundance of water-bridged residues in protein interfaces (40.1% of the interfacial residues) that reinforces the importance of including solvent in protein interaction studies (14.5% extra residues interacting only water-mediated). Interestingly, I also observed that obligate and transient interfaces present a comparable amount of solvent, which contrasts with the older view that obligate protein complexes are expected to resemble protein cores, with dry and hydrophobic interfaces. I characterized novel features of water-bridged residues in terms of secondary structure, temperature factors, residue composition, and pairing preferences that differed from direct residue-residue interactions. The results also showed relevant aspects in the mobility and energetics of water-bridged interfacial residues. Collectively, my doctoral thesis work can be summarized in the following points: 1. I developed SCOWLP, an improved framework that identifies protein interfaces and classifies protein binding regions at family level. 2. I developed a novel methodology to predict alternative binding regions among structurally similar protein families independently of the fold they belong to. 3. I performed a high-throughput analysis of water-bridged interactions contained in SCOWLP to study the role of solvent in protein interfaces. These three components of my thesis represent novel methods for exploiting existing structural information to gain insights into protein-protein interactions, key mechanisms to understand biological processes.

    Protein–DNA interactions: structural, thermodynamic and clustering patterns of conserved residues in DNA-binding proteins

    Amino acid residues, which play important roles in protein function, are often conserved. Here, we analyze thermodynamic and structural data of protein–DNA interactions to explore a relationship between free energy, sequence conservation and structural cooperativity. We observe that the most stabilizing residues or putative hotspots are those which occur as clusters of conserved residues. The higher packing density of the clusters and available experimental thermodynamic data of mutations suggest cooperativity between conserved residues in the clusters. Conserved singlets contribute to the stability of protein–DNA complexes to a lesser extent. We also analyze structural features of conserved residues and their clusters and examine their role in identifying DNA-binding sites. We show that about half of the observed conserved residue clusters are in the interface with the DNA, which could be identified from their amino acid composition, whereas the remaining clusters are at the protein–protein or protein–ligand interface, or embedded in the structural scaffolds. In protein–protein interfaces, conserved residues are highly correlated with experimental residue hotspots, contributing dominantly and often cooperatively to the stability of protein–protein complexes. Overall, the conservation patterns of the stabilizing residues in DNA-binding proteins also highlight the significance of clustering as compared to single residue conservation.

    Prediction of functionally important residues in globular proteins from unusual central distances of amino acids

    Background: Well-performing automated protein function recognition approaches usually comprise several complementary techniques. Besides constructing a better consensus, their predictive power can be improved by either adding or refining independent modules that explore orthogonal features of proteins. In this work, we demonstrated how the exploration of global atomic distributions can be used to indicate functionally important residues. Results: Using a set of carefully selected globular proteins, we parametrized continuous probability density functions describing preferred central distances of individual protein atoms. Relative preferred burials were estimated using mixture models of radial density functions dependent on the amino acid composition of a protein under consideration. The unexpectedness of extraordinary locations of atoms was evaluated in an information-theoretic manner and used directly for the identification of key amino acids. In the validation study, we tested capabilities of a tool built upon our approach, called SurpResi, by searching for binding sites interacting with ligands. The tool indicated multiple candidate sites achieving success rates comparable to several geometric methods. We also showed that the unexpectedness is a property of regions involved in protein-protein interactions, and thus can be used for the ranking of protein docking predictions. The computational approach implemented in this work is freely available via a Web interface at http://www.bioinformatics.org/surpresi. Conclusions: Probabilistic analysis of atomic central distances in globular proteins is capable of capturing distinct orientational preferences of amino acids as resulting from different sizes, charges and hydrophobic characters of their side chains. 
When idealized spatial preferences can be inferred from the sole amino acid composition of a protein, residues located in hydrophobically unfavorable environments can be easily detected. Such residues turn out to be often directly involved in binding ligands or interfacing with other proteins.

    Towards Accurate Modeling of Noncovalent Interactions for Protein Rigidity Analysis

    Background: Protein rigidity analysis is an efficient computational method for extracting flexibility information from static, X-ray crystallography protein data. Atoms and bonds are modeled as a mechanical structure and analyzed with a fast graph-based algorithm, producing a decomposition of the flexible molecule into interconnected rigid clusters. The result depends critically on noncovalent atomic interactions, primarily on how hydrogen bonds and hydrophobic interactions are computed and modeled. Ongoing research points to the stringent need for benchmarking rigidity analysis software systems, towards the goal of increasing their accuracy and validating their results, both against each other and against biologically relevant (functional) parameters. We propose two new methods for modeling hydrogen bonds and hydrophobic interactions that more accurately reflect a mechanical model, without being computationally more intensive. We evaluate them using a novel scoring method, based on the B-cubed score from the information retrieval literature, which measures how well two cluster decompositions match. Results: To evaluate the modeling accuracy of KINARI, our pebble-game rigidity analysis system, we use a benchmark data set of 20 proteins, each with multiple distinct conformations deposited in the Protein Data Bank. Cluster decompositions for them were previously determined with the RigidFinder method from Gerstein's lab and validated against experimental data. When KINARI's default tuning parameters are used, an improvement of the B-cubed score over a crude baseline is observed in 30% of this data. With our new modeling options, improvements were observed in over 70% of the proteins in this data set. We investigate the sensitivity of the cluster decomposition score with case studies on pyruvate phosphate dikinase and calmodulin. 
Conclusion: To substantially improve the accuracy of protein rigidity analysis systems, thorough benchmarking must be performed on all current systems and future extensions. We have measured the gain in performance by comparing different modeling methods for noncovalent interactions. We showed that new criteria for modeling hydrogen bonds and hydrophobic interactions can significantly improve the results. The two new methods proposed here have been implemented and made publicly available in the current version of KINARI (v1.3), together with the benchmarking tools, which can be downloaded from our software's website, http://kinari.cs.umass.edu
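
    To make concrete how a B-cubed score compares two cluster decompositions, here is a minimal Python sketch of the standard B-cubed precision, recall and F-measure from the information retrieval literature. The abstract says only that its scoring method is based on B-cubed, so this is not necessarily the exact score implemented in KINARI; the function name and the dictionary inputs (atom identifier mapped to cluster label) are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Tuple


def b_cubed(
    predicted: Dict[Hashable, Hashable],
    reference: Dict[Hashable, Hashable],
) -> Tuple[float, float, float]:
    """Standard B-cubed precision, recall and F1 for two clusterings.

    Both arguments map each item (e.g. an atom id) to a cluster label and
    must cover the same items. Each item belongs to exactly one cluster.
    """
    if not predicted or predicted.keys() != reference.keys():
        raise ValueError("clusterings must be non-empty and cover the same items")
    pred_sizes = Counter(predicted.values())
    ref_sizes = Counter(reference.values())
    # Number of items sharing both the predicted and the reference cluster.
    overlap = Counter((predicted[a], reference[a]) for a in predicted)
    n = len(predicted)
    precision = sum(
        overlap[(predicted[a], reference[a])] / pred_sizes[predicted[a]]
        for a in predicted
    ) / n
    recall = sum(
        overlap[(predicted[a], reference[a])] / ref_sizes[reference[a]]
        for a in predicted
    ) / n
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

    Two identical decompositions score 1.0 on all three measures; merging distinct reference clusters lowers precision, while splitting a reference cluster lowers recall.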