
    vGNM: a Better Model for Understanding the Dynamics of Proteins in Crystals

    The dynamics of proteins are important for understanding their functions. In recent years, the simple coarse-grained Gaussian Network Model (GNM) has been fairly successful in interpreting crystallographic B-factors. However, the model clearly ignores the contribution of the rigid body motions and the effect of crystal packing. The model cannot explain the fact that the same protein may have significantly different B-factors under different crystal packing conditions. In this work, we propose a new Gaussian network model, called vGNM, which takes into account both the contribution of the rigid body motions and the effect of crystal packing, by allowing the amplitude of the internal modes to be variables. It hypothesizes that the effect of crystal packing should cause some modes to be amplified, and others to become less feasible. In doing so, vGNM is able to resolve the apparent discrepancy in experimental B-factors among structures of the same protein but with different crystal packing conditions, which GNM cannot explain. With a small number of parameters, vGNM is able to reproduce experimental B-factors for a large set of proteins with significantly better correlations (having a mean value of 0.81 as compared to 0.59 by GNM). The results of applying vGNM also show that the rigid body motions account for nearly 60% of the total fluctuations, in good agreement with previous findings
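
For orientation, the baseline GNM calculation that the abstract compares against can be sketched as below. This is a minimal illustration of the standard Gaussian Network Model only, not of vGNM: the function names, the 7.0 Å cutoff default, and the use of one coordinate per residue are assumptions made for the example, and the variable mode amplitudes and rigid-body terms described above are not reproduced.

```python
import numpy as np


def gnm_fluctuations(coords, cutoff=7.0):
    """Relative mean-square fluctuations from a standard GNM (illustrative).

    coords: (N, 3) array-like, one point per residue (e.g. C-alpha atoms).
    cutoff: contact distance in the same units as coords (assumed value).
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    # Pairwise distances between all residues.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Contact map: residues closer than the cutoff, excluding self-pairs.
    contact = (dist < cutoff) & ~np.eye(n, dtype=bool)
    # Kirchhoff matrix: -1 for each contact, contact count on the diagonal.
    kirchhoff = -contact.astype(float)
    np.fill_diagonal(kirchhoff, contact.sum(axis=1))
    evals, evecs = np.linalg.eigh(kirchhoff)
    # Drop zero-eigenvalue modes (uniform shift of a connected network).
    keep = evals > 1e-8
    # Diagonal of the pseudo-inverse: sum over modes of u_k^2 / lambda_k.
    return (evecs[:, keep] ** 2 / evals[keep]).sum(axis=1)


def pearson(a, b):
    """Pearson correlation between predicted and experimental B-factors."""
    return float(np.corrcoef(a, b)[0, 1])
```

Predicted fluctuations are proportional to B-factors only up to a scale factor, which is why a correlation coefficient rather than an absolute error is the usual comparison.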

    Long- and short-range interactions in native protein structures are consistent/minimally frustrated in sequence space

    We show that long- and short-range interactions in almost all protein native structures are actually consistent with each other for coarse-grained energy scales; specifically we mean the long-range inter-residue contact energies and the short-range secondary structure energies based on peptide dihedral angles, which are potentials of mean force evaluated from residue distributions observed in protein native structures. This consistency is observed at equilibrium in sequence space rather than in conformational space. Statistical ensembles of sequences are generated by exchanging residues for each of 797 protein native structures with the Metropolis method. It is shown that adding the other category of interaction to either the short- or long-range interactions decreases the means and variances of those energies for essentially all protein native structures, indicating that both interactions consistently work by more-or-less restricting sequence spaces available to one of the interactions. In addition to this consistency, independence by these interaction classes is also indicated by the fact that there are almost no correlations between them when equilibrated using both interactions and significant but small, positive correlations at equilibrium using only one of the interactions. Evidence is provided that protein native sequences can be regarded approximately as samples from the statistical ensembles of sequences with these energy scales and that all proteins have the same effective conformational temperature. Designing protein structures and sequences to be consistent and minimally frustrated among the various interactions is a most effective way to increase protein stability and foldability
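
The sequence-space sampling described above can be illustrated with a generic Metropolis loop. This is a sketch under stated assumptions: "exchanging residues" is interpreted here as swapping the residues at two positions (which conserves composition), the `energy` callable is a hypothetical placeholder for the contact and secondary-structure potentials, and the temperature and step count are arbitrary.

```python
import math
import random


def metropolis_sequence_sampling(sequence, energy, temperature=1.0,
                                 n_steps=1000, seed=0):
    """Sample sequences on a fixed structure by residue exchange (illustrative).

    sequence: starting sequence as a string.
    energy:   hypothetical callable mapping a list of residues to a float.
    Returns the list of sequences visited, one per step.
    """
    rng = random.Random(seed)
    current = list(sequence)
    e_current = energy(current)
    samples = []
    for _ in range(n_steps):
        # Propose a move: swap the residues at two distinct positions.
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]
        delta = energy(current) - e_current
        # Metropolis criterion: always accept downhill moves, accept
        # uphill moves with probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            e_current += delta
        else:
            # Rejected: undo the swap.
            current[i], current[j] = current[j], current[i]
        samples.append("".join(current))
    return samples
```

Means and variances of each energy term over the returned samples are the kind of ensemble statistics the abstract compares between runs that use one class of interaction and runs that use both.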

    Combining Disparate Data Types: Protein Sequences and Protein Structures

    With the development of high-throughput, next-generation sequencing and other advanced technologies, a large number of gene expression profiles have been produced. Many of these profiles are available from public databases [1-3]. A challenging research problem that has drawn a lot of attention in the past is to infer gene regulatory networks from the expression data. A gene regulatory network is represented by a directed graph, in which nodes represent transcription factors or mRNA with edges showing transcriptional regulatory relationships between two nodes

    Elastic network models capture the motions apparent within ensembles of RNA structures

    The role of structure and dynamics in mechanisms for RNA becomes increasingly important. Computational approaches using simple dynamics models have been successful at predicting the motions of proteins and are often applied to ribonucleo-protein complexes but have not been thoroughly tested for well-packed nucleic acid structures. In order to characterize a true set of motions, we investigate the apparent motions from 16 ensembles of experimentally determined RNA structures. These indicate a relatively limited set of motions that are captured by a small set of principal components (PCs). These limited motions closely resemble the motions computed from low frequency normal modes from elastic network models (ENMs), either at atomic or coarse-grained resolution. Various ENM model types, parameters, and structure representations are tested here against the experimental RNA structural ensembles, exposing differences between models for proteins and for folded RNAs. Differences in performance are seen, depending on the structure alignment algorithm used to generate PCs, modulating the apparent utility of ENMs but not significantly impacting their ability to generate functional motions. The loss of dynamical information upon coarse-graining is somewhat larger for RNAs than for globular proteins, indicating, perhaps, the lower cooperativity of the less densely packed RNA. However, the RNA structures show less sensitivity to the elastic network model parameters than do proteins. These findings further demonstrate the utility of ENMs and the appropriateness of their application to well-packed RNA-only structures, justifying their use for studying the dynamics of ribonucleo-proteins, such as the ribosome and regulatory RNAs
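
The comparison between ensemble-derived principal components and ENM normal modes is commonly quantified by an overlap (the absolute cosine between two displacement vectors). The sketch below assumes the structures have already been superimposed and share the same atoms in the same order; the function names are illustrative and the alignment step, which the abstract notes affects the results, is not included.

```python
import numpy as np


def ensemble_pcs(ensemble):
    """Principal components of a pre-aligned structural ensemble.

    ensemble: (M, N, 3) array of M structures with N matched atoms each.
    Returns (pcs, variances): pcs has one unit vector of length 3N per row,
    ordered by decreasing variance.
    """
    ensemble = np.asarray(ensemble, dtype=float)
    m = ensemble.shape[0]
    flat = ensemble.reshape(m, -1)
    # Centre on the mean structure so PCs describe deviations from it.
    centred = flat - flat.mean(axis=0)
    # Rows of vt are the principal directions in 3N-dimensional space.
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    variances = s ** 2 / (m - 1)
    return vt, variances


def overlap(pc, mode):
    """Absolute cosine between a PC and a normal mode (1.0 = same direction)."""
    pc = np.asarray(pc, dtype=float).ravel()
    mode = np.asarray(mode, dtype=float).ravel()
    return float(abs(pc @ mode) / (np.linalg.norm(pc) * np.linalg.norm(mode)))
```

An overlap near 1 between a leading PC and a low-frequency mode indicates that the model mode captures the motion apparent in the experimental ensemble.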

    SeqStruct : A New Amino Acid Similarity Matrix Based on Sequence Correlations and Structural Contacts Yields Sequence-Structure Congruence

    Protein sequence matching does not properly account for some well-known features of protein structures, such as the greater variability of surface residues relative to core residues and the high packing densities in globular proteins, and it does not yield good matches for the sequences of many proteins known to be close structural relatives. There are now abundant protein sequences and structures to enable major improvements to sequence matching. Here, we utilize structural frameworks to mount the observed correlated sequences to identify the most important correlated parts. The rationale is that protein structures provide the important physical framework for improving sequence matching. Combining the sequence and structure data in this way leads to a simple amino acid substitution matrix that can be readily incorporated into any sequence matching procedure. This enables the incorporation of allosteric information into sequence matching and transforms it effectively from a 1-D to a 3-D procedure. The results from testing in over 3,000 sequence matches demonstrate a 37% gain in sequence similarity and a loss of 26% of the gaps when compared with the use of BLOSUM62. Importantly, there are also major gains in the specificity of sequence matching across diverse proteins. Specifically, all known cases where protein structures match but sequences do not match well are resolved

    Protein–DNA Hydrophobic Recognition in the Minor Groove is Facilitated by Sugar Switching

    Information readout in the DNA minor groove is accompanied by substantial DNA deformations, such as sugar switching between the two conformational domains, B-like C2′-endo and A-like C3′-endo. The effect of sugar puckering on the sequence-dependent protein–DNA interactions has not been studied systematically, however. Here, we analyzed the structural role of A-like nucleotides in 156 protein–DNA complexes solved by X-ray crystallography and NMR. To this end, a new algorithm was developed to distinguish interactions in the minor groove from those in the major groove, and to calculate the solvent-accessible surface areas in each groove separately. Based on this approach, we found a striking difference between the sets of amino acids interacting with B-like and A-like nucleotides in the minor groove. Polar amino acids mostly interact with B-nucleotides, while hydrophobic amino acids interact extensively with A-nucleotides (a hydrophobicity–structure correlation). This tendency is consistent with the larger exposure of hydrophobic surfaces in the case of A-like sugars. Overall, the A-like nucleotides aid in achieving protein-induced fit in two major ways. First, hydrophobic clusters formed by several consecutive A-like sugars interact cooperatively with the non-polar surfaces in proteins. Second, the sugar switching occurs in large kinks promoted by direct protein contact, predominantly at the pyrimidine–purine dimeric steps. The sequence preference for the B-to-A sugar repuckering, observed for pyrimidines, suggests that the described DNA deformations contribute to specificity of the protein–DNA recognition in the minor groove

    Shape-dependent designability studies of lattice proteins

    One important problem in computational structural biology is protein designability, that is, why protein sequences are not random strings of amino acids but instead show regular patterns that encode protein structures. Many previous studies that have attempted to solve the problem have relied upon reduced models of proteins. In particular, the 2D square and the 3D cubic lattices together with reduced amino acid alphabet models have been examined extensively and have led to interesting results that shed some light on evolutionary relationships among proteins. Here we perform designability studies on the 2D square lattice and explore the effects of variable overall shapes on protein designability using a binary hydrophobic-polar (HP) amino acid alphabet. Because we rely on a simple energy function that counts the total number of H-H interactions between non-sequential residues, we restrict our studies to protein shapes that have the same number of residues and also a constant number of non-bonded contacts. We have found that there is a marked difference in the designability between various protein shapes, with some of them accounting for a significantly larger share of the total foldable sequences
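
The energy function described above, a count of H-H contacts between residues that are lattice neighbours but not adjacent in sequence, can be sketched as follows. The function name and the representation of a conformation as a list of (x, y) lattice sites are assumptions for the example; the sign convention of the resulting energy (typically the negative of this count) is left to the caller.

```python
def count_hh_contacts(sequence, path):
    """Count non-sequential H-H contacts on a 2D square lattice (illustrative).

    sequence: string over the alphabet {'H', 'P'}.
    path:     list of (x, y) integer lattice sites, one per residue, in
              chain order; must be self-avoiding.
    """
    if len(sequence) != len(path):
        raise ValueError("sequence and path must have the same length")
    if len(set(path)) != len(path):
        raise ValueError("path must be self-avoiding")
    index_at = {site: i for i, site in enumerate(path)}
    contacts = 0
    for i, (x, y) in enumerate(path):
        if sequence[i] != "H":
            continue
        # Look only in the +x and +y directions so each pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                contacts += 1
    return contacts
```

For a fixed shape, enumerating all self-avoiding paths that fill it and all HP sequences, then recording which sequences have a unique path with the maximum contact count, gives the designability of each conformation.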

    Structural interpretation of protein-protein interaction network

    Background: Currently a huge amount of protein-protein interaction data is available from high-throughput experimental methods. In a large network of protein-protein interactions, groups of proteins can be identified as functional clusters having related functions, where a single protein can occur in multiple clusters. However, experimental methods are error-prone, and thus the interactions in a functional cluster may include false positives, or there may be unreported interactions. Therefore, correctly identifying a functional cluster of proteins requires knowing whether any two proteins in a cluster interact, whether an interaction can exclude other interactions, and how strong the affinity between two interacting proteins is. Methods: In the present work the yeast protein-protein interaction network is clustered using a spectral clustering method proposed by us in 2006, and the individual clusters are investigated for functional relationships among the member proteins. 3D structural models of the proteins in one cluster have been built; the protein structures are retrieved from the Protein Data Bank or predicted using a comparative modeling approach. A rigid-body protein docking method (ClusPro) is used to predict the protein-protein interaction complexes. Binding sites of the docked complexes are characterized by their buried surface areas, as a measure of the strength of an interaction. Results: The clustering method yields functionally coherent clusters. Some of the interactions in a cluster exclude other interactions because of shared binding sites. New interactions among the interacting proteins are uncovered, and thus higher-order protein complexes in the cluster are proposed. The relative stability of each of the protein complexes in the cluster is also reported. Conclusions: Although the methods used are computationally expensive and require human intervention and judgment, they can identify the interactions that could occur together and those that are mutually exclusive. In addition, indirect interactions through an intermediate protein can be identified. These theoretical predictions might be useful for crystallographers to select targets for the X-ray crystallographic determination of protein complexes

    Potentials 'R'Us web-server for protein energy estimations with coarse-grained knowledge-based potentials

    Background: Knowledge-based potentials have been widely used in the last 20 years for fold recognition, protein structure prediction from amino acid sequence, ligand binding, protein design, and many other purposes. However, these are generally not readily accessible online. Results: Our new knowledge-based potential server makes many of these potentials available for easy use, automatically computing the energies of the protein structures or models supplied. Our web server for protein energy estimation uses four-body potentials, short-range potentials, and 23 different two-body potentials. Users can select potentials according to their needs and preferences. Files containing the coordinates of protein atoms in the PDB format can be uploaded as input. The results will be returned to the user's email address. Conclusions: Our Potentials 'R'Us server is an easily accessible, freely available tool with a web interface that collects all existing and future protein coarse-grained potentials and computes energies of multiple structural models.

    Fold-specific sequence scoring improves protein sequence matching

    Background: Sequence matching is extremely important for applications throughout biology, particularly for discovering information such as functional and evolutionary relationships, and also for discriminating between unimportant and disease mutants. At present the functions of a large fraction of genes are unknown; improvements in sequence matching will improve gene annotations. Universal amino acid substitution matrices such as BLOSUM62 are used to measure sequence similarities and to identify distant homologues, regardless of the structure class. However, such single matrices do not take into account important structural information evident within the different topologies of proteins, and they treat substitutions within all protein folds identically. Others have suggested that the use of structural information can lead to significant improvements in sequence matching, but this has not yet been very effective. Here we develop novel substitution matrices that include not only general sequence information but also a topology-specific component that is unique for each CATH topology. This novel feature of using a combination of sequence and structure information for each protein topology significantly improves the sequence matching scores for the sequence pairs tested. We have used a novel multi-structure alignment method for each homology level of CATH in order to extract topological information. Results: We obtain statistically significant improved sequence matching scores for 73% of the alpha-helical test cases. On average, 61% of the test cases showed improvements in homology detection when structure information was incorporated into the substitution matrices. On average, z-scores for homology detection are improved by more than 54% for all cases, and some individual cases have z-scores more than twice those obtained using generic matrices. Our topology-specific similarity matrices also outperform other traditional similarity matrices and single-matrix-based structure methods. When the default amino acid substitution matrix in the PSI-BLAST algorithm is replaced by our structure-based matrices, the structure matching is significantly improved over conventional PSI-BLAST. It also outperforms results obtained for the corresponding HMM profiles generated for each topology. Conclusions: We show that by incorporating topology-specific structure information in addition to sequence information into specific amino acid substitution matrices, the sequence matching scores and homology detection are significantly improved. Our topology-specific similarity matrices outperform other traditional similarity matrices and single-matrix-based structure methods, and also show improvement over conventional PSI-BLAST and HMM-profile-based methods in sequence matching. The results support the discriminatory ability of the new amino acid similarity matrices to distinguish between distant homologs and structurally dissimilar pairs