514 research outputs found

    Identifying Interaction Sites in "Recalcitrant" Proteins: Predicted Protein and Rna Binding Sites in Rev Proteins of Hiv-1 and Eiav Agree with Experimental Data

    Protein-protein and protein-nucleic acid interactions are vitally important for a wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses. We have developed machine learning approaches for predicting which amino acids of a protein participate in its interactions with other proteins and/or nucleic acids, using only the protein sequence as input. In this paper, we describe an application of classifiers trained on datasets of well-characterized protein-protein and protein-RNA complexes for which experimental structures are available. We apply these classifiers to the problem of predicting protein and RNA binding sites in the sequence of a clinically important protein for which the structure is not known: the regulatory protein Rev, essential for the replication of HIV-1 and other lentiviruses. We compare our predictions with published biochemical, genetic and partial structural information for HIV-1 and EIAV Rev and with our own published experimental mapping of RNA binding sites in EIAV Rev. The predicted and experimentally determined binding sites are in very good agreement. The ability to predict reliably the residues of a protein that directly contribute to specific binding events - without the requirement for structural information regarding either the protein or complexes in which it participates - can potentially generate new disease intervention strategies. Comment: Pacific Symposium on Biocomputing, Hawaii, In press, Accepted, 200

    Generation and enumeration of compact conformations on the two-dimensional triangular and three-dimensional fcc lattices

    We enumerated all compact conformations within simple geometries on the two-dimensional (2D) triangular and three-dimensional (3D) face-centered cubic (fcc) lattice. These compact conformations correspond mathematically to Hamiltonian paths and Hamiltonian circuits and are frequently used as simple models of proteins. The shapes that were studied for the 2D triangular lattice included m×n parallelograms, regular equilateral triangles, and various hexagons. On the 3D fcc lattice we generated conformations for a limited class of skewed parallelepipeds. Symmetries of the shape were exploited to reduce the number of conformations. We compared surface to volume ratios against protein length for compact conformations on the 3D cubic lattice and for a selected set of real proteins. We also show preliminary work in extending the transfer matrix method, previously developed by us for the 2D square and the 3D cubic lattices, to the 2D triangular lattice. The transfer matrix method offers a superior way of generating all conformations within a given geometry on a lattice by completely avoiding attrition and reducing this highly complicated geometrical problem to a simple algebraic problem of matrix multiplication.
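To make the enumeration problem concrete, the following is a minimal sketch that counts Hamiltonian paths in an m×n parallelogram of the 2D triangular lattice by brute-force backtracking. It is not the transfer matrix method mentioned in the abstract and does not exploit shape symmetries; the function name `count_hamiltonian_paths`, the skewed coordinate convention, and the choice to count each path once regardless of direction are assumptions made for illustration.

```python
from typing import Set, Tuple

Site = Tuple[int, int]

# Six neighbor offsets of the triangular lattice in skewed (axial) coordinates.
# This particular skew convention is an assumption of the sketch.
NEIGHBORS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def count_hamiltonian_paths(m: int, n: int) -> int:
    """Count undirected Hamiltonian paths in an m x n parallelogram."""
    sites: Set[Site] = {(i, j) for i in range(m) for j in range(n)}
    total = len(sites)

    def extend(site: Site, visited: Set[Site]) -> int:
        # A path that has visited every site is one compact conformation.
        if len(visited) == total:
            return 1
        count = 0
        for di, dj in NEIGHBORS:
            nxt = (site[0] + di, site[1] + dj)
            if nxt in sites and nxt not in visited:
                visited.add(nxt)
                count += extend(nxt, visited)
                visited.remove(nxt)
        return count

    directed = sum(extend(s, {s}) for s in sites)
    # Every path with more than one site is found twice, once from each end.
    return directed // 2 if total > 1 else directed


if __name__ == "__main__":
    for shape in [(1, 4), (2, 2), (2, 3), (3, 3)]:
        print(shape, count_hamiltonian_paths(*shape))
```

Backtracking of this kind scales exponentially with the number of sites, which is the practical motivation for algebraic approaches such as the transfer matrix method.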

    How noise in force fields can affect the structural refinement of protein models

    We investigate the structural refinement of predicted models of biological macromolecules using atomistic or coarse-grained molecular force fields with various degrees of error. The goal of this analysis is to estimate the probability of designing an effective structural refinement procedure based on conformational energies computed with a force field: one that starts from a structure predicted from the sequence (using template-based or template-free modeling) and refines it to bring it closer to the native state. It is widely believed that it should be possible to develop such a successful structure refinement algorithm by applying an iterative procedure with stochastic sampling and an appropriate energy function that assesses the quality (correctness) of protein decoys. Here, the effect of noise artificially introduced into a scoring function is analyzed for a model of an ideal sampling scheme, in which the underlying distribution of RMSDs is assumed to be Gaussian. Sampling of the conformational space is performed by random generation of RMSD values. We demonstrate that whenever the random noise in a force field exceeds some level, it is impossible to obtain reliable structural refinement. The magnitude of the noise above which structural refinement is, on average, impossible depends strongly on the quality of the sampling scheme and the size of the protein. Finally, possible strategies to overcome the intrinsic limitations of force fields and so aid the development of successful refinement algorithms are discussed.
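The idea of selecting decoys with a noisy score can be sketched as a small simulation. This is only an illustration under stated assumptions, not the authors' protocol: RMSD values are drawn from a Gaussian (folded at zero with `abs`, an assumption), the "ideal" score of a decoy is taken to be its RMSD, and Gaussian noise of standard deviation `noise_sd` is added to that score. All function and parameter names are hypothetical.

```python
import random
from typing import Tuple


def pick_by_noisy_score(n_samples: int, mean_rmsd: float, sd_rmsd: float,
                        noise_sd: float, rng: random.Random) -> Tuple[float, float]:
    """Return (RMSD of the decoy with the best noisy score, best RMSD sampled)."""
    # Sampling step: random generation of RMSD values (assumed Gaussian).
    rmsds = [abs(rng.gauss(mean_rmsd, sd_rmsd)) for _ in range(n_samples)]
    # Scoring step: assumed ideal score (the RMSD itself) plus Gaussian noise.
    scores = [r + rng.gauss(0.0, noise_sd) for r in rmsds]
    best_index = min(range(n_samples), key=scores.__getitem__)
    return rmsds[best_index], min(rmsds)


def mean_selected_rmsd(noise_sd: float, trials: int = 500, seed: int = 0) -> float:
    """Average RMSD of the selected decoy over many independent trials."""
    rng = random.Random(seed)
    picked = [pick_by_noisy_score(50, 4.0, 1.0, noise_sd, rng)[0]
              for _ in range(trials)]
    return sum(picked) / trials


if __name__ == "__main__":
    for noise in [0.0, 0.5, 1.0, 2.0, 5.0]:
        print(f"noise_sd={noise:.1f}  mean selected RMSD={mean_selected_rmsd(noise):.2f}")
```

With zero noise the selected decoy is always the best one sampled; as `noise_sd` grows relative to the spread of RMSDs, the selection approaches a random pick, which is the qualitative behavior the abstract describes.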

    Immunoglobulin Structure Exhibits Control over CDR Motion

    Motions of the IgG structure are evaluated using normal mode analysis of an elastic network model to detect hinges, the dominance of low frequency modes, and the most important internal motions. One question we seek to answer is whether IgG hinge motions facilitate antigen binding. We also evaluate the protein crystal and packing effects on the experimental temperature factors and disorder predictions. We find that the effects of the protein environment on the crystallographic temperature factors may be misleading for evaluating specific functional motions of IgG. The extent of motion of the antigen binding domains is computed to show their large spatial sampling. We conclude that the IgG structure is specifically designed to facilitate large excursions of the antigen binding domains. Normal modes are shown to be capable of computationally evaluating the hinge motions and the spatial sampling by the structure. The antigen binding loops and the major hinge appear to behave similarly to the rest of the structure when we consider the dominance of the low frequency modes and the extent of internal motion. The full IgG structure has a lower spectral dimension than individual Fab domains, pointing to more efficient information transfer through the antibody than through each domain. This supports the claim that the IgG structure is specifically constructed to facilitate antigen binding by coupling motion of the antigen binding loops with the large scale hinge motions.

    Entropy, Fluctuations, and Disordered Proteins

    Entropy should directly reflect the extent of disorder in proteins. By clustering structurally related proteins and studying the multiple sequence alignments (MSAs) of the sequences in these clusters, we were able to link sequence, structure, and disorder information. We introduced several parameters as measures of fluctuations at a given MSA site and used these as representative of the sequence and structure entropy at that site. In general, we found a tendency for negative correlations between disorder and structure, and significant positive correlations between disorder and the fluctuations in the system. We also found evidence for residue-type conservation for those residues proximate to potentially disordered sites. Mutations at the disordered site itself appear to be allowed. In addition, we found a positive correlation between disorder and accessible surface area, validating that disordered residues occur in exposed regions of proteins. Finally, we also found that fluctuations in the dihedral angles at the original mutated residue and disorder are positively correlated, while dihedral angle fluctuations in spatially proximal residues are negatively correlated with disorder. Our results seem to indicate permissible variability in the disordered site, but greater rigidity in the parts of the protein with which the disordered site interacts. This is another indication that disordered residues are involved in protein function.

    Analysis of protein dynamics using local-DME calculations

    Flexibility and dynamics of protein structures are reflected in the B-factors and order parameters obtained experimentally with X-ray crystallography and Nuclear Magnetic Resonance (NMR). Methods such as Normal Mode Analysis (NMA) and Elastic Network Models (ENM) can be used to predict the fluctuations of protein structures for either atomic-level or coarse-grained structures. Here, we introduce the Local-Distance Matrix Error (DME), an efficient and simple analytic method to study the fluctuations of protein structures, especially for ensembles of NMR-determined protein structures. Comparisons with the fluctuations obtained by experiments and by other computations show strong correlations.
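The abstract does not give the formula for the local DME, so the following is only a sketch under an assumed definition: the standard distance matrix error (root mean square difference of intra-structure distances between two conformations), restricted to a window of residues centered on one position. The function name `local_dme` and the `half_window` parameter are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def local_dme(a: Sequence[Coord], b: Sequence[Coord],
              center: int, half_window: int = 2) -> float:
    """Distance matrix error between conformations a and b around one residue."""
    if len(a) != len(b):
        raise ValueError("conformations must have the same number of residues")
    lo = max(0, center - half_window)
    hi = min(len(a), center + half_window + 1)
    squared_sum = 0.0
    pairs = 0
    for i in range(lo, hi):
        for j in range(i + 1, hi):
            # Compare the same residue-residue distance in both conformations.
            diff = math.dist(a[i], a[j]) - math.dist(b[i], b[j])
            squared_sum += diff * diff
            pairs += 1
    return math.sqrt(squared_sum / pairs) if pairs else 0.0


if __name__ == "__main__":
    model_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    model_2 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 1.0, 0.0), (11.4, 2.5, 0.0)]
    print([round(local_dme(model_1, model_2, k), 3) for k in range(4)])
```

Because it uses only internal distances, a measure of this kind needs no superposition of the structures, which is one reason distance-based errors are convenient for NMR ensembles.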

    Protein sequence entropy is closely related to packing density and hydrophobicity

    We investigated the correlation between the Shannon information entropy ('sequence entropy') and the local flexibility of native globular proteins, as described by inverse packing density. Both quantities are determined at each residue position for a total set of 130 query proteins, where sequence entropies are calculated from each set of aligned residues. For the accompanying aggregate set of 130 alignments, a strong linear correlation is observed between the calculated sequence entropy and the corresponding inverse packing density determined at an associated residue position. This region of linearity spans the range of Cα packing densities from 12 to 25 amino acids within a sphere of 9 Å radius. Three different hydrophobicity scales all mimic the behavior of the sequence entropies. This confirms the idea that the ability to accommodate mutations is strongly dependent on the available space and on the propensity for each amino acid type to be buried. Future applications of these types of methods may prove useful in identifying both core and flexible residues within a protein.
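Shannon sequence entropy at an alignment position can be computed as in the minimal sketch below. The use of base-2 logarithms, the decision to ignore gap characters, and the function names are assumptions for illustration; the abstract does not specify these details.

```python
import math
from collections import Counter
from typing import List, Sequence


def column_entropy(column: str, gap: str = "-") -> float:
    """Shannon entropy (in bits) of the residue types in one alignment column."""
    residues = [c for c in column if c != gap]  # assumption: gaps are ignored
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def alignment_entropies(alignment: Sequence[str]) -> List[float]:
    """Entropy of every column of an alignment given as equal-length strings."""
    return [column_entropy("".join(col)) for col in zip(*alignment)]


if __name__ == "__main__":
    toy_alignment = ["MKVLA", "MRVLG", "MKILS", "MKVLT"]
    print([round(h, 3) for h in alignment_entropies(toy_alignment)])
```

A fully conserved column has zero entropy, while a column with many equally frequent residue types has high entropy, which is the quantity the abstract relates to inverse packing density.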

    Predicting the order in which contacts are broken during single molecule protein stretching experiments

    We combine two methods to enable the prediction of the order in which contacts are broken under external stretching forces in single molecule experiments. These two methods are Gô-like models and elastic network models. The Gô-like models have shown remarkable success in representing many aspects of protein behavior, including the reproduction of experimental data obtained from atomic force microscopy. The simple elastic network models are often used successfully to predict the fluctuations of residues around their mean positions, comparing favorably with the experimentally measured crystallographic B-factors. The behavior of biomolecules under external forces has been demonstrated to depend principally on their elastic properties and the overall shape of their structure. We have studied in detail the muscle protein titin and green fluorescent protein and tested the approach on ten other proteins. First, we stretch the proteins computationally by performing stochastic dynamics simulations with the Gô-like model. We obtain the force–displacement curves and scenarios of possible mechanical unfolding. We then use the elastic network model to calculate temperature factors (B-factors) and the slowest modes of motion for the stretched proteins and compare them with the predicted order of breaking contacts between residues in the Gô-like model. Our results show that a simple Gaussian network model is able to predict contacts that break in the next time stage of stretching. Additionally, we have found that the contact disruption is strictly correlated with the highest force exerted by the backbone on these residues. Our prediction of bond-breaking agrees well with the unfolding scenario obtained with the Gô-like model. We anticipate that this method will be a useful new tool for interpreting stretching experiments.

    Statistical Measures on Residue-Level Protein Structural Properties

    Background: The atomic-level structural properties of proteins, such as bond lengths, bond angles, and torsion angles, have been well studied and understood based on either chemistry knowledge or statistical analysis. Similar properties on the residue level, such as the distances between two residues and the angles formed by short sequences of residues, can be equally important for structural analysis and modeling, but these have not been examined and documented on a similar scale. While these properties are difficult to measure experimentally, they can be statistically estimated in meaningful ways based on their distributions in known protein structures. Results: Residue-level structural properties including various types of residue distances and angles are estimated statistically. A software package is built to provide direct access to the statistical data for the properties, including some important correlations not previously investigated. The distributions of residue distances and angles may vary with varying sequences, but in most cases are concentrated in some high probability ranges, corresponding to their frequent occurrences in either α-helices or β-sheets. Strong correlations among neighboring residue angles, similar to those between neighboring torsion angles at the atomic level, are revealed based on their statistical measures. Residue-level statistical potentials can be defined using the statistical distributions and correlations of the residue distances and angles. Ramachandran-like plots for strongly correlated residue angles are plotted and analyzed. Their applications to structural evaluation and refinement are demonstrated. Conclusions: With the increase in both number and quality of known protein structures, many structural properties can be derived from sets of protein structures by statistical analyses and data mining, and these can even be used as a supplement to the experimental data for structure determination. Indeed, the statistical measures on various types of residue distances and angles provide more systematic and quantitative assessments of these properties, which can otherwise be estimated only individually and qualitatively. Their distributions and correlations in known protein structures show their importance for providing insights into how proteins may fold naturally to various residue-level structures.
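Residue-level distances and angles of the kind described can be computed from one representative point per residue. The sketch below assumes Cα coordinates are used as that point (the abstract does not say which atom or center is used) and shows a residue-residue distance and the virtual bond angle formed by three consecutive residues; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def residue_distance(a: Coord, b: Coord) -> float:
    """Euclidean distance between two residue positions."""
    return math.dist(a, b)


def residue_angle(a: Coord, b: Coord, c: Coord) -> float:
    """Angle in degrees at residue b formed by residues a, b, c."""
    u = [a[k] - b[k] for k in range(3)]
    v = [c[k] - b[k] for k in range(3)]
    dot = sum(u[k] * v[k] for k in range(3))
    norm = math.hypot(*u) * math.hypot(*v)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


def chain_angles(trace: Sequence[Coord]) -> List[float]:
    """Virtual bond angles for every three consecutive residues of a chain."""
    return [residue_angle(trace[i], trace[i + 1], trace[i + 2])
            for i in range(len(trace) - 2)]


if __name__ == "__main__":
    trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (7.6, 3.8, 0.0)]
    print([round(x, 1) for x in chain_angles(trace)])
```

Collecting such values over many known structures and histogramming them is one simple way to obtain the kind of distributions the abstract refers to.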

    Prediction of protein secondary structure by mining structural fragment database

    A new method for predicting protein secondary structure from amino acid sequence has been developed. The method is based on multiple sequence alignment of the query sequence with all other sequences with known structure from the Protein Data Bank (PDB) by using BLAST. The fragments of the alignments belonging to proteins from the PDB are then used for further analysis. We have studied various schemes of assigning weights for matching segments and calculated normalized scores to predict one of the three secondary structures: α-helix, β-sheet, or coil. We applied several artificial intelligence techniques: decision trees (DT), neural networks (NN) and support vector machines (SVM) to improve the accuracy of predictions and found that SVM gave the best performance. Preliminary data show that combining the fragment mining approach with GOR V (Kloczkowski et al., Proteins 49 (2002) 154–166) for regions of low sequence similarity improves the prediction accuracy.
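The step of turning weighted matching segments into a three-state prediction can be sketched as a per-position weighted vote. This is only an illustration: the weighting schemes, score normalization, BLAST search, and machine learning stages of the actual method are not specified in the abstract and are not reproduced here. The input format (start position, secondary structure string, weight), the tie-breaking order, and the fallback to coil are all assumptions.

```python
from typing import Dict, List, Tuple

STATES = "HEC"  # helix, strand, coil

# (start index in the query, secondary structure of the matched segment, weight)
Fragment = Tuple[int, str, float]


def predict_secondary_structure(query_length: int,
                                fragments: List[Fragment]) -> str:
    """Assign each query position the state with the largest summed weight."""
    scores: List[Dict[str, float]] = [dict.fromkeys(STATES, 0.0)
                                      for _ in range(query_length)]
    for start, states, weight in fragments:
        for offset, state in enumerate(states):
            position = start + offset
            if 0 <= position < query_length and state in STATES:
                scores[position][state] += weight

    prediction = []
    for position_scores in scores:
        if sum(position_scores.values()) == 0.0:
            prediction.append("C")  # assumption: uncovered positions are coil
        else:
            # Ties are broken in the order H, E, C (an arbitrary choice).
            prediction.append(max(STATES, key=lambda s: position_scores[s]))
    return "".join(prediction)


if __name__ == "__main__":
    hits = [(0, "HHHH", 2.0), (2, "EEEE", 1.0), (5, "CC", 0.5)]
    print(predict_secondary_structure(8, hits))
```

In a fuller pipeline the per-position scores, rather than the hard vote, could serve as input features to a classifier such as an SVM.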