12 research outputs found

    Scale-free behaviour of amino acid pair interactions in folded proteins

    Protein structure is the cumulative result of interactions between amino acid residues, which interact with each other through space and/or chemical bonds. Despite the large number of high-resolution protein structures, the "protein structure code" has not been fully identified. Our manuscript presents a novel approach to protein structure analysis that aims to identify rules for the spatial packing of amino acid pairs in proteins. We have investigated 8706 high-resolution non-redundant protein chains and quantified amino acid pair interactions in terms of solvent accessibility, spatial and sequence distance, secondary structure, and sequence length. The number of pairs found in a particular environment is stored in a cell of an 8-dimensional data tensor. When the cell population is plotted against the number of cells that have the same population size, a scale-free organization is found. Among the amino acid pairs that contribute to cells with a population above 50, pairs of Ala, Ile, Leu and Val dominate. This result is statistically highly significant. We postulate that such pairs form "structural stability points" in the protein structure. Our data show that they are in buried α-helices or β-strands, at a spatial distance of 3.8-4.3 Å and at a sequence distance of >4 residues. We speculate that the scale-free organization of the amino acid pair interactions in the 8D protein structure, combined with the clear dominance of pairs of Ala, Ile, Leu and Val, is important for understanding the very nature of protein structure formation. Our observations suggest that protein structures should be considered as having a higher-dimensional organization.
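    The counting step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `population_spectrum`, the four-field records and their values are hypothetical stand-ins for the eight discretised dimensions.

```python
from collections import Counter


def population_spectrum(pair_records):
    """Map each cell population to the number of cells that have it."""
    # Identical coordinate tuples fall into the same cell of the tensor.
    cell_population = Counter(pair_records)
    # Count how many cells share each population size.
    return Counter(cell_population.values())


# Toy records (made up): (residue 1, residue 2, burial class, distance bin)
records = [
    ("L", "V", "buried", 4),
    ("L", "V", "buried", 4),
    ("L", "V", "buried", 4),
    ("A", "I", "buried", 4),
    ("K", "E", "exposed", 6),
    ("R", "R", "exposed", 5),
]
spectrum = population_spectrum(records)
```

    Plotting the keys of `spectrum` against its values on log-log axes gives the kind of population-versus-frequency plot the abstract refers to.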

    Hyperdimensional Analysis of Amino Acid Pair Distributions in Proteins

    Our manuscript presents a novel approach to protein structure analysis. We have organized an 8-dimensional data cube with protein 3D-structural information from 8706 high-resolution non-redundant protein chains, with the aim of identifying packing rules at the amino acid pair level. The cube contains information about amino acid type, solvent accessibility, spatial and sequence distance, secondary structure and sequence length. Structural queries can be posed to the data cube using the program ProPack. The response is a 1D, 2D or 3D graph. Although the response is of a statistical nature, the user can obtain an instant list of all PDB structures in which such a pair is found. The user may select a particular structure, which is then displayed with the pair in question highlighted. The user may pose millions of different queries, and each one is answered in a few seconds. To demonstrate the capabilities of the data cube and the programs, we have selected well-known structural features, disulphide bridges and salt bridges, to illustrate how queries are posed and how answers are given. Motifs involving cysteines, such as disulphide bridges, zinc fingers and iron-sulfur clusters, are clearly identified and differentiated. ProPack also reveals that whereas pairs of Lys residues virtually never appear in close spatial proximity, pairs of Arg are abundant and appear at close spatial distance. This contrasts with the belief that electrostatic repulsion would prevent this juxtaposition, and with the perception of Arg-Lys as a conservative mutation. The presented programs can find and visualize novel packing preferences in protein structures, allowing the user to unravel correlations between pairs of amino acids. The new tools allow the user to view statistical information and instantly visualize the structures that underpin it, which is far from trivial with most other software tools for protein structure analysis.
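    A query of this kind amounts to fixing some dimensions and summing over the rest. The sketch below shows that idea on a sparse dictionary of cells; it is not ProPack's interface, and the function `project`, the three-axis layout and all counts are hypothetical.

```python
from collections import Counter


def project(cells, keep_axes):
    """Sum cell populations over every axis not listed in keep_axes."""
    projected = Counter()
    for coords, population in cells.items():
        key = tuple(coords[i] for i in keep_axes)
        projected[key] += population
    return projected


# Made-up sparse cube: (residue 1, residue 2, distance bin) -> pair count
cells = {
    ("C", "C", 2): 40,
    ("C", "C", 6): 5,
    ("K", "K", 6): 1,
    ("R", "R", 4): 12,
}

# 2D answer: pair counts by amino acid type, summed over distance.
by_type = project(cells, keep_axes=(0, 1))

# 1D answer: distance histogram for Cys-Cys pairs only.
cys_cells = {k: v for k, v in cells.items() if k[:2] == ("C", "C")}
cys_by_distance = project(cys_cells, keep_axes=(2,))
```

    Keeping one, two or three axes yields the data behind a 1D, 2D or 3D graph.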

    Number of amino acid pairs containing each specific amino acid residue as a function of cell rank.

    The residues that have a large number of links to other residues are underlined in red (Ala, Ile, Leu and Val).

    Scale-Free Behaviour of Amino Acid Pair Interactions in Folded Proteins - Figure 5

    Natural occurrence of amino acid residues in proteins (A), occurrence of amino acid pairs containing a particular amino acid residue retrieved from the 8D matrix (B) and occurrence of amino acid pairs containing a particular amino acid residue retrieved from the randomized reference 8D matrix (C).

    Visualizing amino acid pairs containing Ala, Ile, Leu and Val residues.

    Crystal structure of the N-heptad repeat of HIV-1 gp41 mimetic 5-helix complexed with two antibody fragments (3MA9.pdb). Amino acid pairs containing Ala, Ile, Leu or Val residues are highlighted in yellow and as CPK. Alpha-helices are colored red and beta-sheets green. The three different chains are displayed: A (HIV-1 gp41 5-helix), L and H (Fab fragments).

    Amino acid pair distribution.

    (A) Distribution of the amino acid pair containing residues for rank 1 cells (1.15*10^6 amino acid pairs); (B) Distribution of the amino acid pair containing residues for rank ≥50 cells (1.07*10^6 amino acid pairs). Each amino acid is represented by its one-letter code; (C) 2D histograms of Euclidean distance between the amino acids in a pair vs solvent accessibility seen for rank 1 cells; (D) 2D histograms of Euclidean distance between the amino acids in a pair vs solvent accessibility seen for rank ≥50 cells.

    The fold matrix used in the present study is 8 dimensional.

    Its content can be projected onto any subspace one may define. The cell content (rank or number of amino acid pairs) is plotted against the frequency with which such a rank is found, in a log-log plot: (A) shows the log-log plot for the full 8-dimensional fold matrix, where a linear fit to the points resulted in a slope of −2.26±0.05 and an intercept at 14.3; (B) depicts the 2-dimensional amino acid type subspace data; (C) the 3-dimensional subspace consisting of 2 amino acid types and 1 solvent accessibility dimension; and (D) the 4-dimensional subspace consisting of 2 amino acid types, 1 solvent accessibility and 1 distance dimension.
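    A linear fit of the kind mentioned in this caption can be sketched with ordinary least squares on the logarithms of rank and frequency. The caption does not state the fitting method or the logarithm base, so both are assumptions here (natural logarithm, unweighted fit); the slope does not depend on the base, but the intercept does. The name `loglog_fit` and the synthetic data are hypothetical and are not the paper's data.

```python
import math


def loglog_fit(spectrum):
    """Least-squares line through (ln rank, ln frequency) points."""
    xs = [math.log(rank) for rank in spectrum]
    ys = [math.log(freq) for freq in spectrum.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic power law (made up): frequency = 1000 * rank^-2
toy = {rank: 1000.0 * rank ** -2 for rank in (1, 10, 100, 1000)}
slope, intercept = loglog_fit(toy)
```

    For an exact power law the fitted slope equals the exponent and the intercept equals the logarithm of the prefactor; real rank-frequency data scatter around the line, which is where an uncertainty such as ±0.05 comes from.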