44 research outputs found

    Computational Protein Design Quantifies Structural Constraints on Amino Acid Covariation

    <div><p>Amino acid covariation, where the identities of amino acids at different sequence positions are correlated, is a hallmark of naturally occurring proteins. This covariation can arise from multiple factors, including selective pressures for maintaining protein structure, requirements imposed by a specific function, or phylogenetic sampling bias. Here we employed flexible backbone computational protein design to quantify the extent to which protein structure has constrained amino acid covariation for 40 diverse protein domains. We find significant similarities between the amino acid covariation in alignments of natural protein sequences and sequences optimized for their structures by computational protein design methods. These results indicate that the structural constraints imposed by protein architecture play a dominant role in shaping amino acid covariation and that computational protein design methods can capture these effects. We also find that the similarity between natural and designed covariation is sensitive to the magnitude and mechanism of backbone flexibility used in computational protein design. Our results thus highlight the necessity of including backbone flexibility to correctly model precise details of correlated amino acid changes and give insights into the pressures underlying these correlations.</p></div>

    Correlation of amino acid pair propensities between natural and designed covarying pairs.

    <p>A) Heat maps of amino acid pair propensities at pairs of positions that are highly covarying in both designed and natural sequences. Red pairs are over-represented at covarying positions and blue pairs are under-represented at covarying positions. The values are shown as Z-scores, which denote the number of standard deviations above or below the mean. B) Correlation of amino acid pair propensity Z-scores between designed and natural sequences. The left plot shows the correlation from flexible backbone design sequences and the right plot shows the correlation from fixed backbone design sequences. A Pearson correlation coefficient (r) is shown for each plot.</p>
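The two statistics named in the caption above, Z-scores and the Pearson correlation coefficient, can be illustrated with a minimal sketch. The propensity values and function names below are hypothetical and chosen only for illustration; the use of the population standard deviation is also an assumption, since the caption does not say which estimator was used.

```python
import math

def z_scores(values):
    """Express each value as standard deviations above or below the mean.

    Assumption: population standard deviation (divide by n).
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical propensities for the same four amino acid pairs
# in designed and natural covarying positions (not data from the study).
designed = [0.5, 1.2, 2.0, 0.9]
natural = [0.4, 1.0, 2.4, 1.1]

r = pearson_r(z_scores(designed), z_scores(natural))
print(f"r = {r:.3f}")
```

Because Z-scoring is a linear rescaling, the correlation between Z-scores equals the correlation between the raw propensities; the Z-scores mainly put both data sets on a common scale for the heat maps.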

    Performance of the different sampling strategies for the 12-residue benchmark set.

    <p>Performance on all benchmark cases in datasets 1 (A) and 2 (B) is shown in terms of percent sub-Angstrom models (%sA). In addition, the rank of the lowest-scoring sub-Angstrom (<1 Å RMSD) model in a set of 500 is given; rank 1 indicates that the lowest-scoring model has a sub-Angstrom RMSD. For reference, the performance of two existing methods, standard KIC <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0063090#pone.0063090-Mandell1" target="_blank">[4]</a> and CCD <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0063090#pone.0063090-Wang1" target="_blank">[20]</a>, is also shown. Data for both methods were regenerated using the same Rosetta revision as for all other methods (Methods). Torsion-restricted sampling serves as an additional control. Benchmark cases that do not generate high-ranking (< rank 20) sub-Angstrom conformations either with NGK or in torsion-restricted sampling are grayed out. For six cases where sub-Angstrom conformations were only sampled with NGK, but not standard KIC, the individual sampling strategies that help each particular case are highlighted. Due to synergy between the different sampling strategies, the %sA of NGK is often higher than expected from the individual methods. * indicates that the starting structure is a dimer, with residues in the dimerization interface within 10 Å of the remodeled segment.</p>
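The two metrics defined in the caption above, %sA and the rank of the lowest-scoring sub-Angstrom model, can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function names and the four toy models are hypothetical, and each model is assumed to be a (score, RMSD) pair with lower scores ranking first.

```python
def percent_sub_angstrom(models, cutoff=1.0):
    """Percentage of models whose RMSD is below the cutoff (in Angstrom).

    models: list of (score, rmsd) tuples.
    """
    hits = sum(1 for _, rmsd in models if rmsd < cutoff)
    return 100.0 * hits / len(models)

def rank_of_best_sub_angstrom(models, cutoff=1.0):
    """1-based rank, by ascending score, of the lowest-scoring model with
    RMSD below the cutoff. Returns None if no model is below the cutoff."""
    ranked = sorted(models, key=lambda model: model[0])
    for rank, (_, rmsd) in enumerate(ranked, start=1):
        if rmsd < cutoff:
            return rank
    return None

# Hypothetical (score in REU, RMSD in Angstrom) pairs; a real set would hold 500.
models = [(-310.2, 2.4), (-305.7, 0.8), (-298.1, 0.6), (-290.0, 3.1)]

print(percent_sub_angstrom(models))       # 50.0
print(rank_of_best_sub_angstrom(models))  # 2
```

In this toy set the best-scoring model is not sub-Angstrom, so the rank is 2 rather than 1; a rank of 1 would mean the lowest-scoring model itself has a sub-Angstrom RMSD.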

    Flow chart of the computational strategy to compare natural and designed amino acid covariation.

    <p>For each domain family (the SH3 domain in the example), a crystal structure of the domain is obtained from the Protein Data Bank. This structure is used as input to a protocol that generates a conformational ensemble of protein structures. Each structure in this ensemble is then input to a protocol that designs a low-energy sequence consistent with the structure. Amino acid covariation is calculated for every pair of positions in the designed sequences, and the designed covariation is compared to the covariation seen among naturally occurring sequences with the same protein domain.</p>
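The step "covariation is calculated for every pair of positions" can be illustrated with a small sketch. The caption does not name the covariation metric, so mutual information between alignment columns is used here purely as an assumed stand-in; the function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns.

    Assumption: MI is a stand-in metric; the study may use a different one.
    """
    n = len(col_i)
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi

def pairwise_covariation(sequences):
    """Score every pair of positions in a list of equal-length sequences."""
    columns = list(zip(*sequences))
    return {
        (i, j): mutual_information(columns[i], columns[j])
        for i, j in combinations(range(len(columns)), 2)
    }

# Toy alignment: positions 0 and 1 change together, position 2 is invariant.
toy_alignment = ["AKL", "AKL", "GEL", "GEL"]
covariation = pairwise_covariation(toy_alignment)
print(covariation)
```

Running the same function on a designed-sequence alignment and on a natural alignment of the same domain would yield two score tables over identical position pairs, which is the form of comparison the flow chart describes.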

    Energy Function Deficiencies Revealed by Sampling Improvements.

    <p>Energy-vs-RMSD plots of the four benchmark cases for which NGK (red) finds alternative, lower-energy conformations far from the native conformation that were not observed when sampling with the CCD method in Rosetta (gray) <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0063090#pone.0063090-Wang1" target="_blank">[20]</a>: 1cyo (A), 1ede (B), 1tib (C) and 3cla (D). REU, Rosetta-energy units.</p>

    Protein domains used in this study.

    <p>Forty diverse protein domains were selected from Pfam. This table contains the Pfam information for each domain, the total number of sequences assigned to this domain according to Pfam, the PDB ID of the domain crystal structure used for design, the domain length and the SCOP classification.</p>

    Comparison of the Percentage of sub-Angstrom Models Generated by KIC and NGK.

    <p>Direct comparison of the percentage of sub-Angstrom models (%sA) between standard KIC and NGK for each of the 45 benchmark cases, grouped into those that are better sampled with NGK (red), those that are better sampled with standard KIC (blue), and those for which %sA did not change much (<±10%, black). Cases that were not at all or very rarely sampled by standard KIC but are more often and consistently found by NGK are specifically highlighted (orange box and bottom panel). %sA and the rank of the lowest-scoring sub-Angstrom model for each individual benchmark case are also given in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0063090#pone-0063090-g005" target="_blank">Figure 5</a>.</p>
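The three-way grouping described in the caption above can be sketched as a simple rule. The caption does not say whether the ±10% band is measured in absolute percentage points or relative to the KIC value; the sketch below assumes absolute percentage points, and the function name and example values are hypothetical.

```python
def classify_case(sa_kic, sa_ngk, tolerance=10.0):
    """Group a benchmark case by the change in %sA from standard KIC to NGK.

    Assumption: `tolerance` is in absolute percentage points.
    """
    delta = sa_ngk - sa_kic
    if abs(delta) < tolerance:
        return "unchanged"
    if delta > 0:
        return "better with NGK"
    return "better with standard KIC"

# Hypothetical %sA values, not taken from the benchmark.
print(classify_case(2.0, 40.0))   # better with NGK
print(classify_case(50.0, 30.0))  # better with standard KIC
print(classify_case(20.0, 25.0))  # unchanged
```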

    Median Performance Across the 12-Residue Benchmark Sets and Illustration of Synergy.

    <p>(A) Barplot showing median percent sub-Angstrom (m%sA) across benchmark sets 1 and 2 for the individual sampling strategies tested here as well as their combination (“all five”), showing a clearly increased percentage of sub-Angstrom models. Error bars are standard deviations from 3 independent simulations (generating 500 models for each of the cases in the dataset, repeated three times). Colors are as in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0063090#pone-0063090-g002" target="_blank">Figure 2</a>. The value for CCD is 0 with this measure. (B) Barplots of “leave-one-out” trials in which all combinations of 4 sampling improvement methods are tested: without Taboo sampling (red, NGK), without Omega sampling (dark green), without Rama2b (green), without Ramp repulsive (dark yellow), and without Ramp rama (dark orange). These data are also provided in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0063090#pone-0063090-t001" target="_blank">Table 1</a>. (C–F) Energy-vs-RMSD plots of remodeling PDB 1oyc with standard KIC (C), next-generation KIC (D), Rama2b sampling (E) and Ramp repulsive sampling (F). REU, Rosetta-energy units. (G) RMSD distributions for the different methods. Colors are as in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0063090#pone-0063090-g002" target="_blank">Figure 2</a>, NGK in red. Rama2b and Ramp repulsive sampling both contribute to enabling sampling of sub-Angstrom conformations of the remodeled segment (in 1oyc.pdb), while the other individual strategies do not change the RMSD distribution for this case. Nevertheless, the combined performance in NGK is higher than expected from the individual improvements, indicating synergy. (H) Boxplots of median RMSDs for standard KIC (blue), CCD (gray) and NGK (red), based on the lowest-energy model for each benchmark case. Boxplots show the minimum and maximum among the lowest-scoring RMSDs across the benchmark set (error bars), the 25<sup>th</sup> and 75<sup>th</sup> percentile (box boundaries) as well as the median (thick line). Both kinematic-closure-based methods have lower median RMSDs than CCD. (I) Comparison of the lowest-scoring RMSD for each benchmark case in simulations with standard KIC vs. with NGK. NGK typically achieves lower RMSDs than standard KIC (red), while for some cases KIC achieves lower RMSDs (blue). Cases with an RMSD change <10% are shown in black.</p>

    Energetic effects of forcing amino acid covariation onto fixed backbones.

    <p>A) Scatter plots of covarying pair energies in the context of fixed or flexible backbones. Each dot represents a pair of positions that was found to be highly covarying in the flexible backbone sequences (Backrub, kT = 0.9) and the natural sequences but not in the fixed backbone sequences. Pairs of amino acids at these positions that were found in flexible backbone designs but not in fixed backbone designs were forced onto fixed backbones taken from X-ray crystal structures and their one- and two-body energies were calculated. The left plot shows a comparison of one-body energies and the right plot shows a comparison of two-body energies for these pairs. B) Representative examples of pairs of amino acids that require backbone movements to achieve low-energy interactions. Models from flexible backbone design (Backrub, kT = 0.9) are shown in cyan and models from fixed backbone design are shown in magenta. The top case shows a ring stacking interaction, the middle case shows a hydrogen bonding interaction and the bottom case shows a salt bridge interaction. Red disks represent steric clashes, where the radius and number of the disks are proportional to the magnitude of the clash.</p>

    Distinguishing features of natural and designed covarying pairs.

    <p>A) Example comparison of natural and designed covariation for an individual protein domain (SH3 domain). Each dot represents an amino acid pair. Dashed red lines indicate the thresholds used to identify pairs as highly covarying (two standard deviations above the mean). The indicated quadrants contain the design-specific, overlap and nature-specific pairs, respectively. B) Box plot of distances between amino acid pairs in the nature-specific, design-specific and overlap sets. Pair distances are measured as the minimum distance between heavy atoms of two amino acids in the representative crystal structure of the domain. C) Correlation of amino acid pair propensity Z-scores between different sets of covarying pairs. The left plot shows the correlation between design-specific and overlap pairs and the right plot shows the correlation between nature-specific and overlap pairs. A Pearson correlation coefficient (r) is shown for each plot.</p>
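The quadrant assignment in panel A can be sketched as a thresholding step: a pair counts as highly covarying when its score exceeds the mean by two standard deviations, separately for natural and designed scores. The sketch below is a hedged illustration under stated assumptions: the population standard deviation is used (the caption does not specify the estimator), and the function names and the twelve toy scores are hypothetical.

```python
import statistics

def high_threshold(values, n_sd=2.0):
    """Mean plus n_sd standard deviations (population SD assumed)."""
    return statistics.mean(values) + n_sd * statistics.pstdev(values)

def classify_pairs(natural, designed, n_sd=2.0):
    """Split position pairs into overlap, nature-specific and design-specific.

    natural, designed: dicts mapping the same pair keys to covariation scores.
    Pairs that are highly covarying in neither set are left unassigned.
    """
    t_nat = high_threshold(list(natural.values()), n_sd)
    t_des = high_threshold(list(designed.values()), n_sd)
    groups = {"overlap": set(), "nature-specific": set(), "design-specific": set()}
    for pair, nat_score in natural.items():
        high_nat = nat_score > t_nat
        high_des = designed[pair] > t_des
        if high_nat and high_des:
            groups["overlap"].add(pair)
        elif high_nat:
            groups["nature-specific"].add(pair)
        elif high_des:
            groups["design-specific"].add(pair)
    return groups

# Twelve hypothetical position pairs with a low background score of 0.1.
pairs = [(i, i + 1) for i in range(12)]
natural = {pair: 0.1 for pair in pairs}
designed = {pair: 0.1 for pair in pairs}
natural[(0, 1)] = natural[(1, 2)] = 1.0    # high in natural sequences
designed[(1, 2)] = designed[(2, 3)] = 1.0  # high in designed sequences

groups = classify_pairs(natural, designed)
print(groups)
```

Here pair (1, 2) lands in the overlap set, (0, 1) is nature-specific and (2, 3) is design-specific, mirroring the three labeled quadrants of the scatter plot.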