64 research outputs found

    A critical analysis of computational protein design with sparse residue interaction graphs

    <div><p>Protein design algorithms enumerate a combinatorial number of candidate structures to compute the Global Minimum Energy Conformation (GMEC). To efficiently find the GMEC, protein design algorithms must methodically reduce the conformational search space. By applying distance and energy cutoffs, the protein system to be designed can be represented using a <i>sparse residue interaction graph</i>, where the number of interacting residue pairs is less than all pairs of mutable residues, and the corresponding GMEC is called the <i>sparse GMEC</i>. However, ignoring some pairwise residue interactions can lead to a change in the energy, conformation, or sequence of the sparse GMEC vs. the original or <i>full GMEC</i>. Despite the widespread use of sparse residue interaction graphs in protein design, the above-mentioned effects of their use have not been previously analyzed. To analyze the costs and benefits of designing with sparse residue interaction graphs, we computed the GMECs for 136 different protein design problems both with and without distance and energy cutoffs, and compared their energies, conformations, and sequences. Our analysis shows that the differences between the GMECs depend critically on whether or not the design includes core, boundary, or surface residues. Moreover, neglecting long-range interactions can alter local interactions and introduce large sequence differences, both of which can result in significant structural and functional changes. Designs on proteins with experimentally measured thermostability show it is beneficial to compute both the full and the sparse GMEC accurately and efficiently. To this end, we show that a provable, ensemble-based algorithm can efficiently compute both GMECs by enumerating a small number of conformations, usually fewer than 1000. This provides a novel way to combine sparse residue interaction graphs with provable, ensemble-based algorithms to reap the benefits of sparse residue interaction graphs while avoiding their potential inaccuracies.</p></div>
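
    The effect described in the abstract can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the authors' implementation: the three-residue system, the two rotamers per residue, the pairwise energies, the distances, and the helper names (`distance_cutoff`, `energy`, `gmec`) are all hypothetical, intra-residue energies are omitted, and brute-force enumeration stands in for A*. The long-range pair (0, 2) lies beyond the distance cutoff, and dropping it changes which conformation has the minimum energy.

```python
from itertools import product

# Hypothetical toy system: 3 mutable residues, 2 rotamers each.
# pair[(i, j)][ri][rj] is the pairwise energy (kcal/mol) of rotamer ri at
# residue i with rotamer rj at residue j. All numbers are made up.
pair = {
    (0, 1): [[-1.0, 0.0], [0.0, -0.75]],
    (1, 2): [[-1.0, 0.0], [0.0, -0.75]],
    (0, 2): [[3.0, 0.0], [0.0, -2.0]],   # long-range interaction
}
# Hypothetical inter-residue distances in angstroms.
dist = {(0, 1): 4.0, (1, 2): 5.0, (0, 2): 9.0}
n_rot = [2, 2, 2]

def distance_cutoff(dist, delta):
    """Keep only residue pairs no farther apart than delta."""
    return {p for p, d in dist.items() if d <= delta}

def energy(conf, pair, edges):
    """Sum pairwise energies over the edges of the interaction graph."""
    return sum(pair[(i, j)][conf[i]][conf[j]] for (i, j) in edges)

def gmec(n_rot, pair, edges):
    """Brute-force minimum-energy conformation (stand-in for A*)."""
    confs = product(*(range(n) for n in n_rot))
    return min(confs, key=lambda c: energy(c, pair, edges))

full_edges = set(pair)
sparse_edges = distance_cutoff(dist, delta=8.0)

full_gmec = gmec(n_rot, pair, full_edges)
sparse_gmec = gmec(n_rot, pair, sparse_edges)

# Score the sparse GMEC with the full energy function to see the cost
# of the omitted edge.
gap = energy(sparse_gmec, pair, full_edges) - energy(full_gmec, pair, full_edges)
print(full_gmec, sparse_gmec, gap)
```

    In this toy, the omitted edge carries both a strong clash and a strong favorable interaction, so the sparse graph picks a conformation that the full energy function penalizes.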

    Sequence correlation between designed mutant and wild type.

    <p>The table shows mutable residues at which the sparse and full GMEC predict different amino acid identities: one predicted the amino acid identity of the more stable designed mutant, and the other predicted the amino acid identity of the less stable wild type. The amino acid identity of the designed mutant is in bold. The wild-type amino acid identity is not in bold.</p>

    Omitting key high-energy long-range interactions changes the sequence of the GMEC.

    <p>The table shows the one or two highest energy interactions omitted by a distance cutoff of 7 Å. The omission of these edges alone is sufficient to change the sequence of the GMEC computed using a sparse residue interaction graph.</p>

    Sequence differences with full vs. sparse residue interaction graphs: hydrogen bond is disrupted when long-range interactions are omitted.

    <p>A comparison between the sequences of the full and sparse GMEC for the surface design of a domain of pneumococcal histidine triad A protein (PDB id: 2CS7) is shown. (a) Mutable residues of the sparse GMEC. Protein backbone is shown in black. Residues 17 and 32 are shown in cyan. With distance cutoff <i>δ</i> = 8 Å, the interactions between red and cyan residues are eliminated in the sparse residue interaction graph. (b) Solid brown lines indicate residues interacting with the cyan residues in the sparse residue interaction graph. Amino acids at residues 17 and 32 from the sparse GMEC are shown in cyan. (c) Residues 17 and 32 of the sparse GMEC. (d) Residues 17 and 32 of the full GMEC. Note that the hydrogen bond between residues 17 and 32 (Lys:Glu) in the full GMEC (d) is lost in the sparse GMEC (c), where the side chain of residue 17 (Arg) forms hydrogen bonds with nearby backbone atoms. Hydrogen bonds are shown as green dotted pillows that indicate the overlap between the vdW spheres of the hydrogen and the acceptor atom, generated using Probe [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005346#pcbi.1005346.ref069" target="_blank">69</a>].</p>

    Sparse residue interaction graphs introduce differences in energy, conformation, and sequence of the GMEC.

    <p>Data are shown for 21 boundary design problems, for each of which Sparse A* was run with the following cutoffs: distance cutoffs <i>δ</i> = 8 Å and <i>δ</i> = 7 Å, and energy cutoffs <i>α</i> = 0.1 kcal/mol and <i>α</i> = 0.2 kcal/mol. The number of mutable residues in each design problem ranged from 10 to 20. (a) Number of design problems where full GMEC and sparse GMEC are identical (purple), and where the sequences of the full GMEC and sparse GMEC are identical (cyan). The total number of boundary design problems (21) is indicated by the horizontal red line. (b) Percentage of edges deleted from the residue interaction graph vs. the full energy difference between full GMEC and sparse GMEC. (c) Number of residues with different amino acids between the full GMEC and the sparse GMEC. A <i>y</i>-axis value of 0 indicates that the sequences of the full GMEC and the sparse GMEC are identical.</p>
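
    Two of the quantities plotted in this figure, the percentage of edges deleted and the number of residues with different amino acids, are simple to compute. The Python sketch below is illustrative only: the function names, the edge sets, and the three-letter example sequences are hypothetical and are not data from the paper.

```python
def percent_edges_deleted(full_edges, sparse_edges):
    """Percentage of edges of the full graph missing from the sparse graph."""
    if not full_edges:
        raise ValueError("full graph has no edges")
    return 100.0 * len(full_edges - sparse_edges) / len(full_edges)

def sequence_differences(seq_full, seq_sparse):
    """Number of mutable positions whose amino acid identity differs."""
    if len(seq_full) != len(seq_sparse):
        raise ValueError("sequences must cover the same mutable residues")
    return sum(a != b for a, b in zip(seq_full, seq_sparse))

# Hypothetical example: 4 residue pairs, one removed by a cutoff.
full_edges = {(0, 1), (0, 2), (1, 2), (1, 3)}
sparse_edges = {(0, 1), (1, 2), (1, 3)}
pct = percent_edges_deleted(full_edges, sparse_edges)

# Hypothetical one-letter sequences over three mutable residues.
n_diff = sequence_differences("KLE", "RLE")
print(pct, n_diff)
```

    A count of 0 from `sequence_differences` corresponds to the case where the sequences of the full and sparse GMEC are identical, even if their conformations differ.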

    The rank of the full GMEC is small for retrospective design problems, and the 1000 lowest-energy conformations can be enumerated quickly.

    <p>The rank of the full GMEC is small for retrospective design problems, and the 1000 lowest-energy conformations can be enumerated quickly.</p>

    Overview of the 136 protein design test problems on 62 proteins studied in this paper.

    <p><b>Different problems required different amounts of resources</b>. (a) 62 core protein design problems, (b) 46 boundary design problems, and (c) 28 surface design problems. Design problems where A* returned the full GMEC and Sparse A* returned the sparse and the full GMEC are shown in green. Design problems where A* ran out of memory (30 GB) before returning the full GMEC and Sparse A* returned the sparse GMEC are shown in blue. Design problems where both A* and Sparse A* ran out of memory (30 GB) before returning any conformation are shown in red.</p>

    Example of a sparse residue interaction graph.

    <p>(a) Cobrotoxin protein (PDB id: 1V6P) with the wild-type side chains of the 8 core mutable residues shown in cyan. (b) Design problem in (a) represented as a full residue interaction graph where all pairs of residues interact. (c) Design problem in (a) represented as a sparse residue interaction graph using a distance cutoff of <i>δ</i> = 8 Å.</p>

    The full GMEC is usually within 30 conformations of the sparse GMEC for boundary designs.

    <p>Rank of the full GMEC in the gap-free list of conformations generated by Sparse A* for 21 boundary protein design problems, with distance cutoffs <i>δ</i> = 7 Å and <i>δ</i> = 8 Å, and energy cutoffs <i>α</i> = 0.1 kcal/mol and <i>α</i> = 0.2 kcal/mol. Rank 1 indicates that the full and the sparse GMEC were identical.</p>
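
    The notion of rank used here can be sketched in Python as follows. This is a hedged toy, not Sparse A*: it assumes a hypothetical three-residue system with made-up pairwise energies, enumerates every conformation by brute force, sorts them by sparse energy, and reports the 1-based position of the full GMEC in that list. The helper names `energy` and `rank_of_full_gmec` are illustrative.

```python
from itertools import product

# Hypothetical pairwise energies (kcal/mol): pair[(i, j)][ri][rj].
pair = {
    (0, 1): [[-1.0, 0.0], [0.0, -0.75]],
    (1, 2): [[-1.0, 0.0], [0.0, -0.75]],
    (0, 2): [[3.0, 0.0], [0.0, -2.0]],   # edge assumed removed by a cutoff
}
n_rot = [2, 2, 2]
full_edges = set(pair)
sparse_edges = {(0, 1), (1, 2)}

def energy(conf, pair, edges):
    """Sum pairwise energies over the edges of the interaction graph."""
    return sum(pair[(i, j)][conf[i]][conf[j]] for (i, j) in edges)

def rank_of_full_gmec(n_rot, pair, full_edges, sparse_edges):
    """1-based position of the full GMEC when all conformations are
    listed in order of increasing sparse energy (brute force)."""
    confs = list(product(*(range(n) for n in n_rot)))
    full_gmec = min(confs, key=lambda c: energy(c, pair, full_edges))
    by_sparse = sorted(confs, key=lambda c: energy(c, pair, sparse_edges))
    return by_sparse.index(full_gmec) + 1

rank = rank_of_full_gmec(n_rot, pair, full_edges, sparse_edges)
print(rank)
```

    A rank of 1 would mean the sparse and full GMEC coincide. A small rank means that rescoring only the first few conformations of the sparse-ordered list with the full energy function is enough to recover the full GMEC.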

    Actual sparse energy difference between the full and sparse GMEC is much smaller than the theoretical energy bound.

    <p>Bounds on the sparse energy difference (as calculated by Lemma 1 in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005346#pcbi.1005346.s001" target="_blank">S1 Text</a>) vs. the actual full energy difference between the full GMEC and sparse GMEC for distance cutoff <i>δ</i> = 7 Å (blue) and energy cutoff <i>α</i> = 0.2 kcal/mol (red). (a) 62 core protein design problems, (b) 21 boundary protein design problems, (c) 12 surface protein design problems.</p>