A critical analysis of computational protein design with sparse residue interaction graphs

Author: Bruce R. Donald (145573)
Ivelin S. Georgiev (688784)
Jonathan D. Jou (3878272)
Swati Jain (569859)
Publication date: 01/03/2017

Protein design algorithms enumerate a combinatorial number of candidate structures to compute the Global Minimum Energy Conformation (GMEC). To efficiently find the GMEC, protein design algorithms must methodically reduce the conformational search space. By applying distance and energy cutoffs, the protein system to be designed can thus be represented using a sparse residue interaction graph, where the number of interacting residue pairs is less than all pairs of mutable residues, and the corresponding GMEC is called the sparse GMEC. However, ignoring some pairwise residue interactions can lead to a change in the energy, conformation, or sequence of the sparse GMEC vs. the original or the full GMEC. Despite the widespread use of sparse residue interaction graphs in protein design, the above-mentioned effects of their use have not been previously analyzed. To analyze the costs and benefits of designing with sparse residue interaction graphs, we computed the GMECs for 136 different protein design problems both with and without distance and energy cutoffs, and compared their energies, conformations, and sequences. Our analysis shows that the differences between the GMECs depend critically on whether or not the design includes core, boundary, or surface residues. Moreover, neglecting long-range interactions can alter local interactions and introduce large sequence differences, both of which can result in significant structural and functional changes. Designs on proteins with experimentally measured thermostability show it is beneficial to compute both the full and the sparse GMEC accurately and efficiently. To this end, we show that a provable, ensemble-based algorithm can efficiently compute both GMECs by enumerating a small number of conformations, usually fewer than 1000. This provides a novel way to combine sparse residue interaction graphs with provable, ensemble-based algorithms to reap the benefits of sparse residue interaction graphs while avoiding their potential inaccuracies.
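The cutoff step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `sparse_edges`, the dictionary layout, and the exact comparison rules (drop a pair when its distance exceeds δ, or when the magnitude of its interaction energy is below α) are assumptions made for illustration only.

```python
def sparse_edges(distances, energies, delta=None, alpha=None):
    """Return the residue pairs kept in a sparse residue interaction graph.

    distances[(i, j)]: distance in angstroms between residues i and j (assumed input).
    energies[(i, j)]:  representative pairwise energy in kcal/mol (assumed input).
    delta: distance cutoff; pairs farther apart than delta are dropped.
    alpha: energy cutoff; pairs with |energy| below alpha are dropped.
    The comparison rules are illustrative assumptions, not the paper's definitions.
    """
    kept = set()
    for pair, dist in distances.items():
        if delta is not None and dist > delta:
            continue  # long-range pair removed by the distance cutoff
        if alpha is not None and abs(energies[pair]) < alpha:
            continue  # weak pair removed by the energy cutoff
        kept.add(pair)
    return kept


# Toy data for three mutable residues (values are made up).
distances = {(0, 1): 4.0, (0, 2): 9.5, (1, 2): 6.0}
energies = {(0, 1): -1.2, (0, 2): -0.3, (1, 2): 0.05}

print(sparse_edges(distances, energies, delta=8.0))
print(sparse_edges(distances, energies, alpha=0.1))
```

With no cutoffs the function returns every pair, which corresponds to the full residue interaction graph.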

Directory of Open Access Journals

FigShare

Sequence correlation between designed mutant and wild type.

Author: Bruce R. Donald (145573)
Ivelin S. Georgiev (688784)
Jonathan D. Jou (3878272)
Swati Jain (569859)

The table shows mutable residues at which the sparse and full GMEC predict different amino acid identities: one predicted the amino acid identity of the more stable designed mutant, and the other predicted the amino acid identity of the less stable wild type. The amino acid identity of the designed mutant is in bold. The wild-type amino acid identity is not in bold.

FigShare

Omitting key high-energy long-range interactions changes the sequence of the GMEC.

Author: Bruce R. Donald (145573)
Ivelin S. Georgiev (688784)
Jonathan D. Jou (3878272)
Swati Jain (569859)

The table shows the one or two highest-energy interactions omitted by a distance cutoff of 7 Å. The omission of these edges alone is sufficient to change the sequence of the GMEC computed using a sparse residue interaction graph.

FigShare

Sequence differences with full vs. sparse residue interaction graphs: hydrogen bond is disrupted when long-range interactions are omitted.

Author: Bruce R. Donald (145573)
Ivelin S. Georgiev (688784)
Jonathan D. Jou (3878272)
Swati Jain (569859)

A comparison between the sequences of the full and sparse GMEC for the surface design of a domain of the pneumococcal histidine triad A protein (PDB id: 2CS7) is shown. (a) Mutable residues of the sparse GMEC. Protein backbone is shown in black. Residues 17 and 32 are shown in cyan. With distance cutoff δ = 8 Å, the interactions between red and cyan residues are eliminated in the sparse residue interaction graph. (b) Solid brown lines indicate residues interacting with the cyan residues in the sparse residue interaction graph. Amino acids at residues 17 and 32 from the sparse GMEC are shown in cyan. (c) Residues 17 and 32 of the sparse GMEC. (d) Residues 17 and 32 of the full GMEC. Note that the hydrogen bond between residues 17 and 32 (Lys:Glu) in the full GMEC (d) is lost in the sparse GMEC (c), where the side chain of residue 17 (Arg) forms hydrogen bonds with nearby backbone atoms. Hydrogen bonds are shown as green dotted pillows that indicate the overlap between the vdW spheres of the hydrogen and the acceptor atom, generated using Probe [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005346#pcbi.1005346.ref069" target="_blank">69</a>].

FigShare

Sparse residue interaction graphs introduce differences in energy, conformation, and sequence of the GMEC.

Author: Bruce R. Donald (145573)
Ivelin S. Georgiev (688784)
Jonathan D. Jou (3878272)
Swati Jain (569859)

Data are shown for 21 boundary design problems, for each of which Sparse A* was run with the following cutoffs: distance cutoff δ = 8 Å, δ = 7 Å, energy cutoff α = 0.1 kcal/mol and α = 0.2 kcal/mol. The number of mutable residues in each design problem ranged from 10 to 20. (a) Number of design problems where full GMEC and sparse GMEC are identical (purple), and where the sequences of the full GMEC and sparse GMEC are identical (cyan). The total number of boundary design problems (21) is indicated by the horizontal red line. (b) Percentage of edges deleted from the residue interaction graph vs. the full energy difference between full GMEC and sparse GMEC. (c) Number of residues with different amino acids between the full GMEC and the sparse GMEC. A y-axis value of 0 indicates that the sequences of the full GMEC and the sparse GMEC are identical.
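The two comparison quantities named in panels (b) and (c) can be computed as in the sketch below. The function names and input formats are hypothetical, chosen only to make the definitions concrete; the example sequences are made up and are not data from the paper.

```python
def percent_edges_deleted(n_full_edges, n_sparse_edges):
    """Percentage of residue-pair edges removed by the cutoffs (panel b, x-axis)."""
    if n_full_edges <= 0:
        raise ValueError("the full graph must contain at least one edge")
    return 100.0 * (n_full_edges - n_sparse_edges) / n_full_edges


def sequence_differences(full_seq, sparse_seq):
    """Number of mutable positions whose amino acid differs (panel c, y-axis).

    Both arguments are strings of one-letter amino acid codes, one character
    per mutable residue, in the same residue order (assumed format).
    """
    if len(full_seq) != len(sparse_seq):
        raise ValueError("sequences must cover the same mutable residues")
    return sum(a != b for a, b in zip(full_seq, sparse_seq))


# Hypothetical example: 10 mutable residues give 45 pairs; 30 survive the cutoff.
print(percent_edges_deleted(45, 30))
print(sequence_differences("KEALVIMFWY", "REALVIMFWY"))
```

A result of 0 from `sequence_differences` corresponds to the case in panel (c) where the full and sparse GMEC sequences are identical.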

FigShare

The rank of the full GMEC is small for retrospective design problems, and the 1000 lowest-energy conformations can be enumerated quickly.

Author: Bruce R. Donald (145573)
Ivelin S. Georgiev (688784)
Jonathan D. Jou (3878272)
Swati Jain (569859)


FigShare

Overview of the 136 protein design test problems on 62 proteins studied in this paper.

Author: Bruce R. Donald (145573)
Ivelin S. Georgiev (688784)
Jonathan D. Jou (3878272)
Swati Jain (569859)

Different problems required different amounts of resources. (a) 62 core protein design problems, (b) 46 boundary design problems, and (c) 28 surface design problems. Design problems where A* returned the full GMEC and Sparse A* returned the sparse and the full GMEC are shown in green. Design problems where A* ran out of memory (30 GB) before returning the full GMEC and Sparse A* returned the sparse GMEC are shown in blue. Design problems where both A* and Sparse A* ran out of memory (30 GB) before returning any conformation are shown in red.

FigShare

Example of a sparse residue interaction graph.

Author: Bruce R. Donald (145573)
Ivelin S. Georgiev (688784)
Jonathan D. Jou (3878272)
Swati Jain (569859)

(a) Cobrotoxin protein (PDB id: 1V6P) with the wild-type side chains of the 8 core mutable residues shown in cyan. (b) Design problem in (a) represented as a full residue interaction graph where all pairs of residues interact. (c) Design problem in (a) represented as a sparse residue interaction graph using a distance cutoff of δ = 8 Å.

FigShare

The full GMEC is usually within 30 conformations of the sparse GMEC for boundary designs.

Author: Bruce R. Donald (145573)
Ivelin S. Georgiev (688784)
Jonathan D. Jou (3878272)
Swati Jain (569859)

Rank of the full GMEC in the gap-free list of conformations generated by Sparse A* for 21 boundary protein design problems, with distance cutoffs δ = 7 Å and δ = 8 Å, and energy cutoffs α = 0.1 kcal/mol and α = 0.2 kcal/mol. Rank 1 indicates that the full and the sparse GMEC were identical.
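The notion of rank used here can be made concrete with a toy brute-force sketch. This is not Sparse A*: it simply enumerates every conformation of a tiny made-up problem, orders them by sparse energy, and reports where the full GMEC falls in that order. The function names, the energy tables, and the choice of which edge to omit are all illustrative assumptions.

```python
from itertools import product


def total_energy(conf, pair_energy, edges):
    """Sum pairwise energies over the given edges.

    conf[i] is the rotamer index chosen at residue i;
    pair_energy[(i, j)][a][b] is the energy of rotamer a at i with rotamer b at j.
    """
    return sum(pair_energy[(i, j)][conf[i]][conf[j]] for (i, j) in edges)


def rank_of_full_gmec(n_rotamers, pair_energy, full_edges, sparse_edges):
    """Return (rank, full GMEC, sparse GMEC) by exhaustive enumeration.

    The rank is the 1-based position of the full GMEC in the list of all
    conformations sorted by sparse energy (ties keep enumeration order).
    """
    confs = list(product(*(range(n) for n in n_rotamers)))
    full_gmec = min(confs, key=lambda c: total_energy(c, pair_energy, full_edges))
    by_sparse = sorted(confs, key=lambda c: total_energy(c, pair_energy, sparse_edges))
    return by_sparse.index(full_gmec) + 1, full_gmec, by_sparse[0]


# Toy problem: 3 residues, 2 rotamers each; the (0, 2) edge is unfavorable
# for one rotamer pair and is the edge omitted from the sparse graph.
pair_energy = {
    (0, 1): [[-1.0, 0.0], [0.0, -0.5]],
    (1, 2): [[-1.0, 0.0], [0.0, -0.8]],
    (0, 2): [[2.0, 0.0], [0.0, 0.0]],
}
full_edges = [(0, 1), (1, 2), (0, 2)]
sparse_edges = [(0, 1), (1, 2)]

rank, full_gmec, sparse_gmec = rank_of_full_gmec([2, 2, 2], pair_energy, full_edges, sparse_edges)
print(rank, full_gmec, sparse_gmec)
```

In this toy case the omitted edge penalizes the conformation that the sparse graph ranks first, so the full GMEC appears at rank 2 rather than rank 1. Exhaustive enumeration is only feasible for very small problems and stands in here for the ordered enumeration the caption describes.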

FigShare

Actual sparse energy difference between the full and sparse GMEC is much smaller than the theoretical energy bound.

Author: Bruce R. Donald (145573)
Ivelin S. Georgiev (688784)
Jonathan D. Jou (3878272)
Swati Jain (569859)

Bounds on the sparse energy difference (as calculated by Lemma 1 in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005346#pcbi.1005346.s001" target="_blank">S1 Text</a>) vs. the actual full energy difference between the full GMEC and sparse GMEC for distance cutoff δ = 7 Å (blue) and energy cutoff α = 0.2 kcal/mol (red). (a) 62 core protein design problems, (b) 21 boundary protein design problems, (c) 12 surface protein design problems.

FigShare