Atomic-accuracy prediction of protein loop structures through an RNA-inspired ansatz
Consistently predicting biopolymer structure at atomic resolution from
sequence alone remains a difficult problem, even for small sub-segments of
large proteins. Such loop prediction challenges, which arise frequently in
comparative modeling and protein design, can become intractable as loop lengths
exceed 10 residues and if surrounding side-chain conformations are erased. This
article introduces a modeling strategy based on a 'stepwise ansatz', recently
developed for RNA modeling, which posits that any realistic all-atom molecular
conformation can be built up by residue-by-residue stepwise enumeration. When
harnessed to a dynamic-programming-like recursion in the Rosetta framework, the
resulting stepwise assembly (SWA) protocol enables enumerative sampling of a
12-residue loop at a significant but achievable cost of thousands of CPU-hours. In
a previously established benchmark, SWA recovers crystallographic conformations
with sub-Angstrom accuracy for 19 of 20 loops, compared to 14 of 20 by KIC
modeling with a comparable expenditure of computational power. Furthermore, SWA
gives high accuracy results on an additional set of 15 loops highlighted in the
biological literature for their irregularity or unusual length. Successes
include cis-Pro touch turns, loops that pass through tunnels of other
side-chains, and loops of lengths up to 24 residues. Remaining problem cases
are traced to inaccuracies in the Rosetta all-atom energy function. In five
additional blind tests, SWA achieves sub-Angstrom accuracy models, including
the first such success in a protein/RNA binding interface, the YbxF/kink-turn
interaction in the fourth RNA-puzzle competition. These results establish
all-atom enumeration as a systematic approach to protein structure that can
leverage high performance computing and physically realistic energy functions
to more consistently achieve atomic resolution.

Comment: The identity of the four-loop blind test protein and parts of Figure 5
have been omitted in this preprint to ensure confidentiality of the protein
structure prior to its public release.
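
The residue-by-residue build-up described in the abstract can be illustrated with a toy sketch. This is not the Rosetta SWA protocol: the three torsion states in `STATES`, the `toy_energy` function, and the `keep` cutoff are all hypothetical stand-ins, chosen only to show how a pool of partial models is extended by one residue per step and then pruned to its lowest-energy members.

```python
# Toy illustration of stepwise, residue-by-residue enumeration.
# All names and values here are hypothetical; this is not Rosetta code.

# Assumed discrete torsion states per residue (degrees).
STATES = (-60, 60, 180)


def toy_energy(conf):
    """Stand-in energy: one penalty unit per pair of identical neighbors."""
    return sum(1.0 for a, b in zip(conf, conf[1:]) if a == b)


def stepwise_build(n_residues, keep=10, energy=toy_energy):
    """Grow models one residue at a time, keeping the `keep` best partial models."""
    pool = [()]  # start from an empty loop
    for _ in range(n_residues):
        # Extend every retained partial model by every state of the new residue.
        extended = [conf + (state,) for conf in pool for state in STATES]
        # Rank by energy and carry only the lowest-energy models forward.
        extended.sort(key=energy)
        pool = extended[:keep]
    return pool


models = stepwise_build(4, keep=5)
best = models[0]
```

Because each step only extends the models retained from the previous step, the cost grows with the number of retained models times the number of states per residue, rather than with the full combinatorial space of all residues at once.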