Improving Decoy Databases for Protein Folding Algorithms

Abstract

Predicting protein structures and simulating protein folding motions are two of the most important problems in computational biology today. Modern folding simulation methods rely on a scoring function that attempts to distinguish the native structure (the most energetically stable 3D structure) from one or more non-native structures. Decoy databases are collections of non-native structures that are widely used to test and verify these scoring functions. We present a method to evaluate and improve the quality of decoy databases by adding novel structures and/or removing redundant structures. We test our approach on 13 different decoy databases of varying size and type and show significant improvement across a variety of metrics. Most of the improvement comes from the addition of novel structures, indicating that our improved databases contain more informative structures that are more likely to fool scoring functions. We also test our improved databases on a popular modern scoring function and show that they contain a greater number of native-like structures than the original databases, thereby providing a more rigorous test set for scoring functions. This work can aid the development and testing of better scoring functions, which in turn will improve the quality of protein folding simulations.
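To make the idea of removing redundant structures concrete, the following is a minimal sketch and not the method of this work. It assumes that redundancy is measured by coordinate RMSD between decoys that are already superimposed, and that a decoy counts as redundant when it lies within a chosen cutoff of a decoy already kept. The names `rmsd` and `filter_redundant`, the greedy strategy, and the `threshold` parameter are all hypothetical choices made for illustration.

```python
import math


def rmsd(a, b):
    """Root-mean-square deviation between two structures.

    Each structure is a list of (x, y, z) tuples of equal length.
    Assumption: the structures are already superimposed, so no
    optimal alignment is performed here.
    """
    if len(a) != len(b):
        raise ValueError("structures must have the same number of atoms")
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(a, b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(a))


def filter_redundant(decoys, threshold):
    """Greedily keep decoys at least `threshold` RMSD from every kept decoy.

    Hypothetical illustration: the result depends on input order, and
    the cutoff is a free parameter not taken from the source.
    """
    kept = []
    for decoy in decoys:
        if all(rmsd(decoy, other) >= threshold for other in kept):
            kept.append(decoy)
    return kept


# Three toy two-atom "decoys"; the second is nearly identical to the first.
decoys = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.1)],
    [(0.0, 0.0, 0.0), (1.0, 3.0, 0.0)],
]
reduced = filter_redundant(decoys, threshold=0.5)
```

With this toy input, the second decoy is dropped because it lies well within the cutoff of the first, while the third is retained. Adding novel structures would need a separate sampling step, which this sketch does not cover.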
