thesis

Efficient comprehensive scoring of docked proteincomplexes - a machine learning approach

Abstract

Biological systems and processes rely on a complex network of molecular interactions. The association of biological macromolecules is a fundamental biochemical phenomenon and an unsolved theoretical problem crucial for the understanding of complex living systems. The term protein-protein docking describes the computational prediction of the assembly of protein complexes from the individual subunits. Docking algorithms generally produce a large number of putative protein complexes. In most cases, some of these conformations resemble the native complex structure within an acceptable degree of structural similarity. A major challenge in the field of docking is to extract the near-native structure(s) out of this considerably large pool of solutions, the so called scoring or ranking problem. It has been the aim of this work to develop methods for the efficient and accurate detection of near-native conformations in the scoring or ranking process of docked protein-protein complexes. A series of structural, chemical, biological and physical properties are used in this work to score docked protein-protein complexes. These properties include specialised energy functions, evolutionary relationship, class specific residue interface propensities, gap volume, buried surface area, empiric pair potentials on residue and atom level as well as measures for the tightness of fit. Efficient comprehensive scoring functions have been developed using probabilistic Support Vector Machines in combination with this array of properties on the largest currently available protein-protein docking benchmark. The established scoring functions are shown to be specific for certain types of protein-protein complexes and are able to detect near-native complex conformations from large sets of decoys with high sensitivity. The specific complex classes are Enzyme-Inhibitor/Substrate complexes, Antibody-Antigen complexes and a third class denoted as "Other" complexes which holds all test cases not belonging to either of the two previous classes. The three complex class specific scoring functions were tested on the docking results of 99 complexes in their unbound form for the above mentioned categories. Defining success as scoring a 'true' result with a p-value of better than 0.1, the scoring schemes were found to be successful in 93%, 78% and 63% of the examined cases, respectively. The ranking of near-native structures can be drastically improved, leading to a significant enrichment of near-native complex conformations in the top ranks. It could be shown that the developed scoring schemes outperform five other previously published scoring functions

    Similar works