Towards protein function annotations for matching remote homologs

Abstract

Identifying functional similarities between proteins with low sequence identity and low structural similarity often suffers from high false positive and false negative rates. To improve function prediction based on local protein structure, we propose two refinement and filtering approaches. First, we built a statistical model, a Markov random field, to describe the structure of protein functional sites. Second, we developed filters that consider the local environment around active sites to remove false positives. Our experimental results, evaluated on five sets of enzyme families with less than 40% sequence identity, demonstrate that our methods recover remote homologs that traditional sequence-based methods could not detect. At the same time, our methods eliminate a large number of random matches. For the extended motif method, they improve functional annotation ability, measured by the area under the ROC curve, by up to 70%.
