Background: The detection of significant compensatory mutation signals in multiple sequence alignments (MSAs) is often complicated by noise. A challenging problem in bioinformatics is remains the separation of significant signals between two or more non-conserved residue sites from the phylogenetic noise and unrelated pair signals. Determination of these non-conserved residue sites is as important as the recognition of strictly conserved positions for understanding of the structural basis of protein functions and identification of functionally important residue regions. In this study, we developed a new method, the Coupled Mutation Finder (CMF) quantifying the phylogenetic noise for the detection of compensatory mutations.Results: To demonstrate the effectiveness of this method, we analyzed essential sites of two human proteins: epidermal growth factor receptor (EGFR) and glucokinase (GCK). Our results suggest that the CMF is able to separate significant compensatory mutation signals from the phylogenetic noise and unrelated pair signals. The vast majority of compensatory mutation sites found by the CMF are related to essential sites of both proteins and they are likely to affect protein stability or functionality.Conclusions: The CMF is a new method, which includes an MSA-specific statistical model based on multiple testing procedures that quantify the error made in terms of the false discovery rate and a novel entropy-based metric to upscale BLOSUM62 dissimilar compensatory mutations. Therefore, it is a helpful tool to predict and investigate compensatory mutation sites of structural or functional importance in proteins. We suggest that the CMF could be used as a novel automated function prediction tool that is required for a better understanding of the structural basis of proteins. The CMF server is freely accessible at http://cmf.bioinf.med.uni-goettingen.de
To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request.