Bayesian models and Markov chain Monte Carlo methods for protein motifs using secondary characteristics

Abstract

Statistical methods have been successfully used to analyze biological sequences. Identifying common local patterns, also called motifs, in multiple protein sequences plays an important role in establishing homology between proteins. Homology is easy to establish when sequences are similar (sharing an identity > 25%). However, for distantly related proteins, currently available methods often fail to align motifs. We develop new probability models that use secondary characteristics, such as amino acid polarity and predicted secondary structures, to profile protein motifs. Bayesian models and Markov chain Monte Carlo methods are employed to estimate the model parameters and thereby to identify protein motifs in multiple sequences. The extra information brought by the secondary characteristics greatly increases the sensitivity of detecting common local patterns in a group of distantly related proteins.
