An enhanced bioinformatics tool that incorporates molecular structure as well as sequence into models of protein–DNA recognition is proposed and tested. Boltzmann probability models of sequence-dependent DNA structure were obtained from all-atom molecular dynamics simulations and incorporated into hidden Markov models (HMMs) that can recognize structural as well as sequence signals in protein–DNA binding sites on a genome. The binding of catabolite activator protein (CAP) to cognate DNA sequences was used as a prototype case for implementing and testing the method. The results indicate that even HMMs based on probabilistic roll/tilt dinucleotide models of sequence-dependent DNA structure have some capability to discriminate between known CAP binding and nonbinding sites and to predict putative CAP binding sites in unknown sequences. Restricting the HMMs to sequence alone in regions of strong consensus, where the protein makes base-specific contacts with the cognate DNA, further improved their discriminatory capability. Comparison with sequence-only controls indicates that extending the definition of consensus from sequence to structure improves the transferability of the HMMs. It also provides further evidence that dynamical molecular structure, as well as sequence, plays a role in genomic regulatory mechanisms.
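The sketch below shows one way Boltzmann structural probabilities could feed an HMM. It is a toy illustration built on several assumptions, and it is not the authors' model. All names and numbers are hypothetical: `STEP_ENERGIES`, `EMIT`, `TRANS`, the two coarse classes `high_roll`/`low_roll` standing in for full roll/tilt distributions, and the two-state site/background topology. Each dinucleotide step gets a Boltzmann distribution over structural classes, with P(c) proportional to exp(-E_c/kT). The emission for a hidden state is then taken as the expectation of that state's class emissions under this distribution, which is a "soft evidence" choice made here for illustration. A scaled forward algorithm scores the sequence, and the result is reported as a log-odds score against a background-only null model.

```python
import math

# Thermal energy kT in kcal/mol at about 298 K (assumed units for the toy energies).
KT = 0.593

# Hypothetical free energies (kcal/mol) of two coarse structural classes per
# dinucleotide step; real values would come from MD-sampled roll/tilt ensembles.
STEP_ENERGIES = {
    "TG": {"high_roll": 0.0, "low_roll": 0.8},
    "CA": {"high_roll": 0.0, "low_roll": 0.8},
    "AA": {"high_roll": 1.2, "low_roll": 0.0},
    "TT": {"high_roll": 1.2, "low_roll": 0.0},
}
DEFAULT_ENERGIES = {"high_roll": 0.6, "low_roll": 0.0}  # any other step (hypothetical)

# Hypothetical two-state HMM: binding site vs background.
STATES = ("site", "bg")
START = {"site": 0.1, "bg": 0.9}
TRANS = {
    "site": {"site": 0.9, "bg": 0.1},
    "bg": {"site": 0.05, "bg": 0.95},
}
# Hypothetical P(structural class | hidden state).
EMIT = {
    "site": {"high_roll": 0.7, "low_roll": 0.3},
    "bg": {"high_roll": 0.4, "low_roll": 0.6},
}


def boltzmann(energies, kt=KT):
    """Boltzmann probabilities, P(c) proportional to exp(-E_c / kT)."""
    e_min = min(energies.values())  # shift energies for numerical stability
    weights = {c: math.exp(-(e - e_min) / kt) for c, e in energies.items()}
    z = sum(weights.values())
    return {c: w / z for c, w in weights.items()}


def step_emission(state, step):
    """Emission of a hidden state averaged over the step's Boltzmann ensemble."""
    p_class = boltzmann(STEP_ENERGIES.get(step, DEFAULT_ENERGIES))
    return sum(p * EMIT[state][c] for c, p in p_class.items())


def steps_of(seq):
    """Split a DNA sequence into overlapping dinucleotide steps."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("need at least one dinucleotide step")
    return [seq[i:i + 2] for i in range(len(seq) - 1)]


def forward_log_likelihood(seq):
    """Scaled forward algorithm over dinucleotide steps; returns log P(seq | HMM)."""
    steps = steps_of(seq)
    alpha = {s: START[s] * step_emission(s, steps[0]) for s in STATES}
    log_l = 0.0
    for t, step in enumerate(steps):
        if t > 0:
            alpha = {
                s: sum(alpha[r] * TRANS[r][s] for r in STATES) * step_emission(s, step)
                for s in STATES
            }
        scale = sum(alpha.values())  # rescale each column to avoid underflow
        log_l += math.log(scale)
        alpha = {s: a / scale for s, a in alpha.items()}
    return log_l


def log_odds(seq):
    """Forward log-likelihood minus a background-only null model."""
    null = sum(math.log(step_emission("bg", st)) for st in steps_of(seq))
    return forward_log_likelihood(seq) - null


for window in ("TGTGA", "AAAAAA"):
    print(window, round(log_odds(window), 3))
```

With these toy parameters, flexible high-roll steps such as TG/CA push the score up and rigid A-tract steps push it down. A real application would replace the two classes with MD-derived roll/tilt distributions and would fit the transition and emission parameters to known binding sites.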