8 research outputs found

    Identification of Position-Specific Correlations between DNA-Binding Domains and Their Binding Sites. Application to the MerR Family of Transcription Factors

    The large and increasing volume of genomic data analyzed by comparative methods provides information about transcription factors and their binding sites that, in turn, enables statistical analysis of correlations between factors and sites, uncovering mechanisms and evolution of specific protein-DNA recognition. Here we present an online tool, Prot-DNA-Korr, designed to identify and analyze crucial protein-DNA pairs of positions in a family of transcription factors. Correlations are identified by analysis of mutual information between columns of protein and DNA alignments. The algorithm reduces the effects of common phylogenetic history and of abundance of closely related proteins and binding sites. We apply it to five closely related subfamilies of the MerR family of bacterial transcription factors that regulate heavy metal resistance systems. We validate the approach using known 3D structures of MerR-family proteins in complexes with their cognate DNA binding sites and demonstrate that a significant fraction of correlated positions indeed form specific side-chain-to-base contacts. The joint distribution of amino acids and nucleotides hence may be used to predict changes of specificity for point mutations in transcription factors.
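
    The abstract says correlations are found from the mutual information between a protein alignment column and a DNA alignment column. A minimal sketch of that quantity is given here, assuming plain, unweighted frequency counts. The function name and the toy columns are invented for illustration, and the sequence weighting and phylogenetic correction mentioned in the abstract are not reproduced, so this should not be read as the Prot-DNA-Korr implementation.

```python
from collections import Counter
from math import log2


def mutual_information(protein_col, dna_col):
    """Mutual information (in bits) between two paired alignment columns.

    protein_col[i] and dna_col[i] must come from the same TF/site pair.
    Assumption: raw counts only, with no weighting or pseudocounts.
    """
    if len(protein_col) != len(dna_col):
        raise ValueError("columns must have one entry per TF/site pair")
    n = len(protein_col)
    joint = Counter(zip(protein_col, dna_col))  # counts of (residue, base)
    aa_counts = Counter(protein_col)            # marginal residue counts
    nt_counts = Counter(dna_col)                # marginal base counts
    mi = 0.0
    for (aa, nt), c in joint.items():
        # p(a,b) * log2(p(a,b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * log2(c * n / (aa_counts[aa] * nt_counts[nt]))
    return mi


# Hypothetical toy columns: the residue fully determines the base.
print(mutual_information("RRRRKKKK", "GGGGAAAA"))
# Hypothetical toy columns: residue and base vary independently.
print(mutual_information("RRKK", "GAGA"))
```

    In the first toy case the two columns covary perfectly and the value is one bit; in the second they are independent and the value is zero. Real alignments would also need gap handling and a significance estimate such as the Z-score shown in the figures.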

    Z-score vs. protein column conservation.

    Red: significantly correlated pairs. Green: other pairs. The Y-axis is the protein positional information content for the corresponding pair of columns after weighting and adding pseudocounts. The X-axis is the Z-score of a pair.

    Heatmap of protein-DNA correlations.

    TF positions are along the horizontal axis and in the logo above. Site positions are along the vertical axis and in the logo on the left. The color denotes the Z-score for a pair of positions: the palette for significantly correlated pairs runs from yellow to red, while black through light green denotes positions below the significance threshold. Protein side chain to DNA base interactions are shown as stars (blue: hydrogen bonds; red: van der Waals contacts; yellow: water bridges; green: hydrophobic contacts). Interactions observed at least once in the structures of complexes are shown. Elements of protein secondary structure (from the crystal structure of E. coli CueR, PDB ID 1Q05) are shown at the top.

    Occurrence of the pair in the top 32 pairs of the list with fraction of the input being scrambled over 100 iterations.


    Phylogenetic tree of TFs from studied subfamilies.

    Subfamily branches are colored: CueR (red), MerR (blue), CadR-PbrR (green), CadR-PbrR-like (orange), HMRTR (purple). Sequence logos represent binding motifs (magenta bars) with −10 and −35 promoter boxes (cyan bars) and 3 flanking positions.