
    Identification of DNA-protein binding residues through integration of Transformer encoder and Bi-directional Long Short-Term Memory

    DNA-protein binding is crucial for the normal development and function of organisms. Accurately identifying DNA-protein binding sites matters because it supports disease prevention and the development of innovative approaches to disease treatment. In the present study, we introduce an accurate and robust identifier of DNA-protein binding residues. For protein representation, we combine the protein's evolutionary information, represented by its position-specific scoring matrix (PSSM), with the spatial information of its secondary structure, enriching the overall informational content. The method first uses a combination of a Bi-directional Long Short-Term Memory (BiLSTM) network and a Transformer encoder to jointly extract the interdependencies among residues within the protein sequence. Convolutional operations are then applied to the resulting feature matrix to capture local features of the residues. Experimental results on the benchmark dataset show that our method compares favorably with contemporary classifiers. Specifically, on the PDNA-41 dataset our method achieved a Matthews correlation coefficient (MCC) of 0.349, specificity (SP) of 96.50%, sensitivity (SN) of 44.03%, and accuracy (ACC) of 94.59%.
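    The four reported figures (MCC, SP, SN, ACC) are standard confusion-matrix metrics for per-residue binary classification (binding vs. non-binding). The sketch below shows how they are typically computed; it is an illustration, not the authors' evaluation code, and the counts in the usage example are hypothetical, not taken from PDNA-41. It also shows why MCC is reported alongside accuracy: binding residues are a small minority, so a predictor can reach high accuracy while detecting few binding sites.

```python
import math


def binding_site_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Per-residue metrics from a binary confusion matrix.

    tp/fn: binding residues predicted as binding / non-binding.
    tn/fp: non-binding residues predicted as non-binding / binding.
    """
    sn = tp / (tp + fn) if (tp + fn) else 0.0  # sensitivity: fraction of binding residues found
    sp = tn / (tn + fp) if (tn + fp) else 0.0  # specificity: fraction of non-binding residues rejected
    acc = (tp + tn) / (tp + fp + tn + fn)      # overall fraction correct
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SN": sn, "SP": sp, "ACC": acc, "MCC": mcc}


if __name__ == "__main__":
    # Hypothetical, imbalanced data set: 100 binding vs. 1000 non-binding residues
    print(binding_site_metrics(tp=40, fp=50, tn=950, fn=60))
    # A trivial "always non-binding" predictor still scores high accuracy, but MCC = 0
    print(binding_site_metrics(tp=0, fp=0, tn=1000, fn=100))
```

    With these hypothetical counts the first call gives SN = 0.40, SP = 0.95 and ACC = 0.90, while the trivial predictor gets ACC of about 0.91 with SN and MCC both 0, which is why MCC is the more informative single figure under class imbalance.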

    Specificity Determination by paralogous winged helix-turn-helix transcription factors

    Transcription factors (TFs) localize to regulatory regions throughout the genome, where they exert physical or enzymatic control over the transcriptional machinery and regulate expression of target genes. Despite the substantial diversity of TFs found across all kingdoms of life, most belong to a relatively small number of structural families characterized by homologous DNA-binding domains (DBDs). In homologous DBDs, highly conserved DNA-contacting residues define a characteristic ‘recognition potential’, or the limited sequence space containing high-affinity binding sites. Specificity-determining residues (SDRs) alter DNA-binding preferences to further delineate this sequence space between homologous TFs, enabling functional divergence through the recognition of distinct genomic binding sites. This thesis explores the divergent DNA-binding preferences among dimeric, winged helix-turn-helix (wHTH) TFs belonging to the OmpR sub-family. As the terminal effectors of orthogonal two-component signaling pathways in Escherichia coli, OmpR paralogs bind distinct genomic sequences and regulate the expression of largely non-overlapping gene networks. Using high-throughput SELEX, I identify multiple sources of variation in DNA binding, including the spacing and orientation of monomer sites as well as a novel binding ‘mode’ with unique half-site preferences (but retaining dimeric architecture). Surprisingly, given the diversity of residues observed at DNA-contacting positions, there are only minor quantitative differences in sequence specificity between OmpR paralogs. Combining phylogenetic, structural, and biological information, I then define a comprehensive set of putative SDRs, which, although distributed broadly across the protein:DNA interface, preferentially localize to the major groove of the DNA helix. 
Direct specificity profiling of SDR variants reveals that individual SDRs affect local base preferences as well as global structural properties of the protein:DNA complex. This study clearly demonstrates that OmpR family TFs possess multiple ‘axes of divergence’, including base recognition, dimeric architecture, and structural attributes of the protein:DNA complex. It also provides evidence for a common structural ‘code’ for DNA binding by OmpR homologues, and demonstrates that surprisingly modest residue changes can enable recognition of highly divergent sequence motifs. Importantly, well-characterized genomic binding sites for many of the TFs in this study diverge substantially from the presented de novo models, and it is unclear how mutations may affect binding in more complex environments. Further analysis using native sequences is required to build combined models of cis- and trans-evolution of two-component regulatory networks.
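    The abstract distinguishes dimeric binding modes by the spacing and orientation of the two monomer (half-) sites. The sketch below illustrates that idea under simplifying assumptions that are not taken from the thesis: each half-site is scored independently with an additive log-odds position weight matrix (PWM), the dimer score is the sum of the two half-site scores, and "direct" (tandem, same strand) and "inverted" (second half-site on the opposite strand) arrangements are compared over a set of candidate spacer lengths. The PWM, the function names, and the test sequences are hypothetical toy examples.

```python
BASES = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def half_site_score(pwm, kmer: str) -> float:
    """Additive log-odds score of one half-site; pwm[i] holds A, C, G, T weights."""
    return sum(col[BASES.index(b)] for col, b in zip(pwm, kmer))


def best_dimer_site(seq, pwm, spacings, orientations=("direct", "inverted")):
    """Return (score, start, spacing, orientation) of the best-scoring dimer site."""
    w = len(pwm)
    best = None
    for spacing in spacings:
        span = 2 * w + spacing
        for start in range(len(seq) - span + 1):
            left = seq[start:start + w]
            right = seq[start + w + spacing:start + span]
            for orient in orientations:
                # Direct repeat: both half-sites read on the same strand.
                # Inverted repeat: the second half-site is read on the opposite strand.
                right_site = right if orient == "direct" else revcomp(right)
                score = half_site_score(pwm, left) + half_site_score(pwm, right_site)
                if best is None or score > best[0]:
                    best = (score, start, spacing, orient)
    return best


# Hypothetical toy PWM favoring the half-site "GTT" (columns ordered A, C, G, T)
TOY_PWM = [[-1, -1, 2, -1], [-1, -1, -1, 2], [-1, -1, -1, 2]]

if __name__ == "__main__":
    print(best_dimer_site("CCGTTCCGTTCC", TOY_PWM, spacings=(1, 2, 3)))
    print(best_dimer_site("CCGTTCCAACCC", TOY_PWM, spacings=(1, 2, 3)))
```

    The first sequence carries a direct repeat with a 2-bp spacer and the second an inverted repeat with the same spacer, so the best hits are `(12, 2, 2, 'direct')` and `(12, 2, 2, 'inverted')`. A real specificity model would learn the PWM (or a richer model) from SELEX data rather than fixing it by hand.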

    Deconstructing IHF-Mediated Inhibition of the Complex acs Promoter

    acs encodes a high-affinity enzyme that permits survival during carbon starvation. As befits a survival gene, its transcription is subject to complex regulation. Previously, the Wolfe lab reported that CRP activates acs transcription by binding tandem DNA sites located upstream of the major acsP2 promoter, and that the nucleoid protein IHF binds three specific sites located just upstream of these. A construct retaining only the most proximal site (IHF III) exhibits reduced transcription compared with the full-length promoter or with a construct lacking all three IHF sites. The goal of my research was to understand how IHF III inhibits CRP-dependent acs transcription. First, I helped define the minimal system required for this IHF-dependent inhibition, showing that it requires the promoter-distal CRP site and an amino acid residue located within a surface determinant of CRP that interacts with RNA polymerase (RNAP). Surprisingly for a Class III promoter, disruption of this surface determinant caused significant changes in the activity and structure of both the full-length promoter and the construct with the single proximal IHF site. My collaborator, Dr. Bianca Sclavi (Laboratoire de Biotechnologies et Pharmacologie génétique Appliquée, Paris, France), showed that occupancy of IHF III mediates formation of a stalled, unproductive transcription complex. This work was published in Molecular Microbiology. I extended this research, obtaining evidence that IHF III is actually a composite site consisting of two overlapping IHF sites that sit on opposing faces of the DNA helix. This composite site appears to behave as a transcriptional regulatory switch: if IHF occupies one site, acs transcription occurs; if IHF occupies the opposing site, acs transcription is inhibited. Which site becomes occupied appears to depend on occupancy of IHF II, which is located just upstream of IHF III. This work demonstrates that the typical textbook view of bacterial transcription is overly simplistic; bacterial transcription can in fact be quite complex.