
    Coding limits on the number of transcription factors

    Transcription factor proteins bind specific DNA sequences to control the expression of genes. They contain DNA-binding domains that belong to several super-families, each with a specific mechanism of DNA binding. The total number of transcription factors encoded in a genome increases with the number of genes in the genome. Here, we examined the number of transcription factors from each super-family in diverse organisms. We find that the number of transcription factors from most super-families appears to be bounded. For example, the number of winged helix factors does not generally exceed 300, even in very large genomes. The magnitude of the maximal number of transcription factors from each super-family seems to correlate with the number of DNA bases effectively recognized by the binding mechanism of that super-family. Coding theory predicts that such upper bounds on the number of transcription factors should exist, in order to minimize cross-binding errors between transcription factors. This theory further predicts that factors with similar binding sequences should tend to have similar biological effects, so that errors based on mis-recognition are minimal. We present evidence that transcription factors with similar binding sequences tend to regulate genes with similar biological functions, supporting this prediction. The present study suggests limits on the transcription factor repertoire of cells, and suggests coding constraints that might apply more generally to the mapping between binding sites and biological function. Comment: http://www.weizmann.ac.il/complex/tlusty/papers/BMCGenomics2006.pdf https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1590034/ http://www.biomedcentral.com/1471-2164/7/23
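A textbook bound makes the coding argument concrete. Treat binding sites as words of n effectively recognized bases over {A, C, G, T}, and require any two sites to differ at no fewer than d positions so that a single mismatch cannot turn one site into another. Under those assumptions the Hamming (sphere-packing) bound caps how many distinct sites can coexist. This is a minimal illustration of the general principle, not the model used in the study.

```python
from math import comb


def hamming_bound(n: int, d: int, q: int = 4) -> int:
    """Sphere-packing upper bound on the number of length-n words over a
    q-letter alphabet whose pairwise Hamming distance is at least d.

    Illustrative assumption: binding sites are fixed-length words and
    'distinguishable' means differing at >= d positions.
    """
    t = (d - 1) // 2  # mismatches each site can absorb without ambiguity
    # Number of words within Hamming distance t of a given word.
    ball = sum(comb(n, i) * (q - 1) ** i for i in range(t + 1))
    return q ** n // ball  # floor of q^n / ball size


if __name__ == "__main__":
    for n in (4, 6, 8, 10):
        print(f"n={n}: {4 ** n} possible words, at most {hamming_bound(n, d=3)} codewords")
```

With d = 3, four effectively read bases allow fewer than twenty mutually distinguishable sites, while eight allow a few thousand; the bound grows steeply with the number of bases read, which is the qualitative trend the abstract describes.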

    Protein-DNA Recognition Models for the Homeodomain and C2H2 Zinc Finger Transcription Factor Families

    Transcription factors (TFs) play a central role in the gene regulatory network of each cell. They can stimulate or inhibit transcription of their target genes by binding to short, degenerate DNA sequence motifs. The goal of this research is to build improved models of TF binding site recognition. This can facilitate the determination of regulatory networks and also allow binding site motifs to be predicted from the TF protein sequence alone. Recent technological advances have rapidly expanded the amount of quantitative TF binding data available. Protein binding microarrays (PBMs) have recently been implemented in a format that allows all 10-mers to be assayed in parallel, and PBM data are now available for hundreds of transcription factors. Another fairly recent technique for determining the binding preference of a TF is the in vivo bacterial one-hybrid (B1H) assay. In this approach a TF is expressed in E. coli, where it is used to select strong binding sites from a library of randomized sites located upstream of a weak promoter that drives expression of a selectable gene. When coupled with high-throughput sequencing and a newly developed analysis method, the assay yields quantitative binding data. In the last few years, the binding specificities of hundreds of TFs have been determined using B1H. The two largest eukaryotic transcription factor families are the zf-C2H2 and homeodomain TF families. Newly available PBM and B1H specificity models were used to develop recognition models for these two families, with the goal of predicting the binding specificity of a TF from its protein sequence. We developed a feature selection method based on adjusted mutual information that automatically recovers nearly all of the known key residues for the homeodomain and zf-C2H2 families.
Using those features, we find that, for both families, random forest (RF) and support vector machine (SVM) based recognition models outperform the nearest neighbor method, which has previously been considered the best method.
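The feature-selection step and the nearest-neighbor baseline can each be illustrated in a few lines. This is a minimal sketch under stated assumptions: it uses plain mutual information rather than the adjusted mutual information named in the abstract (the adjustment, which subtracts the information expected by chance, is omitted), the nearest-neighbor rule is a generic version whose exact form in the cited work may differ, and the aligned sequences and site labels are hypothetical toy data.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plain (unadjusted) mutual information, in bits, between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def rank_positions(seqs, labels, k):
    """Return the k alignment columns whose residues are most informative about the labels."""
    scores = [
        (mutual_information([s[i] for s in seqs], labels), i)
        for i in range(len(seqs[0]))
    ]
    scores.sort(key=lambda pair: (-pair[0], pair[1]))  # highest MI first, ties by column index
    return sorted(i for _, i in scores[:k])


def nearest_neighbor(query, seqs, labels, positions):
    """Copy the label of the training sequence with the fewest mismatches at `positions`."""
    def mismatches(s):
        return sum(query[i] != s[i] for i in positions)

    best = min(range(len(seqs)), key=lambda j: mismatches(seqs[j]))
    return labels[best]


# Hypothetical toy alignment: column 0 alone determines the (made-up) site label.
SEQS = ["QKRIN", "QKRVS", "KKRIN", "KKRVS"]
LABELS = ["TAAT", "TAAT", "TAAC", "TAAC"]

if __name__ == "__main__":
    keep = rank_positions(SEQS, LABELS, k=1)
    print(keep, nearest_neighbor("KARAA", SEQS, LABELS, keep))
```

Restricting the distance to the selected columns is what lets a nearest-neighbor rule ignore framework residues that vary between proteins but do not contact DNA; RF and SVM models would consume the same selected columns as input features.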

    Profiling the DNA-binding specificities of engineered Cys2His2 zinc finger domains using a rapid cell-based method

    The C2H2 zinc finger is the most commonly utilized framework for engineering DNA-binding domains with novel specificities. Many different selection strategies have been developed to identify individual fingers that possess a particular DNA-binding specificity from a randomized library. In these experiments, each finger is selected in the context of a constant finger framework that ensures the identification of clones with a desired specificity by properly positioning the randomized finger on the DNA template. Following a successful selection, multiple zinc-finger clones are typically recovered that share similarities in the sequences of their DNA-recognition helices. In principle, each of the clones isolated from a selection is a candidate for assembly into a larger multi-finger protein, but to date a high-throughput method for identifying the most specific candidates for incorporation into a final multi-finger protein has not been available. Here we describe the development of a specificity profiling system that facilitates rapid and inexpensive characterization of engineered zinc-finger modules. Moreover, we demonstrate that specificity data collected using this system can be employed to rationally design zinc fingers with improved DNA-binding specificities
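Specificity data of the kind described here are often summarized as a position frequency matrix (PFM) with per-position information content. The abstract does not state that the profiling system reports results this way, so treat this as a generic sketch that assumes a uniform background base composition; the recovered sites in the tests are hypothetical.

```python
from math import log2

BASES = "ACGT"


def position_frequency_matrix(sites, pseudocount=0.0):
    """Column-wise base frequencies for equal-length aligned binding sites.

    Sites must contain only A, C, G and T; a pseudocount can be added to
    avoid zero frequencies when few sites were recovered.
    """
    length = len(sites[0])
    pfm = []
    for i in range(length):
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[i]] += 1
        total = sum(counts.values())
        pfm.append({b: counts[b] / total for b in BASES})
    return pfm


def information_content(pfm):
    """Per-position information content in bits, relative to a uniform background.

    A fully specified position scores 2 bits; a position with no preference scores 0.
    """
    return [2.0 + sum(p * log2(p) for p in col.values() if p > 0) for col in pfm]


if __name__ == "__main__":
    sites = ["GCG", "GCG", "GCG", "GCA"]  # hypothetical selected sites for one finger
    print([round(x, 3) for x in information_content(position_frequency_matrix(sites))])
```

Comparing the information content of candidate clones position by position is one simple way to rank which fingers are the most specific before assembling them into a larger protein.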

    Transcriptional Regulation: a Genomic Overview

    The availability of the Arabidopsis thaliana genome sequence allows a comprehensive analysis of transcriptional regulation in plants using novel genomic approaches and methodologies. Such a genomic view of transcription first necessitates the compilation of lists of elements. Transcription factors are the most numerous of the different types of proteins involved in transcription in eukaryotes, and the Arabidopsis genome codes for more than 1,500 of them, or approximately 6% of its total number of genes. A genome-wide comparison of transcription factors across the three eukaryotic kingdoms reveals the evolutionary generation of diversity in the components of the regulatory machinery of transcription. However, as illustrated by Arabidopsis, transcription in plants follows similar basic principles and logic to those in animals and fungi. A global view and understanding of transcription at a cellular and organismal level requires the characterization of the Arabidopsis transcriptome and promoterome, as well as of the interactome, the localizome, and the phenome of the proteins involved in transcription

    An improved predictive recognition model for Cys2-His2 zinc finger proteins

    Cys2-His2 zinc finger proteins (ZFPs) are the largest family of transcription factors in higher metazoans. They also represent the most diverse family with regard to the composition of their recognition sequences. Although there are a number of ZFPs with characterized DNA-binding preferences, the specificity of the vast majority of ZFPs is unknown and cannot be directly inferred by homology, owing to the diversity of recognition residues present within individual fingers. Given the large number of unique zinc fingers and assemblies present across eukaryotes, a comprehensive predictive recognition model that could accurately estimate the DNA-binding specificity of any ZFP from its amino acid sequence would have great utility. Toward this goal, we have used the DNA-binding specificities of 678 two-finger modules from both natural and artificial sources to construct a random forest-based predictive model for ZFP recognition. We find that our recognition model outperforms previously described determinant-based recognition models for ZFPs, and can successfully estimate the specificity of naturally occurring ZFPs with previously defined specificities.
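A random forest needs each two-finger module as a fixed-length numeric vector. One plausible encoding, sketched here as an assumption rather than the cited model's actual features, is a one-hot vector over the 20 amino acids at the conventional recognition-helix positions -1, 2, 3 and 6 of each finger. The helix strings in the tests are examples only.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Conventional helix numbering runs -1, 1, 2, ..., 6 (there is no position 0),
# so a 7-residue helix string maps positions to indices 0..6 in this order.
HELIX_POSITIONS = [-1, 1, 2, 3, 4, 5, 6]
KEY_POSITIONS = (-1, 2, 3, 6)  # assumed feature set; the cited model may differ


def one_hot_helix(helix, key_positions=KEY_POSITIONS):
    """One-hot encode selected helix positions (20 features per position).

    Residues outside the 20 standard amino acids yield an all-zero block.
    """
    if len(helix) != len(HELIX_POSITIONS):
        raise ValueError("expected a 7-residue helix spanning positions -1..6")
    vector = []
    for pos in key_positions:
        residue = helix[HELIX_POSITIONS.index(pos)]
        vector.extend(1 if aa == residue else 0 for aa in AMINO_ACIDS)
    return vector


def encode_two_finger(helix1, helix2):
    """Concatenate the encodings of both fingers of a two-finger module."""
    return one_hot_helix(helix1) + one_hot_helix(helix2)


if __name__ == "__main__":
    features = encode_two_finger("RSDELTR", "RSDHLTT")  # example helices
    print(len(features), sum(features))
```

Each 160-element vector could then be paired with the module's measured specificity to train any standard random forest implementation; encoding both fingers together leaves room for the model to learn interactions between adjacent fingers.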

    Characterization and design of C2H2 zinc finger proteins as custom DNA binding domains

    As the storage medium for the source code of life, DNA is fundamentally linked to all cellular processes. Nature employs hundreds of sequence-specific DNA-binding proteins as transcription factors and repressors to regulate the flow of genetic expression and replication. By adapting these DNA-binding domains to target desired genome locations, they can be harnessed to treat diseases by regulating genes and repairing diseased gene sequences. The C2H2 zinc finger motif is perhaps the most promising and versatile DNA-binding framework. Each C2H2 zinc finger domain (module) is capable of recognizing approximately three adjacent nucleotide bases in standard B-form DNA. Through directed mutagenesis, novel zinc finger modules (ZFMs) can be selected for most of the 64 possible DNA triplets. By assembling multiple ZFMs with the appropriate linkers, zinc finger proteins (ZFPs) can be generated to specifically bind extended DNA sequence motifs. Several methods of varying complexity are currently available for ZFP engineering. ZFPs generated with the relatively simple modular design method often fail to function in vivo. Those generated from the most reliable module subsets, those recognizing triplets with a 5′ guanine (GNN), function successfully only an estimated 50% of the time, while modularly assembled ZFPs comprising primarily non-GNN modules rarely function in vivo. These low success rates are extremely problematic for applications requiring multiple ZFPs that target adjacent sequence motifs. More complex ZFP engineering approaches provide higher success rates than modular design, with the drawback that they are more labor intensive and require additional biological expertise.
In this research we developed and engineered novel ZFPs, analyzed characteristics of functional custom zinc finger proteins and their targets, formulated algorithms predictive of ZFP success for both modular assembly and OPEN (Oligomerized Pool Engineering) selection methods, and generated a web-based server and software tools to aid others in the successful application of this technology.
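The modular design method described above can be sketched as a lookup from triplets to modules. The sketch assumes each finger reads exactly one triplet with no dependence on its neighbors, which is the very simplification the abstract identifies as a frequent cause of failure, and the module archive is a hypothetical dictionary rather than any published archive. It does encode one established detail: a C2H2 array binds antiparallel to its primary DNA strand, so the N-terminal finger contacts the 3'-most triplet.

```python
def split_into_triplets(target):
    """Split a 3N-bp target (5'->3') into N triplets ordered from finger 1 onward.

    Finger 1 (N-terminal) contacts the 3'-most triplet, so the 5'->3'
    triplet order is reversed.
    """
    target = target.upper()
    if len(target) % 3 != 0:
        raise ValueError("target length must be a multiple of 3")
    triplets = [target[i:i + 3] for i in range(0, len(target), 3)]
    return list(reversed(triplets))


def assemble(target, archive):
    """Pick one module per triplet from a (hypothetical) triplet -> module archive."""
    fingers = split_into_triplets(target)
    missing = [t for t in fingers if t not in archive]
    if missing:
        raise KeyError(f"no module for triplet(s): {missing}")
    return [archive[t] for t in fingers]


def gnn_fraction(target):
    """Fraction of triplets with a 5' guanine, the most reliable module class."""
    fingers = split_into_triplets(target)
    return sum(t.startswith("G") for t in fingers) / len(fingers)


if __name__ == "__main__":
    toy_archive = {"TGG": "helix_TGG", "GCG": "helix_GCG", "GAA": "helix_GAA"}  # placeholders
    print(assemble("GAAGCGTGG", toy_archive), gnn_fraction("GAAGCGTGG"))
```

A GNN fraction like this is only a crude proxy; the abstract's point is that predicting success requires more than counting module classes, which is why it develops dedicated predictive algorithms.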

    Targeting of proteins to chromatin in Drosophila melanogaster

    Dosage compensation of sex chromosomes in Drosophila melanogaster is an excellent model system for studying various aspects of the targeting of protein factors to chromatin. Dosage compensation prevents male lethality by upregulating transcription from the single male X chromosome roughly twofold to match the two active X chromosomes in females [reviewed in e.g. (Ferrari et al., 2014; Kuroda et al., 2016; Samata and Akhtar, 2018)]. This upregulation is facilitated by the male-specific lethal (MSL) dosage compensation complex (DCC). The DCC binds selectively to ~300 high-affinity sites (HAS) on the X chromosome, which contain a low-complexity GAGA-rich sequence motif, the MSL recognition element (MRE) (Alekseyenko et al., 2008; Straub et al., 2008). However, the DCC neglects thousands of other similar sequences in the genome outside of HAS. The DNA-binding subunit MSL2 alone can enrich X-chromosomal MREs in vitro, although MSL2 misses most MREs within HAS (Villa et al., 2016). The Chromatin Linked Adaptor for MSL Proteins (CLAMP) binds thousands of MREs genome-wide and contributes to DCC targeting to HAS (Kaye et al., 2018; Soruco et al., 2013). The role of CLAMP in facilitating MSL2 targeting to HAS was investigated by several approaches. Monitoring MSL2 chromatin binding in vivo by chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq) showed that CLAMP is required for HAS targeting. Next, the interplay between CLAMP and MSL2 in genome-wide in vitro DNA binding was studied by DNA immunoprecipitation with high-throughput sequencing (DIP-seq) (Gossett and Lieb, 2008; Liu et al., 2005; Villa et al., 2016). The data revealed mutual recruitment of each factor to the other's binding sites and cooperative binding to novel sites. This cooperativity extended the binding repertoire of both factors to facilitate robust binding of MREs located within HAS, although increased binding to other, non-functional sites was also observed.
Both factors interacted directly with each other in co-IP experiments, providing an explanation for cooperative DNA binding. Whether CLAMP and MSL2 are required for keeping HAS nucleosome-free was studied by assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) (Buenrostro et al., 2013; Buenrostro et al., 2015). Both factors cooperate to stabilize each other's binding and to compete with nucleosome positioning at HAS. After the DCC binds to HAS, it interacts with neighboring target genes, which are marked by trimethylation of histone H3K36 (H3K36me3). There, the DCC catalyzes acetylation of H4K16 (H4K16ac) to boost transcription (Akhtar and Becker, 2000; Gelbart et al., 2009; Larschan et al., 2007; Prestel et al., 2010). The DCC exploits the 3D organization of the chromosome, which seems to be invariant between males and females, to transfer from HAS to active genes (Ramirez et al., 2015; Ulianov et al., 2016). The contribution of HAS to the chromosome interaction network was studied using different chromosome conformation capture techniques. Hi-C analysis of sex-sorted embryos showed that H4K16ac and H3K36me3 correlate well with the active compartments (Sexton et al., 2012). Interestingly, compartment switching on the X chromosome between males and females was correlated with H4K16ac and therefore attributed to dosage compensation. The involvement of the Pioneering sites on the X (PionX), a special subclass of HAS, in chromosome architecture was studied by high-resolution 4C-seq in male and female cells. Chromosomal segments containing PionX made frequent contacts with many loci within the active compartment and even looped over large domains of the inactive compartment (Ghavi-Helm et al., 2014). These long-range interactions between PionX and other PionX/HAS sites were more robust in males than in females, indicating that the dosage compensation machinery reinforced them.
Moreover, de novo induction of DCC assembly in female cells showed that the DCC uses long-range interactions within the active compartment to transfer from PionX to target genes marked by H3K36me3 for upregulation of transcription. The chromosomal kinase JIL-1, which catalyzes phosphorylation of histone H3S10, also localizes to actively transcribed genes marked by H3K36me3 and is twofold enriched on the male X chromosome (Jin et al., 2000; Regnard et al., 2011; Wang et al., 2001). JIL-1 is implicated in maintaining overall chromosome organization and preventing the spread of heterochromatin into the euchromatic part of the X chromosome in both sexes (Cai et al., 2014; Ebert et al., 2004; Jin et al., 1999). Furthermore, JIL-1 localizes to the non-LTR retrotransposon arrays of the telomeres to positively regulate their expression (Andreyeva et al., 2005; Silva-Sousa and Casacuberta, 2013; Silva-Sousa et al., 2012). The role of JIL-1 in regulating gene expression was studied using various methods. JIL-1 formed a stable complex with the novel PWWP domain-containing protein JIL-1 Anchoring and Stabilizing Protein (JASPer). From a nucleosome library containing 115 different nucleosome types, the JIL-1/JASPer (JJ) complex specifically enriched H3K36me3-modified nucleosomes in vitro via JASPer's PWWP domain. Consistently, ChIP-seq experiments showed that the JJ complex localizes to H3K36me3 chromatin at active gene bodies and at telomeric transposons in vivo. As previously described, the JJ complex is also enriched on the male X chromosome relative to autosomes. Loss of JIL-1 resulted in loss of JASPer enrichment, a small increase in H3K9me2 and a decrease in H4K16ac on the X chromosome, as shown by spike-in ChIP-seq. Gene expression analysis by RNA-seq showed that the JJ complex positively regulates the expression of genes, in particular genes on the male X chromosome, and of telomeric transposons.
Furthermore, co-IP coupled to mass spectrometry showed that the JJ complex associates with the Set1/COMPASS complex and with other remodelling complexes.

    Extensive protein and DNA backbone sampling improves structure-based specificity prediction for C2H2 zinc fingers

    Sequence-specific DNA recognition by gene regulatory proteins is critical for proper cellular functioning. The ability to predict the DNA binding preferences of these regulatory proteins from their amino acid sequence would greatly aid in reconstruction of their regulatory interactions. Structural modeling provides one route to such predictions: by building accurate molecular models of regulatory proteins in complex with candidate binding sites, and estimating their relative binding affinities for these sites using a suitable potential function, it should be possible to construct DNA binding profiles. Here, we present a novel molecular modeling protocol for protein-DNA interfaces that borrows conformational sampling techniques from de novo protein structure prediction to generate a diverse ensemble of structural models from small fragments of related and unrelated protein-DNA complexes. The extensive conformational sampling is coupled with sequence space exploration so that binding preferences for the target protein can be inferred from the resulting optimized DNA sequences. We apply the algorithm to predict binding profiles for a benchmark set of eleven C2H2 zinc finger transcription factors, five of known and six of unknown structure. The predicted profiles are in good agreement with experimental binding data; furthermore, examination of the modeled structures gives insight into observed binding preferences
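One common way to turn predicted relative affinities into a binding profile is Boltzmann weighting: each candidate site s receives a probability proportional to exp(-E(s)/kT), and base probabilities are then summed per position. The abstract does not specify how its profiles are built from the potential function, so treat this conversion as an assumption. The energies in the tests are hypothetical, and kT = 0.593 kcal/mol corresponds to roughly 298 K.

```python
from math import exp

BASES = "ACGT"


def boltzmann_profile(energies, kT=0.593):
    """Convert predicted binding energies into a per-position base profile.

    energies: dict mapping equal-length sites to predicted binding energies
    (kcal/mol, lower = tighter binding). Returns one {base: probability}
    dict per position.
    """
    e_min = min(energies.values())
    # Shift by the minimum energy for numerical stability; the shift cancels on normalization.
    weights = {site: exp(-(e - e_min) / kT) for site, e in energies.items()}
    z = sum(weights.values())  # partition function over the candidate sites
    length = len(next(iter(energies)))
    profile = [{b: 0.0 for b in BASES} for _ in range(length)]
    for site, w in weights.items():
        for i, base in enumerate(site):
            profile[i][base] += w / z
    return profile


if __name__ == "__main__":
    predicted = {"GCG": -12.0, "GCA": -10.5, "GCT": -9.0}  # hypothetical energies
    for i, column in enumerate(boltzmann_profile(predicted)):
        print(i, {b: round(p, 3) for b, p in column.items()})
```

Because the weights depend only on energy differences, a systematic offset in the potential function does not change the profile, whereas errors in relative energies do; lowering kT sharpens the profile toward the single best-scoring site.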

    Getting a Tight Grip on DNA: Optimizing Zinc Fingers for Efficient ZFN-Mediated Gene Editing: A Dissertation

    The utility of a model organism for studying biological processes is closely tied to its amenability to genome manipulation. Although tools for targeted genome engineering in mice have been available since 1987, most organisms, including zebrafish, have lacked efficient reverse genetic tools, which has stymied their broad adoption as model systems for studying biological processes. The development of zinc finger nucleases (ZFNs) that can create double-strand breaks at desired sites in a genome has provided a universal platform for targeted genome modification. ZFNs are artificial restriction endonucleases that comprise an array of three to six C2H2 zinc finger DNA-binding domains fused to the dimeric cleavage domain of the type IIS endonuclease FokI. C2H2 zinc fingers are the most common naturally occurring DNA-binding domain, and their specificity can be engineered to recognize a variety of DNA sequences, providing a strategy for targeting the appended nuclease domain to desired sites in a genome. The utility of ZFNs for gene editing relies on their activity and precision in vivo, both of which depend on generating zinc finger proteins (ZFPs) that bind desired target sites with high specificity and affinity. Although various methods are available for constructing ZFPs with novel specificities, ZFNs assembled using existing approaches often display negligible in vivo activity, presumably because their ZFPs have either low affinity or suboptimal specificity. A root cause of this deficiency is the presence of interfering interactions at the finger-finger interface upon assembly of multiple fingers.
In this study we have employed bacterial one-hybrid (B1H)-based selections to identify two-finger zinc finger units (2F-modules) containing optimized interface residues that can be combined with published finger archives to rapidly yield ZFNs that can target more than 95% of zebrafish and human protein-coding genes, with a success rate higher than that of ZFNs constructed using available methods. In addition to genome engineering in model organisms, this advance in ZFN design will aid the development of ZFN-based therapeutics. While creating this archive, we also undertook a broader study of zinc finger specificity to better understand fundamental aspects of DNA recognition, producing the largest protein-DNA interaction dataset for zinc fingers described to date, which will facilitate the development of better predictive models of recognition. Ultimately, these predictive models would enable the rational design of synthetic zinc finger proteins for targeted gene regulation or genomic modification, and the prediction of genomic binding sites for naturally occurring zinc finger proteins for the construction of more accurate gene regulatory networks.
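Because the FokI cleavage domain must dimerize to cut, a ZFN pair binds two half-sites on opposite strands flanking a short spacer. The scan below enumerates such candidate sites in a DNA sequence. It is a minimal sketch under assumptions: 9 bp half-sites correspond to three-finger arrays, the 5 to 7 bp spacer range is assumed (the tolerated spacer depends on the linker between the array and FokI), and the default all-GNN filter is a hypothetical stand-in for whatever design rules a real module archive imposes.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def all_gnn(half_site):
    """Hypothetical filter: every triplet of the half-site (read 5'->3') starts with G."""
    return all(half_site[i] == "G" for i in range(0, len(half_site), 3))


def find_zfn_sites(seq, half_len=9, spacers=(5, 6, 7), accept=all_gnn):
    """Yield (start, spacer, left_half, right_half) for candidate ZFN pair sites.

    The left ZFN binds the bottom strand, so its half-site is reported as the
    reverse complement of the top-strand segment; the right half-site is read
    directly from the top strand.
    """
    seq = seq.upper()
    for start in range(len(seq)):
        for spacer in spacers:
            end = start + 2 * half_len + spacer
            if end > len(seq):
                continue
            left = revcomp(seq[start:start + half_len])
            right = seq[end - half_len:end]
            if accept(left) and accept(right):
                yield start, spacer, left, right


if __name__ == "__main__":
    # Hypothetical sequence: bottom-strand GAAGAAGAA, 5 bp spacer, top-strand GCCGCCGCC.
    demo = "TTCTTCTTC" + "ACGTA" + "GCCGCCGCC"
    for hit in find_zfn_sites(demo):
        print(hit)
```

Passing `accept=lambda s: True` lists every geometrically valid site; tightening the filter to the triplets or two-finger units actually present in an archive is what determines the fraction of genes that can be targeted.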