
    Discovering properties of new DNA-binding activity of proteins

    Protein-DNA interactions are an essential feature of the genetic activities of life, and the ability to predict and manipulate such interactions has applications in a wide range of fields. This thesis presents methods for modelling the properties of protein-DNA interactions. In particular, it investigates methods for visualising and predicting the specificity of DNA-binding Cys2His2 zinc finger interactions. Cys2His2 zinc finger proteins interact via their individual fingers with base-pair subsites on the target DNA. Four key residue positions on the α-helix of each zinc finger make sequence-specific, non-covalent interactions with the DNA. Mutating these key residues generates combinatorial possibilities that could potentially bind to any DNA segment of interest. Many attempts have been made to predict the binding interaction using structural and chemical information, but with only limited success. The most important contribution of the thesis is that the developed model allows the binding properties of a given protein-DNA pair to be visualised in relation to other protein-DNA combinations without having to explicitly physically model the specific protein molecule and the specific DNA sequence. To demonstrate this, various databases were generated, including a synthetic database that includes all possible combinations of the DNA-binding Cys2His2 zinc finger interactions. NeuroScale, a topographic visualisation technique, is exploited to represent the geometric structures of the protein-DNA interactions by measuring dissimilarity between the data points. To verify the effect of visualisation on understanding the binding properties of the DNA-binding Cys2His2 zinc finger interaction, various prediction models are constructed using both the original high-dimensional data and the data represented in a low-dimensional feature space.
Finally, novel data sets are studied through the selected visualisation models based on the experimental DNA-zinc finger protein database. The NeuroScale projections show that different dissimilarity representations give distinctive structural groupings that nevertheless cluster in biologically interesting ways. This method can be used to forecast the physicochemical properties of novel proteins, which may be beneficial for therapeutic purposes involving genome targeting in general.

    Structural analysis of the EGR family of transcription factors: Templates for predicting protein-DNA interactions

    The EGR family of transcription factors is known to be activated in cells exposed to growth factors in a variety of tissues. The overall structure of the family is highly conserved, while the amino acid sequence can be quite diverse, allowing for a wide array of DNA recognition sequences. Through homology modeling it is possible to reproduce the structure of the DNA-binding domain of EGR proteins, which consists of three zinc fingers. Molecular dynamics simulations also show that most side chains within the domain reach an equilibrium state. However, residues that are essential for DNA binding do not reach an equilibrium state during the simulation but instead constantly sample the available conformational space. Furthermore, cluster analysis shows that the three recognition residues in each zinc finger adopt side-chain conformations that are optimal for DNA recognition. These studies suggest a possible mechanism for zinc finger recognition of DNA and yield homology-modeled proteins that can be used in protein-DNA interaction prediction.

    Getting a Tight Grip on DNA: Optimizing Zinc Fingers for Efficient ZFN-Mediated Gene Editing: A Dissertation

    The utility of a model organism for studying biological processes is closely tied to its amenability to genome manipulation. Although tools for targeted genome engineering in mice have been available since 1987, most organisms, including zebrafish, have lacked efficient reverse genetic tools, which has stymied their broad implementation as model systems to study biological processes. The development of zinc finger nucleases (ZFNs) that can create double-strand breaks at desired sites in a genome has provided a universal platform for targeted genome modification. ZFNs are artificial restriction endonucleases that consist of an array of three to six C2H2 zinc finger DNA-binding domains fused with the dimeric cleavage domain of the type IIs endonuclease FokI. C2H2 zinc fingers are the most common naturally occurring DNA-binding domain, and their specificity can be engineered to recognize a variety of DNA sequences, providing a strategy for targeting the appended nuclease domain to desired sites in a genome. The utility of ZFNs for gene editing relies on their activity and precision in vivo, both of which depend on the generation of ZFPs that bind desired target sites with high specificity and affinity. Although various methods are available that allow construction of ZFPs with novel specificities, ZFNs assembled using existing approaches often display negligible in vivo activity, presumably resulting from ZFPs with either low affinity or suboptimal specificity. A root cause of this deficiency is the presence of interfering interactions at the finger-finger interface upon assembly of multiple fingers.
In this study we have employed bacterial one-hybrid (B1H)-based selections to identify two-finger zinc finger units (2F-modules) containing optimized interface residues that can be combined with published finger archives to rapidly yield ZFNs that can target more than 95% of the zebrafish and human protein-coding genes while maintaining a success rate higher than that of ZFNs constructed using available methods. In addition to genome engineering in model organisms, this advancement in ZFN design will aid in the development of ZFN-based therapeutics. In the process of creating this archive, we have undertaken a broader study of zinc finger specificity to better understand fundamental aspects of DNA recognition. In doing so we have created the largest protein-DNA interaction dataset for zinc fingers described to date, which will facilitate the development of better predictive models of recognition. Ultimately, these predictive models would enable the rational design of synthetic zinc finger proteins for targeted gene regulation or genomic modification, and the prediction of genomic binding sites for naturally occurring zinc finger proteins for the construction of more accurate gene regulatory networks.

    Bind-n-Seq: high-throughput analysis of in vitro protein-DNA interactions using massively parallel sequencing.

    Transcription factor-DNA interactions are some of the most important processes in biology because they directly control hereditary information. The targets of most transcription factors are unknown. In this report, we introduce Bind-n-Seq, a new high-throughput method for analyzing protein-DNA interactions in vitro, with several advantages over current methods. The procedure has three steps: (i) binding proteins to randomized oligonucleotide DNA targets, (ii) sequencing the bound oligonucleotides with massively parallel technology and (iii) finding motifs among the sequences. De novo binding motifs determined by this method for the DNA-binding domains of two well-characterized zinc-finger proteins were similar to those described previously. Furthermore, calculations of the relative affinity of the proteins for specific DNA sequences correlated significantly with previous studies (R² = 0.9). These results present Bind-n-Seq as a highly rapid and parallel method for determining in vitro binding sites and relative affinities.
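Step (iii) above, finding motifs among the bound sequences, can be illustrated with a minimal k-mer enrichment sketch. This is not the Bind-n-Seq authors' pipeline: the function names, the pseudocount, and the toy sequences are assumptions made purely for illustration, and the scoring (frequency in bound reads divided by frequency in a background set) is just one simple way to rank candidate motifs.

```python
from collections import Counter


def kmer_counts(seqs, k):
    """Count every overlapping k-mer across a list of DNA sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def kmer_enrichment(bound, background, k, pseudocount=1.0):
    """Score each k-mer seen in `bound` by its smoothed frequency ratio
    against `background` (hypothetical scoring, not from the paper)."""
    b = kmer_counts(bound, k)
    g = kmer_counts(background, k)
    b_total = sum(b.values())
    g_total = sum(g.values())
    n_kmers = 4 ** k  # number of possible DNA k-mers, used for smoothing
    scores = {}
    for kmer in b:
        fb = (b[kmer] + pseudocount) / (b_total + pseudocount * n_kmers)
        fg = (g[kmer] + pseudocount) / (g_total + pseudocount * n_kmers)
        scores[kmer] = fb / fg
    return scores


# Toy reads, invented for illustration only.
bound = ["ACGTGGGCGT", "TTGTGGGCGA", "GCGTGGGCGC"]
background = ["ACGTACGTAC", "TTGACCATGA", "GCATCGATCC"]

scores = kmer_enrichment(bound, background, k=6)
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
for kmer, score in ranked[:3]:
    print(kmer, round(score, 3))
```

K-mers shared by all bound reads rank highest, while k-mers that appear in only one read score lower. A real analysis would also consider reverse complements and sequencing depth, which this sketch ignores.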

    A new census of protein tandem repeats and their relationship with intrinsic disorder

    Protein tandem repeats (TRs) are often associated with immunity-related functions and diseases. Since the last census of protein TRs in 1999, the number of curated proteins has increased more than seven-fold and new TR prediction methods have been published. TRs appear to be enriched with intrinsic disorder and vice versa. The significance of and the biological reasons for this association are unknown. Here, we characterize protein TRs across all kingdoms of life and their overlap with intrinsic disorder in unprecedented detail. Using state-of-the-art prediction methods, we estimate that 50.9% of proteins contain at least one TR, often located at the sequence flanks. A positive linear correlation between the proportion of TRs and protein length was observed universally. Eukaryotes in general have more TRs, but when the difference in length is taken into account the difference is quite small. TRs were enriched with disorder-promoting amino acids and were located inside intrinsically disordered regions. Many such TRs were homorepeats. Our results support the view that TRs mostly originate by duplication and are involved in essential functions such as transcription processes, structural organization, electron transport and iron-binding. In viruses, TRs are found in proteins essential for virulence.

    Coding limits on the number of transcription factors

    Transcription factor proteins bind specific DNA sequences to control the expression of genes. They contain DNA-binding domains that belong to several super-families, each with a specific mechanism of DNA binding. The total number of transcription factors encoded in a genome increases with the number of genes in the genome. Here, we examined the number of transcription factors from each super-family in diverse organisms. We find that the number of transcription factors from most super-families appears to be bounded. For example, the number of winged helix factors does not generally exceed 300, even in very large genomes. The magnitude of the maximal number of transcription factors from each super-family seems to correlate with the number of DNA bases effectively recognized by the binding mechanism of that super-family. Coding theory predicts that such upper bounds on the number of transcription factors should exist, in order to minimize cross-binding errors between transcription factors. This theory further predicts that factors with similar binding sequences should tend to have similar biological effects, so that errors based on mis-recognition are minimal. We present evidence that transcription factors with similar binding sequences tend to regulate genes with similar biological functions, supporting this prediction. The present study suggests limits on the transcription factor repertoire of cells, and suggests coding constraints that might apply more generally to the mapping between binding sites and biological function.
    Comment: http://www.weizmann.ac.il/complex/tlusty/papers/BMCGenomics2006.pdf https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1590034/ http://www.biomedcentral.com/1471-2164/7/23

    A compendium of Caenorhabditis elegans regulatory transcription factors: a resource for mapping transcription regulatory networks

    Background: Transcription regulatory networks are composed of interactions between transcription factors and their target genes. Whereas unicellular networks have been studied extensively, metazoan transcription regulatory networks remain largely unexplored. Caenorhabditis elegans provides a powerful model to study such metazoan networks because its genome is completely sequenced and many functional genomic tools are available. While C. elegans gene predictions have undergone continuous refinement, this is not true for the annotation of functional transcription factors. The comprehensive identification of transcription factors is essential for the systematic mapping of transcription regulatory networks because it enables the creation of physical transcription factor resources that can be used in assays to map interactions between transcription factors and their target genes.
    Results: By computational searches and extensive manual curation, we have identified a compendium of 934 transcription factor genes (referred to as wTF2.0). We find that manual curation drastically reduces the number of both false positive and false negative transcription factor predictions. We discuss how transcription factor splice variants and dimer formation may affect the total number of functional transcription factors. In contrast to mouse transcription factor genes, we find that C. elegans transcription factor genes do not undergo significantly more splicing than other genes. This difference may contribute to differences in organism complexity. We identify candidate redundant worm transcription factor genes and orthologous worm and human transcription factor pairs. Finally, we discuss how wTF2.0 can be used together with physical transcription factor clone resources to facilitate the systematic mapping of C. elegans transcription regulatory networks.
    Conclusion: wTF2.0 provides a starting point to decipher the transcription regulatory networks that control metazoan development and function.