470 research outputs found

    An improved predictive recognition model for Cys2-His2 zinc finger proteins

    Get PDF
    Cys2-His2 zinc finger proteins (ZFPs) are the largest family of transcription factors in higher metazoans. They also represent the most diverse family with regards to the composition of their recognition sequences. Although there are a number of ZFPs with characterized DNA-binding preferences, the specificity of the vast majority of ZFPs is unknown and cannot be directly inferred by homology due to the diversity of recognition residues present within individual fingers. Given the large number of unique zinc fingers and assemblies present across eukaryotes, a comprehensive predictive recognition model that could accurately estimate the DNA-binding specificity of any ZFP based on its amino acid sequence would have great utility. Toward this goal, we have used the DNA-binding specificities of 678 two-finger modules from both natural and artificial sources to construct a random forest-based predictive model for ZFP recognition. We find that our recognition model outperforms previously described determinant-based recognition models for ZFPs, and can successfully estimate the specificity of naturally occurring ZFPs with previously defined specificities

    Next-generation sequencing of common osteogenesis imperfecta-related genes in clinical practice

    Get PDF
    Next generation sequencing (NGS) is a rapidly developing area in genetics. Utilizing this technology in the management of disorders with complex genetic background and not recurrent mutation hot spots can be extremely useful. In this study, we applied NGS, namely semiconductor sequencing to determine the most significant osteogenesis imperfecta-related genetic variants in the clinical practice. We selected genes coding collagen type I alpha-1 and-2 (COL1A1, COL1A2) which are responsible for more than 90% of all cases. CRTAP and LEPRE1/P3H1 genes involved in the background of the recessive forms with relatively high frequency (type VII and VIII) represent less than 10% of the disease. In our six patients (1-41 years), we identified 23 different variants. We found a total of 14 single nucleotide variants (SNV) in COL1A1 and COL1A2, 5 in CRTAP and 4 in LEPRE1. Two novel and two already well-established pathogenic SNVs have been identified. Among the newly recognized mutations, one results in an amino acid change and one of them is a stop codon. We have shown that a new full-scale cost-effective NGS method can be developed and utilized to supplement diagnostic process of osteogenesis imperfecta with molecular genetic data in clinical practice

    Recognition models to predict DNA-binding specificities of homeodomain proteins

    Get PDF
    Motivation: Recognition models for protein-DNA interactions, which allow the prediction of specificity for a DNA-binding domain based only on its sequence or the alteration of specificity through rational design, have long been a goal of computational biology. There has been some progress in constructing useful models, especially for C2H2 zinc finger proteins, but it remains a challenging problem with ample room for improvement. For most families of transcription factors the best available methods utilize k-nearest neighbor (KNN) algorithms to make specificity predictions based on the average of the specificities of the k most similar proteins with defined specificities. Homeodomain (HD) proteins are the second most abundant family of transcription factors, after zinc fingers, in most metazoan genomes, and as a consequence an effective recognition model for this family would facilitate predictive models of many transcriptional regulatory networks within these genomes

    Variants in a Novel Epidermal Collagen Gene (COL29A1) Are Associated with Atopic Dermatitis

    Get PDF
    Atopic dermatitis (AD) is a common chronic inflammatory skin disorder and a major manifestation of allergic disease. AD typically presents in early childhood often preceding the onset of an allergic airway disease, such as asthma or hay fever. We previously mapped a susceptibility locus for AD on Chromosome 3q21. To identify the underlying disease gene, we used a dense map of microsatellite markers and single nucleotide polymorphisms, and we detected association with AD. In concordance with the linkage results, we found a maternal transmission pattern. Furthermore, we demonstrated that the same families contribute to linkage and association. We replicated the association and the maternal effect in a large independent family cohort. A common haplotype showed strong association with AD (p = 0.000059). The associated region contained a single gene, COL29A1, which encodes a novel epidermal collagen. COL29A1 shows a specific gene expression pattern with the highest transcript levels in skin, lung, and the gastrointestinal tract, which are the major sites of allergic disease manifestation. Lack of COL29A1 expression in the outer epidermis of AD patients points to a role of collagen XXIX in epidermal integrity and function, the breakdown of which is a clinical hallmark of AD

    Using protein design algorithms to understand the molecular basis of disease caused by protein–DNA interactions: the Pax6 example

    Get PDF
    Quite often a single or a combination of protein mutations is linked to specific diseases. However, distinguishing from sequence information which mutations have real effects in the protein’s function is not trivial. Protein design tools are commonly used to explain mutations that affect protein stability, or protein–protein interaction, but not for mutations that could affect protein–DNA binding. Here, we used the protein design algorithm FoldX to model all known missense mutations in the paired box domain of Pax6, a highly conserved transcription factor involved in eye development and in several diseases such as aniridia. The validity of FoldX to deal with protein–DNA interactions was demonstrated by showing that high levels of accuracy can be achieved for mutations affecting these interactions. Also we showed that protein-design algorithms can accurately reproduce experimental DNA-binding logos. We conclude that 88% of the Pax6 mutations can be linked to changes in intrinsic stability (77%) and/or to its capabilities to bind DNA (30%). Our study emphasizes the importance of structure-based analysis to understand the molecular basis of diseases and shows that protein–DNA interactions can be analyzed to the same level of accuracy as protein stability, or protein–protein interactions

    Extensive protein and DNA backbone sampling improves structure-based specificity prediction for C2H2 zinc fingers

    Get PDF
    Sequence-specific DNA recognition by gene regulatory proteins is critical for proper cellular functioning. The ability to predict the DNA binding preferences of these regulatory proteins from their amino acid sequence would greatly aid in reconstruction of their regulatory interactions. Structural modeling provides one route to such predictions: by building accurate molecular models of regulatory proteins in complex with candidate binding sites, and estimating their relative binding affinities for these sites using a suitable potential function, it should be possible to construct DNA binding profiles. Here, we present a novel molecular modeling protocol for protein-DNA interfaces that borrows conformational sampling techniques from de novo protein structure prediction to generate a diverse ensemble of structural models from small fragments of related and unrelated protein-DNA complexes. The extensive conformational sampling is coupled with sequence space exploration so that binding preferences for the target protein can be inferred from the resulting optimized DNA sequences. We apply the algorithm to predict binding profiles for a benchmark set of eleven C2H2 zinc finger transcription factors, five of known and six of unknown structure. The predicted profiles are in good agreement with experimental binding data; furthermore, examination of the modeled structures gives insight into observed binding preferences

    ZFNGenome: A comprehensive resource for locating zinc finger nuclease target sites in model organisms

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Zinc Finger Nucleases (ZFNs) have tremendous potential as tools to facilitate genomic modifications, such as precise gene knockouts or gene replacements by homologous recombination. ZFNs can be used to advance both basic research and clinical applications, including gene therapy. Recently, the ability to engineer ZFNs that target any desired genomic DNA sequence with high fidelity has improved significantly with the introduction of rapid, robust, and publicly available techniques for ZFN design such as the Oligomerized Pool ENgineering (OPEN) method. The motivation for this study is to make resources for genome modifications using OPEN-generated ZFNs more accessible to researchers by creating a user-friendly interface that identifies and provides quality scores for all potential ZFN target sites in the complete genomes of several model organisms.</p> <p>Description</p> <p>ZFNGenome is a GBrowse-based tool for identifying and visualizing potential target sites for OPEN-generated ZFNs. ZFNGenome currently includes a total of more than 11.6 million potential ZFN target sites, mapped within the fully sequenced genomes of seven model organisms; <it>S. cerevisiae, C. reinhardtii, A. thaliana</it>, <it>D. melanogaster, D. rerio, C. elegans</it>, and <it>H. sapiens </it>and can be visualized within the flexible GBrowse environment. Additional model organisms will be included in future updates. ZFNGenome provides information about each potential ZFN target site, including its chromosomal location and position relative to transcription initiation site(s). Users can query ZFNGenome using several different criteria (e.g., gene ID, transcript ID, target site sequence). Tracks in ZFNGenome also provide "uniqueness" and ZiFOpT (Zinc Finger OPEN Targeter) "confidence" scores that estimate the likelihood that a chosen ZFN target site will function <it>in vivo</it>. ZFNGenome is dynamically linked to ZiFDB, allowing users access to all available information about zinc finger reagents, such as the effectiveness of a given ZFN in creating double-stranded breaks.</p> <p>Conclusions</p> <p>ZFNGenome provides a user-friendly interface that allows researchers to access resources and information regarding genomic target sites for engineered ZFNs in seven model organisms. This genome-wide database of potential ZFN target sites should greatly facilitate the utilization of ZFNs in both basic and clinical research.</p> <p>ZFNGenome is freely available at: <url>http://bindr.gdcb.iastate.edu/ZFNGenome</url> or at the Zinc Finger Consortium website: <url>http://www.zincfingers.org/</url>.</p
    corecore