    Comprehensive prediction in 78 human cell lines reveals rigidity and compactness of transcription factor dimers

    The binding of transcription factors (TFs) to their specific motifs in genomic regulatory regions is commonly studied in isolation. However, to elucidate the mechanisms of transcriptional regulation, it is essential to determine which TFs bind DNA cooperatively as dimers and to infer the precise nature of these interactions. So far, only a small number of such dimeric complexes are known. Here, we present an algorithm for large-scale prediction of cell-type-specific TF-TF dimerization on DNA, using DNase I hypersensitivity data from 78 human cell lines. We represented the universe of possible TF complexes by their corresponding motif complexes and analyzed their occurrence at cell-type-specific DNase I hypersensitive sites. Based on ~1.4 billion tests for motif complex enrichment, we predicted 603 highly significant cell-type-specific TF dimers, the vast majority of which are novel. Our predictions included 76% (19/25) of the known dimeric complexes and showed significant overlap with an experimental database of protein-protein interactions. They were also independently supported by evolutionary conservation and by quantitative variation in DNase I digestion patterns. Notably, the known and predicted TF dimers were almost always highly compact and rigidly spaced, suggesting that TFs dimerize in close proximity to their partners, which imposes strict constraints on the structure of the DNA-bound complex. Overall, our results indicate that chromatin openness profiles are highly predictive of cell-type-specific TF-TF interactions. Moreover, cooperative TF dimerization appears to be a widespread phenomenon, with multiple TF complexes predicted in most cell types. © 2013, Published by Cold Spring Harbor Laboratory Press.
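
The abstract does not say which statistic underlies the ~1.4 billion motif complex enrichment tests. As a minimal sketch, assuming a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test on the 2×2 table of "contains the motif complex" vs. "is cell-type-specific") and a Bonferroni adjustment, one such test could look like this. The function names and the per-site boolean inputs are hypothetical, not the authors' implementation.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K marked items, n draws)."""
    upper = min(n, K)
    # math.comb returns 0 when the lower index exceeds the upper, so
    # impossible terms drop out automatically.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def motif_complex_enrichment(has_complex: list[bool], is_specific: list[bool]) -> float:
    """Hypothetical one-sided p-value that a motif complex is over-represented
    among cell-type-specific DHSs relative to all DHSs considered."""
    N = len(has_complex)      # all DHSs in the (assumed) universe
    K = sum(has_complex)      # DHSs containing the motif complex
    n = sum(is_specific)      # cell-type-specific DHSs
    k = sum(c and s for c, s in zip(has_complex, is_specific))  # overlap
    return hypergeom_upper_tail(k, n, K, N)


def bonferroni(p: float, n_tests: int) -> float:
    """Assumed multiple-testing adjustment; the abstract does not name one."""
    return min(1.0, p * n_tests)
```

For example, if a complex occurs at 4 of 10 sites and all 4 are among the 4 cell-type-specific sites, the p-value is 1/210 (about 0.0048). Multiplied by ~1.4 billion tests, such a value would not come close to significance, which illustrates why a screen of this size needs large site counts per test.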

    Unraveling the transcriptional cis-regulatory code

    It is now widely accepted that eukaryotic complexity is not dictated by the number of protein-coding genes in the genome, but is instead achieved through the combinatorics of gene expression programs. Distinct aspects of a gene's expression pattern are mediated by discrete regulatory sequences known as cis-regulatory elements. The work described in this thesis aimed to develop computational and statistical methods to guide the search for, and characterization of, novel cis-regulatory elements.

    Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project

    We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts and transcripts that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding of transcription start sites, including their relationship to specific regulatory sequences and to features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its interrelationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path toward a more comprehensive characterization of human genome function.

    Learning the Regulatory Code of Gene Expression

    Data-driven machine learning is the method of choice for predicting molecular phenotypes from nucleotide sequence, modeling gene expression events including protein-DNA binding, chromatin states, and mRNA and protein levels. Deep neural networks automatically learn informative sequence representations, and interpreting them enables us to improve our understanding of the regulatory code governing gene expression. Here, we review the latest developments that apply shallow or deep learning to quantify molecular phenotypes and decode the cis-regulatory grammar from prokaryotic and eukaryotic sequencing data. Our approach is to build from the ground up, focusing first on the initiating protein-DNA interactions, then on specific coding and non-coding regions, and finally on advances that combine multiple parts of the gene and mRNA regulatory structures, achieving unprecedented performance. We thus provide a quantitative view of gene expression regulation from nucleotide sequence, concluding with an information-centric overview of the central dogma of molecular biology.
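
Models that learn from nucleotide sequence usually start from a numeric encoding of the DNA, and the first layer of a convolutional network scores every window of that encoding against learned filters. The review does not prescribe a specific encoding or architecture, so the following is a minimal sketch under that assumption: a one-hot (A, C, G, T) encoding and a single hand-set filter slid along the sequence. The names `one_hot` and `scan` are hypothetical.

```python
# Hypothetical helpers; the encoding and filter layout are illustrative assumptions.
BASES = "ACGT"


def one_hot(seq: str) -> list[list[int]]:
    """Encode a DNA string as an L x 4 matrix (columns A, C, G, T).

    Ambiguous bases such as N become all-zero rows.
    """
    index = {base: i for i, base in enumerate(BASES)}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


def scan(encoded: list[list[int]], weights: list[list[float]]) -> list[float]:
    """Slide a w x 4 weight matrix along the encoded sequence.

    This is a 'valid' cross-correlation, i.e. what one 1D convolutional
    filter computes before its bias and nonlinearity are applied.
    """
    w = len(weights)
    scores = []
    for start in range(len(encoded) - w + 1):
        score = 0.0
        for offset in range(w):
            for j in range(4):
                score += encoded[start + offset][j] * weights[offset][j]
        scores.append(score)
    return scores
```

For instance, a filter that rewards G followed by A, `[[0, 0, 1, 0], [1, 0, 0, 0]]`, scores "GAGAC" as `[2.0, 0.0, 2.0, 0.0]`. In a trained network these weights are learned from data rather than set by hand, and interpreting them is one route to recovering motif-like features.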

    Nucleosome positioning on the chicken β-globin genes

    Investigation on the structure of CpG methylated DNA

    The Chromosomal High-Affinity Binding Sites for the Drosophila Dosage Compensation Complex

    Dosage compensation in male Drosophila relies on the X chromosome–specific recruitment of a chromatin-modifying machinery, the dosage compensation complex (DCC). The principles that ensure selective targeting of the DCC are unknown. According to a prevalent model, X chromosome targeting is initiated by recruitment of the DCC core components, MSL1 and MSL2, to a limited number of so-called “high-affinity sites” (HAS). Only very few such sites are known at the DNA sequence level, which has precluded the definition of DCC targeting principles. Combining RNA interference against DCC subunits, limited crosslinking, and chromatin immunoprecipitation coupled to high-resolution DNA microarrays, we identified a set of 131 HAS for MSL1 and MSL2 and confirmed their properties by various means. The HAS are distributed all over the X chromosome and are functionally important, since the extent of dosage compensation of a given gene is positively correlated with its proximity to a HAS. The sites are located mainly in non-coding parts of genes and predominantly map to regions that are devoid of nucleosomes. In contrast, the bulk of DCC binding is in coding regions and is marked by histone H3K36 methylation. Within the HAS, repetitive DNA sequences based mainly on GA and CA dinucleotides are enriched. Interestingly, DCC subcomplexes bind a small number of autosomal locations with similar features.
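
The abstract reports that GA- and CA-based repeats are enriched within the HAS but does not describe how enrichment was measured. A minimal sketch, assuming enrichment is expressed as the fraction of bases covered by alternating dinucleotide runs in HAS sequences relative to a background set, and counting both strands via the reverse complement. The function names and the 8 bp minimum run length are hypothetical choices.

```python
import re

# Hypothetical helpers; the minimum run length is an arbitrary illustrative choice.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def repeat_coverage(seqs: list[str], unit: str = "GA", min_len: int = 8) -> float:
    """Fraction of bases lying in alternating runs of `unit` (either phase,
    either strand) that are at least `min_len` bp long."""
    patterns = []
    # A set avoids scanning twice for self-complementary units such as "AT".
    for u in {unit.upper(), reverse_complement(unit)}:
        x, y = u
        # Matches e.g. GAGAGA(G) or AGAGAG(A), so runs may start in either phase.
        patterns.append(re.compile(f"(?:{x}{y})+{x}?|(?:{y}{x})+{y}?"))
    covered = total = 0
    for s in seqs:
        s = s.upper()
        total += len(s)
        for pattern in patterns:
            covered += sum(
                len(m.group()) for m in pattern.finditer(s) if len(m.group()) >= min_len
            )
    return covered / total if total else 0.0


def repeat_enrichment(has_seqs: list[str], background_seqs: list[str],
                      unit: str = "GA", min_len: int = 8) -> float:
    """Ratio of repeat coverage in HAS sequences to coverage in a background set."""
    fg = repeat_coverage(has_seqs, unit, min_len)
    bg = repeat_coverage(background_seqs, unit, min_len)
    return fg / bg if bg else float("inf")
```

Scanning the reverse complement matters because a GA repeat on one strand reads as a TC repeat on the other; for GA and CA the forward and reverse-complement runs use disjoint bases, so no position is counted twice. A real analysis would also need a background matched for base composition, which this sketch does not attempt.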

    Predicting the Types of J-Proteins Using Clustered Amino Acids

    Genome analysis of the necrotrophic fungal pathogens Sclerotinia sclerotiorum and Botrytis cinerea

    Sclerotinia sclerotiorum and Botrytis cinerea are closely related necrotrophic plant pathogenic fungi notable for their wide host ranges and environmental persistence. These attributes have made these species models for understanding the complexity of necrotrophic, broad host-range pathogenicity. Despite their similarities, the two species differ in mating behaviour and the ability to produce asexual spores. We have sequenced the genomes of one strain of S. sclerotiorum and two strains of B. cinerea. The comparative analysis of these genomes relative to one another and to other sequenced fungal genomes is provided here. Their 38–39 Mb genomes include 11,860–14,270 predicted genes, which share 83% amino acid identity on average between the two species. We have mapped the S. sclerotiorum assembly to 16 chromosomes and found large-scale co-linearity with the B. cinerea genomes. Seven percent of the S. sclerotiorum genome comprises transposable elements compared t