20 research outputs found

    Statistical significance of cis-regulatory modules

    BACKGROUND: It is becoming increasingly important for researchers to be able to scan through large genomic regions for transcription factor binding sites or clusters of binding sites forming cis-regulatory modules. Correspondingly, there has been a push to develop algorithms for the rapid detection and assessment of cis-regulatory modules. While various algorithms for this purpose have been introduced, most are not well suited for rapid, genome-scale scanning. RESULTS: We introduce methods designed for the detection and statistical evaluation of cis-regulatory modules, modeled as either clusters of individual binding sites or as combinations of sites with constrained organization. In order to determine the statistical significance of module sites, we first need a method to determine the statistical significance of single transcription factor binding site matches. We introduce a straightforward method of estimating the statistical significance of single site matches using a database of known promoters to produce data structures that can be used to estimate p-values for binding site matches. We next introduce a technique to calculate the statistical significance of the arrangement of binding sites within a module using a max-gap model. If the module scanned for has defined organizational parameters, the probability of the module is corrected to account for organizational constraints. The statistical significance of single site matches and the architecture of sites within the module can be combined to provide an overall estimation of statistical significance of cis-regulatory module sites. CONCLUSION: The methods introduced in this paper allow for the detection and statistical evaluation of single transcription factor binding sites and cis-regulatory modules. The features described are implemented in the Search Tool for Occurrences of Regulatory Motifs (STORM) and MODSTORM software.
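The abstract above mentions precomputed data structures for estimating p-values of single site matches. A minimal sketch of one way such a lookup could work is given here, assuming the background is simply a list of match scores collected from known promoters. The function names `build_background` and `empirical_p_value` are hypothetical, and this is not the STORM implementation.

```python
import bisect


def build_background(scores):
    """Sort background match scores once so later lookups are fast.

    `scores` is assumed to hold match scores obtained by scanning a
    set of known promoters (an assumption, not the paper's format).
    """
    return sorted(scores)


def empirical_p_value(background, score):
    """Estimate P(background score >= score) with a +1 pseudocount."""
    # Number of background scores that are at least as large as `score`.
    n_ge = len(background) - bisect.bisect_left(background, score)
    return (n_ge + 1) / (len(background) + 1)


if __name__ == "__main__":
    bg = build_background([1.0, 3.0, 2.0, 5.0])
    print(empirical_p_value(bg, 3.0))
```

The pseudocount keeps the estimate from ever being exactly zero, which is a common convention for empirical p-values; the paper may well use a different estimator.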

    Computational Identification of Transcriptional Regulators in Human Endotoxemia

    One of the great challenges in the post-genomic era is to decipher the underlying principles governing the dynamics of biological responses. As modulating gene expression levels is among the key regulatory responses of an organism to changes in its environment, identifying biologically relevant transcriptional regulators and their putative regulatory interactions with target genes is an essential step towards studying the complex dynamics of transcriptional regulation. We present an analysis that integrates various computational and biological aspects to explore the transcriptional regulation of systemic inflammatory responses through a human endotoxemia model. Given a high-dimensional transcriptional profiling dataset from human blood leukocytes, an elementary set of temporal dynamic responses which capture the essence of a pro-inflammatory phase, a counter-regulatory response and a dysregulation in leukocyte bioenergetics has been extracted. Upon identification of these expression patterns, fourteen inflammation-specific gene batteries that represent groups of hypothetically ‘coregulated’ genes are proposed. Subsequently, statistically significant cis-regulatory modules (CRMs) are identified and decomposed into a list of 34 critical transcription factors that are validated largely against the primary literature. Finally, our analysis further allows for the construction of a dynamic representation of the temporal transcriptional regulatory program across the host, deciphering possible combinatorial interactions among factors under which they might be active. Although much remains to be explored, this study has computationally identified key transcription factors and proposed a putative time-dependent transcriptional regulatory program associated with critical transcriptional inflammatory responses. These results provide a solid foundation for future investigations to elucidate the underlying transcriptional regulatory mechanisms under the host inflammatory response. Also, the assumption that coexpressed genes that are functionally relevant are more likely to share some common transcriptional regulatory mechanism seems promising, making the proposed framework essential for unravelling context-specific transcriptional regulatory interactions underlying diverse mammalian biological processes.

    Bivalent-Like Chromatin Markers Are Predictive for Transcription Start Site Distribution in Human

    Deep sequencing of 5′ capped transcripts has revealed a variety of transcription initiation patterns, from narrow, focused promoters to wide, broad promoters. Attempts have already been made to model empirically classified patterns, but virtually no quantitative models for transcription initiation have been reported. Even though both genetic and epigenetic elements have been associated with such patterns, the organization of regulatory elements is largely unknown. Here, linear regression models were derived from a pool of regulatory elements, including genomic DNA features, nucleosome organization, and histone modifications, to predict the distribution of transcription start sites (TSS). Importantly, models including both active and repressive histone modification markers, e.g. H3K4me3 and H4K20me1, were consistently found to be much more predictive than models with only single-type histone modification markers, indicating the possibility of “bivalent-like” epigenetic control of transcription initiation. The nucleosome positions are proposed to be coded in the active component of such bivalent-like histone modification markers. Finally, we demonstrated that models trained on one cell type could successfully predict TSS distribution in other cell types, suggesting that these models may have a broader application range.

    Transcription factor site dependencies in human, mouse and rat genomes

    BACKGROUND: It is known that transcription factors frequently act together to regulate gene expression in eukaryotes. In this paper we describe a computational analysis of transcription factor site dependencies in human, mouse and rat genomes. RESULTS: Our approach for quantifying the tendency of transcription factor binding sites to co-occur uses a binding site scoring function that incorporates dependencies between positions, uses information about the structural class of each transcription factor (major/minor groove binder), and considers the possible implications of varying GC content of the sequences. Significant tendencies (dependencies) have been detected by non-parametric statistical methodology (permutation tests). The results have been evaluated in several ways: reports from the literature (many of the significant dependencies between transcription factors have previously been confirmed experimentally); dependencies between transcription factors are not biased by similarities in their DNA-binding sites; the number of dependent transcription factors that belong to the same functional and structural class is significantly higher than would be expected by chance; and supporting evidence from GO clustering of target genes. Based on dependencies between two transcription factor binding sites (second-order dependencies), it is possible to construct higher-order dependencies (networks). Moreover, results on transcription factor binding site dependencies can be used to predict groups of dependent transcription factors on a given promoter sequence. Our results, as well as a scanning tool for predicting groups of dependent transcription factor binding sites, are available on the Internet. CONCLUSION: We show that the computational analysis of transcription factor site dependencies is a valuable complement to experimental approaches for discovering transcription regulatory interactions and networks. Scanning promoter sequences with dependent groups of transcription factor binding sites improves the quality of transcription factor predictions.
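The permutation tests mentioned in this abstract can be illustrated with a minimal sketch, assuming each transcription factor is reduced to a 0/1 vector marking the promoters in which it has a predicted site. The test statistic (the number of promoters carrying both sites) and the names `cooccurrence` and `permutation_p_value` are illustrative assumptions; the paper's scoring function, structural classes and GC-content handling are not reproduced here.

```python
import random


def cooccurrence(a, b):
    """Count promoters in which both factors have a site."""
    return sum(1 for x, y in zip(a, b) if x and y)


def permutation_p_value(a, b, n_perm=1000, seed=0):
    """One-sided permutation p-value for co-occurrence of a and b.

    `a` and `b` are equal-length 0/1 sequences, one entry per promoter.
    The labels of `b` are shuffled to simulate independence.
    """
    rng = random.Random(seed)
    observed = cooccurrence(a, b)
    shuffled = list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if cooccurrence(a, shuffled) >= observed:
            hits += 1
    # +1 in numerator and denominator counts the observed arrangement.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    tf_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    tf_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    print(permutation_p_value(tf_a, tf_b))
```

Shuffling whole-promoter labels preserves how often each factor occurs while breaking any association between the two, which is what makes the resulting null distribution non-parametric.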

    Interplay between the Chd4/NuRD Complex and the Transcription Factor Znf219 Controls Cardiac Cell Identity

    The sarcomere regulates striated muscle contraction. This structure is composed of several myofibril proteins, isoforms of which are encoded by genes specific to either the heart or skeletal muscle. The chromatin remodeler complex Chd4/NuRD regulates the transcriptional expression of these specific sarcomeric programs by repressing genes of the skeletal muscle sarcomere in the heart. Aberrant expression of skeletal muscle genes induced by the loss of Chd4 in the heart leads to sudden death due to defects in cardiomyocyte contraction that progress to arrhythmia and fibrosis. Identifying the transcription factors (TFs) that recruit Chd4/NuRD to repress skeletal muscle genes in the myocardium will provide important information for understanding numerous cardiac pathologies and, ultimately, pinpointing new therapeutic targets for arrhythmias and cardiomyopathies. Here, we sought to find Chd4 interactors and their function in cardiac homeostasis. We therefore describe a physical interaction between Chd4 and the TF Znf219 in cardiac tissue. Znf219 represses the skeletal-muscle sarcomeric program in cardiomyocytes in vitro and in vivo, similarly to Chd4. Aberrant expression of skeletal-muscle sarcomere proteins in mouse hearts with knocked down Znf219 translates into arrhythmias, accompanied by an increase in PR interval. These data strongly suggest that the physical and genetic interaction of Znf219 and Chd4 in the mammalian heart regulates cardiomyocyte identity and myocardial contraction. J.V. was supported by the Spanish Ministry of Science and Innovation (PGC2018-097019-B-I00 and PID2021-122348NB-I00), UE Funds and Micinn-Inst Carlos III (PMP21_00057) and “la Caixa” Banking Foundation (project codes HR17-00247 and HR22-00253). J.M.R. was supported by the La Caixa Banking Foundation (project code HR18-00068), the Spanish Ministry of Science and Innovation grant RTI2018-099246-B-I00 (MICIU/AEI/FEDER, UE); the Comunidad de Madrid and European Social Fund (ESF) grant AORTASANA-CM (B2017/BMD-3676); and the Instituto de Salud Carlos III (ISCIII) (CIBER-CVCB16/11/00264). PG-A was supported by the Spanish Ministry of Science and Innovation (grants SAF2016-77816-P and PID2020-114773GB-I00). The CNIC is supported by Instituto de Salud Carlos III (ISCIII), the Spanish Ministry of Science and Innovation and the Pro CNIC Foundation and is a Severo Ochoa Center of Excellence (grant CEX2020-001041-S funded by Spanish Ministry of Science and Innovation AEI/10.13039/501100011033). FAS is supported by a Science and Innovation Fellowship (BES-2017-080629).

    Gene set-based module discovery in the breast cancer transcriptome

    BACKGROUND: Although microarray-based studies have revealed a global view of gene expression in cancer cells, we still have little knowledge about the regulatory mechanisms underlying the transcriptome. Several computational methods applied to yeast data have recently succeeded in identifying expression modules, which are defined as co-expressed gene sets under common regulatory mechanisms. However, such module discovery methods have not been applied to cancer transcriptome data. RESULTS: In order to decode oncogenic regulatory programs in cancer cells, we developed a novel module discovery method termed EEM by extending a previously reported module discovery method, and applied it to breast cancer expression data. Starting from seed gene sets prepared based on cis-regulatory elements, ChIP-chip data, and gene locus information, EEM identified 10 principal expression modules in breast cancer based on their expression coherence. Moreover, EEM depicted their activity profiles, which predict regulatory programs in each subtype of breast tumors. For example, our analysis revealed that the expression module regulated by the Polycomb repressive complex 2 (PRC2) is downregulated in triple negative breast cancers, suggesting similarity of transcriptional programs between stem cells and aggressive breast cancer cells. We also found that the activity of the PRC2 expression module is negatively correlated with the expression of EZH2, a component of PRC2 which belongs to the E2F expression module. E2F-driven EZH2 overexpression may be responsible for the repression of the PRC2 expression module in triple negative tumors. Furthermore, our network analysis predicts regulatory circuits in breast cancer cells. CONCLUSION: These results demonstrate that the gene set-based module discovery approach is a powerful tool to decode regulatory programs in cancer cells.

    Tanimoto's best barbecue: discovering regulatory modules using tanimoto scores

    We present a combinatorial method for discovering cis-regulatory modules in promoter sequences. Our approach combines “sliding window” approaches with a scoring function based on the so-called Tanimoto score. This allows us to identify sets of binding sites that tend to occur preferentially in the vicinity of each other in a given set of promoter sequences belonging to co-expressed or orthologous genes. We benchmark our method on a data set derived from muscle-specific genes, demonstrating that our approach is capable of identifying modules that were identified as functional in previous studies.
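The Tanimoto score of two sets is the size of their intersection divided by the size of their union. A minimal sketch of how it might be combined with sliding windows follows, assuming each motif's hits are given as positions per promoter and that windows start at multiples of a fixed step. The helper `windows_with_hit`, the data layout and the parameter values are hypothetical and do not come from the paper.

```python
def tanimoto(a, b):
    """Tanimoto score |A & B| / |A | B| of two collections (0.0 if both empty)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def windows_with_hit(hits, window, step):
    """Return the set of (promoter, window_start) pairs containing a hit.

    `hits` maps a promoter id to a list of hit positions. Windows cover
    [start, start + window) and start at 0, step, 2 * step, ...
    """
    covered = set()
    for promoter, positions in hits.items():
        for pos in positions:
            # Smallest window start that is still greater than pos - window.
            first = max(0, ((pos - window) // step + 1) * step)
            for start in range(first, pos + 1, step):
                covered.add((promoter, start))
    return covered


if __name__ == "__main__":
    motif_a = {"p1": [12], "p2": [3]}
    motif_b = {"p1": [14], "p2": [40]}
    wa = windows_with_hit(motif_a, window=10, step=5)
    wb = windows_with_hit(motif_b, window=10, step=5)
    print(tanimoto(wa, wb))
```

A score near 1 means the two motifs almost always fall in the same windows, while a score near 0 means they rarely share one; how the paper extends this to sets of more than two binding sites is not shown here.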

    Regulatory modules controlling maize inflorescence architecture

    Genetic control of branching is a primary determinant of yield, regulating seed number and harvesting ability, yet little is known about the molecular networks that shape grain-bearing inflorescences of cereal crops. Here, we used the maize (Zea mays) inflorescence to investigate gene networks that modulate determinacy, specifically the decision to allow branch growth. We characterized developmental transitions by associating spatiotemporal expression profiles with morphological changes resulting from genetic perturbations that disrupt steps in a pathway controlling branching. Developmental dynamics of genes targeted in vivo by the transcription factor RAMOSA1, a key regulator of determinacy, revealed potential mechanisms for repressing branches in distinct stem cell populations, including interactions with KNOTTED1, a master regulator of stem cell maintenance. Our results uncover discrete developmental modules that function in determining grass-specific morphology and provide a basis for targeted crop improvement and translation to other cereal crops with comparable inflorescence architectures.