
    Homotypic clusters of transcription factor binding sites: A model system for understanding the physical mechanics of gene expression.

    The organization of binding sites in cis-regulatory elements (CREs) can influence gene expression through a combination of physical mechanisms, ranging from direct interactions between transcription factor (TF) molecules to DNA looping and transient chromatin interactions. Studying simple and common building blocks in promoters and other CREs allows us to dissect how these mechanisms work together. Adjacent binding sites for the same TF species often form homotypic clusters, and these building blocks of CRE architecture are prime candidates for understanding how transcriptional mechanisms interact. Homotypic clusters are prevalent in both bacterial and eukaryotic genomes, and they occur both in promoters and in more distal enhancer/silencer elements. Here, we review previous theoretical and experimental studies that show how the complexity (number of binding sites) and spatial organization (distance between sites and overall distance from transcription start sites) of homotypic clusters influence gene expression. In particular, we describe how homotypic clusters modulate the temporal dynamics of TF binding, a mechanism that can affect gene expression but has not yet been sufficiently characterized. We propose further experiments on homotypic clusters that would be useful in developing mechanistic models of gene expression. This is the published version of the manuscript. It was first published by Elsevier in Computational and Structural Biotechnology Journal here: http://www.sciencedirect.com/science/article/pii/S2001037014000142
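As a rough baseline for how the number of sites in a homotypic cluster affects both occupancy and binding dynamics, the sketch below treats a cluster as n identical, independent sites with simple on/off kinetics. This is an illustrative null model under stated assumptions and is not taken from the review. It deliberately ignores cooperativity, DNA looping and chromatin effects. The function name and its parameters (`conc`, `k_on`, `k_off`) are hypothetical.

```python
def cluster_occupancy(n_sites, conc, k_on, k_off):
    """Equilibrium and dwell-time statistics for a homotypic cluster.

    Assumes n identical, non-interacting sites, each binding the TF at rate
    k_on * conc and releasing it at rate k_off (no cooperativity, no looping).
    """
    kd = k_off / k_on
    p_site = conc / (conc + kd)           # occupancy of a single site
    p_empty = (1.0 - p_site) ** n_sites   # probability that every site is empty
    # The all-empty state ends as soon as any one of the n sites becomes bound.
    tau_empty = 1.0 / (n_sites * k_on * conc)
    # Renewal argument: p_empty = tau_empty / (tau_empty + tau_occupied).
    tau_occupied = tau_empty * (1.0 - p_empty) / p_empty
    return {
        "mean_bound": n_sites * p_site,
        "p_any_bound": 1.0 - p_empty,
        "tau_empty": tau_empty,
        "tau_occupied": tau_occupied,
    }


for n in (1, 2, 4, 8):
    stats = cluster_occupancy(n, conc=1.0, k_on=1.0, k_off=1.0)
    print(n, round(stats["p_any_bound"], 3), round(stats["tau_occupied"], 2))
```

With n = 1 the occupied period reduces to 1/k_off, which is the single-site residence time. In this toy model, adding sites lengthens the periods during which at least one site is bound far faster than it raises mean occupancy, even without cooperativity. That effect is one simple way a cluster can change the temporal dynamics of binding.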

    Reliable scaling of position weight matrices for binding strength comparisons between transcription factors.

    BACKGROUND: Scoring DNA sequences against position weight matrices (PWMs) is a widely adopted method for identifying putative transcription factor binding sites. Common bioinformatics tools produce scores that can reflect the binding strength between a specific transcription factor and the DNA, but these scores are not directly comparable between different transcription factors. Other methods, including p-value-based approaches (Touzet H, Varré J-S. Efficient and accurate p-value computation for position weight matrices. Algorithms Mol Biol. 2007;2:15), provide more rigorous ways to identify potential binding sites. Their results, however, are difficult to interpret in terms of binding energy, which is essential for modeling transcription factor binding dynamics and enhancer activities. RESULTS: Here, we provide two different ways to find the scaling parameter λ that allows us to infer binding energy from a PWM score. The first approach uses a PWM and background genomic sequence as input to estimate λ for a specific transcription factor. We applied it to show that λ distributions for different transcription factor families correspond to their DNA binding properties. Our second method can reliably convert λ between different PWMs of the same transcription factor, which allows us to directly compare PWMs generated by different approaches. CONCLUSION: These two approaches provide computationally efficient ways to scale PWM scores and estimate the strength of transcription factor binding sites in quantitative studies of binding dynamics. Their results are consistent with each other and with previous reports in most cases.
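The sketch below makes the score-to-energy idea concrete. It builds a natural-log-odds PWM from a position frequency matrix and converts a site's score into an affinity relative to the best-scoring site. Several choices here are assumptions made for illustration: the linear mapping E = −S/λ (in units of k_BT), the pseudocount and the uniform background. The sketch takes λ as given and does not implement either of the paper's methods for estimating it. All function names are hypothetical.

```python
import math

BASES = "ACGT"


def pfm_to_pwm(pfm, pseudocount=0.25, background=None):
    """Convert a position frequency matrix (one {base: count} dict per position)
    into a natural-log-odds PWM against a background distribution."""
    background = background or {b: 0.25 for b in BASES}
    pwm = []
    for column in pfm:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log((column[b] + pseudocount) / total / background[b])
                    for b in BASES})
    return pwm


def score(pwm, site):
    """Sum of per-position log-odds scores for a site of the same length."""
    if len(site) != len(pwm):
        raise ValueError("site length must match PWM length")
    return sum(col[base] for col, base in zip(pwm, site.upper()))


def relative_affinity(pwm, site, lam):
    """Affinity relative to the best site, assuming E = -S / lam (kT units)."""
    s_max = sum(max(col.values()) for col in pwm)
    delta_e = (s_max - score(pwm, site)) / lam   # energy gap to the best site
    return math.exp(-delta_e)


pfm = [{"A": 10, "C": 0, "G": 0, "T": 0},
       {"A": 0, "C": 0, "G": 10, "T": 0}]
pwm = pfm_to_pwm(pfm)
print(relative_affinity(pwm, "AG", lam=1.0))   # best site
print(relative_affinity(pwm, "CG", lam=1.0))   # one mismatch: about 1/41
```

In this toy matrix a single mismatch costs ln 41 score units. A larger λ shrinks the implied energy gap, so the same PWM predicts flatter affinity differences. This is why the scale of λ matters when scores are compared across factors.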

    Determining Physical Mechanisms of Gene Expression Regulation from Single Cell Gene Expression Data.

    Many genes are expressed in bursts, which can contribute to cell-to-cell heterogeneity. It is now possible to measure this heterogeneity with high-throughput single cell gene expression assays (single cell qPCR and RNA-seq). These experimental approaches generate gene expression distributions that can be used to estimate the kinetic parameters of gene expression bursting, namely the rate at which genes turn on, the rate at which genes turn off, and the rate of transcription. We construct a complete pipeline for the analysis of single cell qPCR data that uses the mathematics behind bursty expression to develop more accurate and robust algorithms for analyzing the origin of heterogeneity in experimental samples. These are an algorithm for clustering cells by their bursting behavior (Simulated Annealing for Bursty Expression Clustering, SABEC) and a statistical tool for comparing the kinetic parameters of bursty expression across populations of cells (Estimation of Parameter changes in Kinetics, EPiK). We applied these methods to hematopoiesis, including a new single cell dataset in which transcription factors (TFs) involved in the earliest branchpoint of blood differentiation were individually up- and down-regulated. We identified two distinct sub-populations within a seemingly homogeneous group of hematopoietic stem cells. In addition, we could predict regulatory mechanisms controlling the expression levels of eighteen key hematopoietic transcription factors throughout differentiation. Detailed information about gene regulatory mechanisms can therefore be obtained simply from high-throughput single cell gene expression data, an approach that should be widely applicable given the rapid expansion of single cell genomics. This work was supported by: Royal Society Research Fellowship, Marshall Scholarship, Medical Research Council, the Leukemia and Lymphoma Society and core support grants from the Wellcome Trust to the Cambridge Institute for Medical Research and the Wellcome Trust and MRC Cambridge Stem Cell Institute. This is the final version of the article. It first appeared from the Public Library of Science via http://dx.doi.org/10.1371/journal.pcbi.100507
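The three kinetic parameters above (on rate, off rate and transcription rate) are those of the two-state, or telegraph, model of bursty transcription. The sketch below gives that model's mean and Fano factor. It also gives a simple method-of-moments estimate of burst frequency and burst size in the bursty limit, where k_off is much larger than k_on and than the degradation rate. This is a hedged illustration only and not SABEC or EPiK, whose details the abstract does not give. Rates are expressed relative to the mRNA degradation rate, and the function names are hypothetical.

```python
import statistics


def telegraph_moments(k_on, k_off, k_tx, k_deg=1.0):
    """Mean and Fano factor of mRNA counts in the two-state (telegraph) model.

    k_on / k_off: gene switching rates, k_tx: transcription rate while on,
    k_deg: mRNA degradation rate.
    """
    p_on = k_on / (k_on + k_off)
    mean = p_on * k_tx / k_deg
    fano = 1.0 + k_tx * k_off / ((k_on + k_off) * (k_on + k_off + k_deg))
    return mean, fano


def bursty_moment_estimates(counts):
    """Method-of-moments burst frequency and burst size (bursty limit).

    In this limit counts are approximately negative binomial with
    mean = frequency * size and variance / mean = 1 + size.
    Frequency is returned in units of the mRNA degradation rate.
    """
    mean = statistics.fmean(counts)
    if mean <= 0:
        raise ValueError("mean count must be positive")
    var = statistics.pvariance(counts)
    size = var / mean - 1.0
    if size <= 0:
        raise ValueError("data are not over-dispersed; bursty limit does not apply")
    return mean / size, size


print(telegraph_moments(k_on=1.0, k_off=1.0, k_tx=10.0))
print(bursty_moment_estimates([0, 0, 4, 4]))
```

Moment estimates like these are fast but noisy for small samples. Likelihood-based fits to the full distribution generally use the data more efficiently.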

    Canonical and single-cell Hi-C reveal distinct chromatin interaction sub-networks of mammalian transcription factors.

    BACKGROUND: Transcription factor (TF) binding to regulatory DNA sites is a key determinant of cell identity within multicellular organisms and has been studied extensively in relation to site affinity and chromatin modifications. Previous work has focused strongly on inferring TF-gene regulatory networks and TF-TF physical interaction networks. Here, we present a third type of TF network: the spatial network of co-localized TF binding sites within the three-dimensional genome. RESULTS: Using published canonical Hi-C data and single-cell genome structures, we assess the spatial proximity of a genome-wide array of potential TF-TF co-localizations in human and mouse cell lines. For individual TFs, the abundance of occupied binding sites shows a positive relationship with their clustering in three dimensions, especially for weak TF binding sites and at enhancer regions. An analysis across pairs of different TFs identifies significantly proximal pairs, which are enriched in reported physical interactions. Furthermore, clustering TFs by proximity enrichment identifies two partially segregated co-localization sub-networks, involving different TFs in different cell types. Using data from both human lymphoblastoid cells and mouse embryonic stem cells, we find that these sub-networks are enriched within, but not exclusive to, different chromosome sub-compartments previously identified in Hi-C data. CONCLUSIONS: This suggests that the association of TFs within spatial networks is closely coupled to gene regulatory networks. This applies to both differentiated and undifferentiated cells and is a potential causal link between lineage-specific TF binding and chromosome sub-compartment segregation.
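One generic way to ask whether two TFs' binding sites are unusually close in three dimensions is to count nearby pairs and compare that count against randomly drawn background loci. The sketch below does this for 3D coordinates such as those from single-cell genome structures. The distance cutoff, the background set and the random-draw null are assumptions made for illustration and are not the procedure used in the study. The function names are hypothetical.

```python
import math
import random


def close_pairs(coords_a, coords_b, cutoff):
    """Number of (a, b) pairs whose Euclidean distance is at most cutoff."""
    return sum(1 for a in coords_a for b in coords_b if math.dist(a, b) <= cutoff)


def proximity_enrichment(coords_a, coords_b, background, cutoff, n_draws=1000, seed=0):
    """Observed vs. expected close pairs, with a one-sided empirical p-value.

    The null replaces each TF's sites with loci drawn at random from
    `background`, which must contain at least as many loci as each site set.
    """
    rng = random.Random(seed)
    observed = close_pairs(coords_a, coords_b, cutoff)
    null = [close_pairs(rng.sample(background, len(coords_a)),
                        rng.sample(background, len(coords_b)), cutoff)
            for _ in range(n_draws)]
    expected = sum(null) / n_draws
    p_value = (1 + sum(n >= observed for n in null)) / (1 + n_draws)
    return observed, expected, p_value


tf_a = [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
tf_b = [(0.5, 0.0, 0.0), (5.0, 5.5, 5.0)]
background = [(10.0 * i, 0.0, 0.0) for i in range(50)]
print(proximity_enrichment(tf_a, tf_b, background, cutoff=1.0))
```

The choice of background strongly affects the result. Drawing from loci with comparable chromatin state or compartment membership gives a stricter null than drawing uniformly from the genome.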

    Reconstructing Genotypes in Private Genomic Databases from Genetic Risk Scores

    Some organizations, such as 23andMe and the UK Biobank, have large genomic databases that they reuse for multiple genome-wide association studies. Even research studies that compile smaller genomic databases often use them to investigate many related traits. It is common for a study to report a genetic risk score (GRS) model for each trait in the publication. Here, we show that under some circumstances these GRS models can be used to recover the genetic variants of individuals in these genomic databases (a reconstruction attack). In particular, if two GRS models are trained on largely overlapping sets of participants, it is often possible to determine the genotype of each individual who was used to train one GRS model but not the other. We demonstrate this theoretically and experimentally by analyzing the Cornell Dog Genome database. The accuracy of our reconstruction attack depends on how accurately we can estimate the rate at which pairs of single nucleotide polymorphisms co-occur within the private database. If this aggregate information were ever released, it would drastically reduce the security of a private genomic database. Caution should be applied when using the same database for multiple analyses, especially when a small number of individuals are included in or excluded from one part of the study.
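A deliberately simplified differencing example shows why reusing one database is risky. Suppose per-SNP alternate-allele frequencies are released for a cohort and also for the same cohort plus one extra individual. Subtracting the two sets of allele counts then recovers that individual's genotypes exactly. This is not the authors' GRS-based reconstruction attack, which works from risk-score models and SNP co-occurrence estimates. The functions and data below are hypothetical.

```python
def alt_allele_freqs(genotypes):
    """Per-SNP alternate-allele frequency.

    genotypes: one list per individual of 0/1/2 alternate-allele counts.
    """
    n = len(genotypes)
    return [sum(snp) / (2 * n) for snp in zip(*genotypes)]


def recover_extra_individual(freq_a, n_a, freq_b, n_b):
    """Genotypes of the one person present in cohort B but not in cohort A."""
    if n_b != n_a + 1:
        raise ValueError("cohort B must contain exactly one extra individual")
    # Convert frequencies back to allele counts and take the difference.
    return [round(2 * n_b * fb - 2 * n_a * fa) for fa, fb in zip(freq_a, freq_b)]


cohort_a = [[0, 1, 2], [2, 1, 0], [1, 0, 1]]
target = [2, 0, 1]
freq_a = alt_allele_freqs(cohort_a)
freq_b = alt_allele_freqs(cohort_a + [target])
print(recover_extra_individual(freq_a, len(cohort_a), freq_b, len(cohort_a) + 1))
```

Real releases are rarely this direct. Even so, the example illustrates why small differences in cohort membership between analyses can leak individual-level information.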

    How young offenders interpret, construct and engage in education from statutory school to post-16 provision

    Figure S1. The relationship between maximum PWM score and information content of PWMs. Each dot represents one PWM generated from the non-redundant JASPAR-CORE PFM database [5] after the filtering procedures specified in the Methods section. There is a strong positive correlation between the information content of a PWM and the maximum possible score that the PWM can generate, with an adjusted R² of 0.597. (PDF 25.9 KB)
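The sketch below computes the quantities on the two axes from a position frequency matrix: a PWM's information content (in bits, against a uniform background) and its maximum possible log-odds score. The pseudocount, log base and background are assumptions. The caption does not state the exact PWM construction or the filtering steps in the Methods. The function name is hypothetical.

```python
import math


def ic_and_max_score(pfm, pseudocount=0.0):
    """Information content (bits) and maximum log2-odds score of a PWM.

    pfm: one {base: count} dict per position; background assumed uniform (0.25).
    """
    total_ic = 0.0
    max_score = 0.0
    for column in pfm:
        total = sum(column.values()) + 4 * pseudocount
        probs = [(column[b] + pseudocount) / total for b in "ACGT"]
        # IC of a column = 2 bits minus its Shannon entropy.
        total_ic += 2.0 + sum(p * math.log2(p) for p in probs if p > 0)
        # Best base at this position, scored against the uniform background.
        max_score += math.log2(max(probs) / 0.25)
    return total_ic, max_score


example_pfm = [{"A": 10, "C": 0, "G": 0, "T": 0},
               {"A": 5, "C": 5, "G": 0, "T": 0},
               {"A": 6, "C": 2, "G": 1, "T": 1}]
print(ic_and_max_score(example_pfm))
```

Both quantities grow as columns become more conserved, which fits a positive correlation. They are not identical, because the maximum score depends only on the most frequent base at each position, whereas information content depends on the whole column distribution.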

    Phytochromes function as thermosensors in Arabidopsis

    Plants are responsive to temperature and can distinguish differences of 1 °C. In Arabidopsis, warmer temperature accelerates flowering and increases elongation growth (thermomorphogenesis). However, the mechanisms of temperature perception are largely unknown. We describe a major thermosensory role for the phytochromes (red light receptors) during the night. Phytochrome null plants display a constitutive warm temperature response, and consistent with this, we show that in this background the warm temperature transcriptome becomes de-repressed at low temperatures. We found that phytochrome B (phyB) directly associates with the promoters of key target genes in a temperature-dependent manner. The rate of phyB inactivation is proportional to temperature in the dark, enabling phytochromes to function as thermal timers that integrate temperature information over the course of the night.
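A first-order decay model can illustrate the idea of a thermal timer. In this model, active phyB reverts in the dark at a rate that increases linearly with temperature, so the amount remaining at dawn depends on temperature integrated over the night. The linear rate law, the constant `alpha` and the hourly temperature traces are hypothetical assumptions for illustration, not measured values from the study.

```python
import math


def active_phyb_at_dawn(night_temps_c, alpha=0.01, dt_hours=1.0, p0=1.0):
    """Fraction of phyB still active after a night with the given temperatures.

    Hypothetical model: first-order dark reversion with rate k(T) = alpha * T
    (per hour), so the remaining fraction is p0 * exp(-sum(k(T) * dt)).
    """
    exposure = sum(alpha * t * dt_hours for t in night_temps_c)
    return p0 * math.exp(-exposure)


cool_night = [12.0] * 12
warm_night = [22.0] * 12
print(active_phyb_at_dawn(cool_night), active_phyb_at_dawn(warm_night))
```

In this toy model only the integrated exposure matters. Two nights with the same total temperature exposure therefore leave the same amount of active phyB, regardless of when the warm hours occur.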

    The Evening Complex establishes repressive chromatin domains via H2A.Z deposition

    The Evening Complex (EC) is a core component of the Arabidopsis (Arabidopsis thaliana) circadian clock. It represses target gene expression at the end of the day and integrates temperature information to coordinate environmental and endogenous signals. Here we show that the EC induces repressive chromatin structure to regulate the evening transcriptome. The EC component ELF3 directly interacts with a protein from the SWI2/SNF2-RELATED (SWR1) complex to control deposition of H2A.Z-containing nucleosomes at EC target genes. SWR1 components display circadian oscillation in gene expression, with a peak at dusk. In turn, SWR1 is required for the circadian clockwork, as defects in SWR1 activity alter morning-expressed genes. The EC-SWR1 complex binds to the loci of the core clock genes PSEUDO-RESPONSE REGULATOR7 (PRR7) and PRR9 and catalyzes deposition of nucleosomes containing the histone variant H2A.Z, coincident with the repression of these genes at dusk. This provides a mechanism by which the circadian clock temporally establishes repressive chromatin domains to shape oscillatory gene expression around dusk.

    What is quantitative plant biology?

    Quantitative plant biology is an interdisciplinary field that builds on a long history of biomathematics and biophysics. Today, thanks to high spatiotemporal resolution tools and computational modelling, it sets a new standard in plant science. Acquired data, whether molecular, geometric or mechanical, are quantified, statistically assessed and integrated at multiple scales and across fields. They feed testable predictions that, in turn, guide further experimental tests. Quantitative features such as variability, noise, robustness, delays or feedback loops are included to account for the inner dynamics of plants and their interactions with the environment. Here, we present the main features of this ongoing revolution through new questions around signalling networks, tissue topology, shape plasticity, biomechanics, bioenergetics, ecology and engineering. Ultimately, quantitative plant biology allows us to question and better understand our interactions with plants. In turn, this field opens the door to transdisciplinary projects with society, notably through citizen science. Peer reviewed