541 research outputs found

    Predicting gene expression in the human malaria parasite Plasmodium falciparum using histone modification, nucleosome positioning, and 3D localization features.

    Empirical evidence suggests that the malaria parasite Plasmodium falciparum employs a broad range of mechanisms to regulate gene transcription throughout the organism's complex life cycle. To better understand this regulatory machinery, we assembled a rich collection of genomic and epigenomic data sets, including information about transcription factor (TF) binding motifs, patterns of covalent histone modifications, nucleosome occupancy, GC content, and global 3D genome architecture. We used these data to train machine learning models to discriminate between high-expression and low-expression genes, focusing on three distinct stages of the red blood cell phase of the Plasmodium life cycle. Our results highlight the importance of histone modifications and 3D chromatin architecture in Plasmodium transcriptional regulation and suggest that AP2 transcription factors may play a limited regulatory role, perhaps operating in conjunction with epigenetic factors.

    Genomic Sequence Is Highly Predictive of Local Nucleosome Depletion

    The regulation of DNA accessibility through nucleosome positioning is important for transcription control. Computational models have been developed to predict genome-wide nucleosome positions from DNA sequences, but these models consider only nucleosome sequences, which may have limited their power. We developed a statistical multi-resolution approach to identify a sequence signature, called the N-score, that distinguishes nucleosome-binding DNA from non-nucleosome DNA. This new approach has significantly improved the prediction accuracy. The sequence information is highly predictive of local nucleosome enrichment or depletion, whereas predictions of the exact positions are only modestly more accurate than a null model, suggesting the importance of other regulatory factors in fine-tuning the nucleosome positions. The N-score in promoter regions is negatively correlated with gene expression levels. Regulatory elements are enriched in low N-score regions. While our model is derived from yeast data, the N-score pattern computed from this model agrees well with recent high-resolution protein-binding data in humans.

    Analysing and quantitatively modelling nucleosome binding preferences

    The main emphasis of my work as a PhD student was the analysis and prediction of nucleosome positioning, focusing on the role sequence features play. Part I gives a broad overview of nucleosomes, before defining important technical terms. It continues by describing and reviewing experiments that measure nucleosome positioning and bioinformatic methods that learn the sequence preferences of nucleosomes to predict their positioning. Part II describes a collaboration project with the Gaul lab, where I analyzed MNase-Seq measurements of nucleosomes in Drosophila. The original intention was to investigate the extent to which experimental biases influence the measurements. We extended the analysis to categorize and explore fragile, average and resistant nucleosome populations. I focused on the relation between nucleosome fragility and the sequence landscape, especially at promoters and enhancers. Analyzing the partial unwrapping of nucleosomes genome-wide, I found that the G+C ratio is a determinant of asymmetric unwrapping. I excluded an analysis of histone modifications from this work, which was part of this collaboration, due to its low relevance to the rest of the presented work. Part III describes my main project of developing a probabilistic nucleosome-position prediction method. I developed a maximum likelihood approach to learn a biophysical model of nucleosome binding. By including the low positional resolution of MNase-Seq and the sequence bias of CC-Seq in the likelihood, I could separate them from the nucleosome binding preferences and learn highly correlated nucleosome binding energy models. My analysis shows that nucleosomes have a position-specific binding preference and might be uninfluenced by G+C content or even disfavor it, contrary to the consensus in the literature. Part IV describes further analyses I did during my time as a PhD student that are not part of any planned publications.
The main topics are: ancillary elements of my main project, unsuccessful attempts to correct experimental biases, analysis of the quality of experimental measurements, and adapting my probabilistic nucleosome-position prediction method to work with occupancy measurements. Lastly, I give a general outlook that reflects on my results and discusses next steps, such as ways to improve my method further. I excluded two collaboration projects I participated in from this thesis because they are still ongoing: a systematic analysis of how the core promoter sequence influences gene expression in Drosophila and the development of an experiment to measure nucleosome occupancy more precisely.

    Chromatin digestion by the chemotherapeutic agent Bleomycin produces nucleosome and Transcription Factor footprinting patterns similar to Micrococcal Nuclease

    Bleomycin (BLM), a glycopeptide antibiotic commonly used in chemotherapeutic treatments, has been shown to produce single- and double-stranded DNA breaks. Subsequent analysis of DNA fragmentation patterns has demonstrated preferential digestion of chromatin in the TSS of active genes and the ability to produce nucleosome-sized fragments within intact chromatin. Nucleosome positioning plays a critical role in the regulation of gene activation. Currently, micrococcal nuclease (MNase) is used as the standard for mapping the position of nucleosomes in the genome. In order to identify whether BLM can be used as an effective nucleosome-mapping agent, BLM was used to digest chromatin in S. cerevisiae, followed by next-generation sequencing of paired-end DNA fragments. Our results demonstrate comparable DNA fragmentation patterns for both nucleosomes and other DNA-protein interactions, and furthermore explain the propensity for BLM to digest within the promoters of active genes. Finally, we show that BLM can be used to identify genome-wide nucleosome and transcription factor footprints as an effective alternative to MNase, and that it additionally lacks the strong sequence biases of MNase digestion.

    Transcription factor DNA binding- and nucleosome formation energies determined by high performance fluorescence anisotropy

    Protein-DNA binding is the core of transcriptional regulation, the process which controls the flow of information stored in an organism’s genome to react to its environment and to maintain its functionality. The initial event of gene expression is the binding of a transcription factor (TF) to its target site. These binding events are integrated over several binding sites and TFs, by which a fine-tuned regulation can be achieved. The number, combination and strengths of the different binding sites encode the desired gene expression level and the plasticity of the regulated gene. Considerable effort has been devoted to identifying the specific DNA sequences bound by different TFs. For more than two decades, it was thought that mutations at each position in this sequence contribute independently to the binding probability of a TF. This binding preference has therefore been described through position weight matrices (PWMs). PWMs describe the binding preference of a TF towards its target sites by assuming that each nucleotide position contributes independently to the total specificity (linearity assumption). However, current research has shown that this simplified view lacks a significant part of the information needed to precisely describe the binding preference of a TF. It was also shown that most of the information missing in the PWM is encoded in dinucleotide mutations. Two questions are important in this regard: (1) Which information about TF-DNA interaction are we missing, and are currently employed methods able to provide it? and (2) What is a comprehensive description of non-linearity that is based on biophysical properties rather than on abstract probabilities? One important aspect is the three-dimensional configuration of the DNA strand (DNA shape), which is known to affect TF binding to a varying degree.
Through recent work by the group of Remo Rohs it is possible to predict shape parameters (features) from a DNA sequence and investigate to which degree they influence binding for any given set of measurements. The first aim of this thesis is therefore to determine non-linearity in TF-DNA interaction and investigate the influence of DNA shape on it. Protein-DNA interactions have been studied with a variety of methods using structural biology (NMR, crystallography, cryo-EM) or quantitative methods (EMSA, DNA binding arrays, ChIP-seq, B1H, SELEX, MITOMI, SMiLE-seq). Most of these quantitative methods to measure TF-DNA interactions, however, are not very sensitive to weak binders due to the stringent washing steps or cutoffs they employ. In particular, sequences with two positions differing from the consensus can be very weakly bound; therefore a sensitive method is needed to investigate non-linearity. The method called High Performance Fluorescence Anisotropy (HiP-FA, recently developed in our lab) provides the necessary sensitivity. Using HiP-FA, I determined the affinities of 13 TFs from the Drosophila melanogaster segmentation network and found most of them to contain a significant non-linearity in their specificity. The binding energies of the TFs correlated significantly with certain DNA shape features, suggesting shape readout by the TFs. These results could be confirmed in existing structural biology data. Besides the influence of information directly encoded in the DNA sequence, the binding of a TF in the genome is most influenced by DNA accessibility. This property is a result of the genomic DNA being wrapped around histone octamers, forming nucleosomes. Since the underlying sequence can also influence the binding of the histone complex to the DNA, a natural question to ask is which features of the DNA sequence are the major determinants of histone-DNA interaction.
Attempts to address this question used existing methods that were either MNase-based, and therefore prone to the enzyme's intrinsic cutting bias, or based on dialysis and/or EMSA readout, and consequently have a low throughput and can only be automated to a small degree. This leads to a limited set of measurements, which are usually based on only a single measurement point instead of a complete titration curve. The second aim of my thesis is therefore to develop an in vitro assay to determine free energies of nucleosome formation that overcomes the limitations of existing methods. Using the sensitive FA-microscopy setup, I developed an automated assay to determine the free energy of nucleosome formation in a competitive titration. In contrast to existing methods, the throughput of the assay allows for full competitor titration curves. By measuring the free binding energies of 42 sequences, I showed that GC content is the factor contributing most to the free energy. The relationship between these quantities is non-monotonic, with an optimal GC content of 49 percent. The results provided in this thesis give insight into the nature of non-linearity in TF-DNA interactions and highlight the DNA shape readout therein. Methodological advances developed in this work can be used as a foundation to investigate other kinds of molecular interactions, making use of the high sensitivity of FA-based microscopy.

    Nucleosome occupancy reveals regulatory elements of the CFTR promoter

    Access to regulatory elements of the genome can be inhibited by nucleosome core particles arranged along the DNA strand. Hence, sites that are accessible to transcription factors may be located by using nuclease digestion to identify the relative nucleosome occupancy of a genomic region. In order to define novel cis-regulatory elements in the ∼2.7-kb promoter region of the cystic fibrosis transmembrane conductance regulator (CFTR) gene, we define its nucleosome occupancy. This profile reveals the precise positions of nucleosome-free regions (NFRs), both cell-type specific and others apparently unrelated to CFTR expression level, and offers the first high-resolution map of the chromatin structure of the entire CFTR promoter in relevant cell types. Several of these NFRs are strongly bound by nuclear factors in a sequence-specific manner, and directly influence CFTR promoter activity. Sequences within the NFR1 and NFR4 elements are highly conserved in many human gene promoters. Moreover, NFR1 contributes to promoter activity of another gene, angiopoietin-like 3 (ANGPTL3), while NFR4 is constitutively nucleosome-free in promoters genome-wide. Conserved motifs within NFRs of the CFTR promoter also show a high level of protection from DNase I digestion genome-wide, and likely have important roles in the positioning of nucleosome core particles more generally.

    Modeling nucleosome mediated mechanisms of gene regulation

    The genomes of all eukaryotic organisms are packaged into nucleosomes, which are the fundamental units of chromatin, each composed of approximately 147 base pairs of DNA wrapped around a histone octamer. Because 70-90% of the eukaryotic genome is packaged into nucleosomes, they modulate accessibility of DNA to transcription factors (TFs) and play an important role in regulation of transcription. This thesis is devoted to the mathematical modeling of effects which are caused by direct competition between nucleosomes and transcription factors. The contents of the thesis are organized as follows: in chapter 1 we introduce experimental methods and recent discoveries which have been made in chromatin biology. In chapter 2 we introduce a thermodynamic biophysical model for calculating nucleosome and transcription factor occupancies. We also introduce the statistical positioning effect and how it may affect the binding of transcription factors. In chapter 2 we mostly address the question of how competition with transcription factors can affect nucleosome positioning. We first examine nucleosome experimental data and address the question of reproducibility of the data across different experiments carried out in several labs. Then, we introduce a new method for assessing the quality of the model's predictions and use it to optimize the parameters of the model to fit experimental data. We focus on how transcription factors can explain the nucleosome positioning observed in vivo and which transcription factors play crucial roles in establishing nucleosome patterns at the promoters of genes. In chapter 3 we address the question of how nucleosomes and promoter architecture affect binding of TFs. We model binding of TFs in the context of chromatin to a cluster of binding sites and investigate what features of the binding site cluster determine the main characteristics of TF binding.
Finally, we study how TFBSs in real genomes are positioned relative to each other and show that there are certain biases in spacing between TFBSs, probably due to effects caused by competition with nucleosomes.