887 research outputs found

    Studies on genetic and epigenetic regulation of gene expression dynamics

    The information required to build an organism is contained in its genome, and the first biochemical process that activates the genetic information stored in DNA is transcription. Cell-type-specific gene expression shapes cellular functional diversity, and dysregulation of transcription is a central tenet of human disease. Therefore, understanding transcriptional regulation is central to understanding biology in health and disease. Transcription is a dynamic process, occurring in discrete bursts of activity that can be characterized by two kinetic parameters: burst frequency, describing how often genes burst, and burst size, describing how many transcripts are generated in each burst. Genes are under strict regulatory control by distinct sequences in the genome as well as epigenetic modifications. To properly study how genetic and epigenetic factors affect transcription, it needs to be treated as the dynamic cellular process it is. In this thesis, I present the development of methods that allow identification of newly induced gene expression over short timescales, as well as inference of kinetic parameters describing how frequently genes burst and how many transcripts each burst gives rise to. The work is presented through four papers. In paper I, I describe the development of a novel method for profiling newly transcribed RNA molecules. We use this method to show that therapeutic compounds affecting different epigenetic enzymes elicit distinct, compound-specific responses mediated by different sets of transcription factors already after one hour of treatment, responses that can only be detected when measuring newly transcribed RNA. The goal of paper II is to determine how genetic variation shapes transcriptional bursting. To this end, we infer transcriptome-wide burst kinetics parameters from genetically distinct donors and find variation that selectively affects burst sizes and frequencies.
Paper III describes a method for inferring transcriptional kinetics transcriptome-wide using single-cell RNA-sequencing. We use this method to describe how the regulation of transcriptional bursting is encoded in the genome. Our findings show that gene-specific burst sizes are dependent on core promoter architecture and that enhancers affect burst frequencies. Furthermore, cell-type-specific differential gene expression is regulated by cell-type-specific burst frequencies. Lastly, Paper IV shows how transcription shapes cell types. We collect data on cellular morphologies and electrophysiological characteristics, and measure gene expression, in the same neurons collected from the mouse motor cortex. Our findings show that cells belonging to the same, distinct transcriptomic families have distinct and non-overlapping morpho-electric characteristics. Within families, there is continuous and correlated variation in all modalities, challenging the notion of cell types as discrete entities.
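The two kinetic parameters named in this abstract can be illustrated with a small sketch. It is not the inference method of the thesis: it is a simplified moment estimator that assumes the bursty limit of the two-state model, where steady-state transcript counts are negative binomial with mean a·b and variance a·b·(1 + b). Here b is the burst size and a is the burst frequency in units of the mRNA degradation rate. The function name and the example counts are hypothetical.

```python
from statistics import mean, pvariance


def burst_moments(counts):
    """Moment estimates of (burst_size, burst_frequency) from per-cell counts.

    Assumption (not from the abstract): counts follow the bursty limit of the
    two-state model, so mean = a * b and variance = a * b * (1 + b), where
    b is the burst size and a is the burst frequency per mRNA lifetime.
    Returns None when the counts are not over-dispersed (variance <= mean),
    because the estimator is undefined in that case.
    """
    m = mean(counts)
    v = pvariance(counts)
    if m <= 0 or v <= m:
        return None
    burst_size = v / m - 1.0          # Fano factor minus one
    burst_frequency = m / burst_size  # in units of the degradation rate
    return burst_size, burst_frequency


# Hypothetical counts for one gene across eight cells.
example = burst_moments([0, 0, 0, 10, 0, 0, 0, 10])
```

A gene that is silent in most cells but highly expressed in a few yields a large burst size and a low burst frequency, which is the distinction the abstract draws between the two parameters.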

    The Evolution and Mechanics of Translational Control in Plants

    The expression of numerous plant mRNAs is attenuated by RNA sequence elements located in the 5' and 3' untranslated regions (UTRs). For example, in plants and many higher eukaryotes, roughly 35% of genes encode mRNAs that contain one or more upstream open reading frames (uORFs) in the 5' UTR. For this dissertation I have analyzed the pattern of conservation of such mRNA sequence elements. In the first set of studies, I have taken a comparative transcriptomics approach to address which RNA sequence elements are conserved between various families of angiosperm plants. Such conservation indicates an element's fundamental importance to plant biology, points to pathways for which it is most vital, and suggests the mechanism by which it acts. Conserved motifs were detected in 3% of genes. These include di-purine repeat motifs, uORF-associated motifs, putative binding sites for PUMILIO-like RNA binding proteins, small RNA targets, and a wide range of other sequence motifs. Due to the scanning process that precedes translation initiation, uORFs are often translated, thereby repressing initiation at an mRNA's main ORF. As one might predict, I found a clear bias against the AUG start codon within the 5' untranslated region (5' UTR) among all plants examined. Further supporting this finding, comparative analysis indicates that, for ~42% of genes, AUGs and their resultant uORFs reduce carrier fitness. Interestingly, for at least 5% of genes, uORFs are not only tolerated, but enriched. The remaining uORFs appear to be neutral. Because of their tangible impact on plant biology, it is critical to differentiate how uORFs affect translation and how, in many cases, their inhibitory effects are neutralized. In pursuit of this aim, I developed a computational model of the initiation process that uses five parameters to account for uORF presence.
In vivo translation efficiency data from uORF-containing reporter constructs were used to estimate the model's parameters in wild type Arabidopsis. In addition, the model was applied to identify salient defects associated with a mutation in subunit h of eukaryotic initiation factor 3 (eIF3h). The model indicates that eIF3h, by supporting re-initiation during uORF elongation, facilitates uORF tolerance.

    Evolution of regulatory complexes: a many-body system

    The recent advent of large-scale genomic sequence data and improvement of sequencing technologies have enabled population genetics to advance from a mostly abstract theoretical basis to a quantitative molecular description. However, functional units in DNA are typically combinations of interacting nucleotide segments, and evolutionary forces acting on these segments can result in very complicated population dynamics. The goal is to formulate these interactions in such a way that the macroscopic features are independent of the microscopic details, as in statistical mechanics. In this thesis, I discuss the evolutionary dynamics of regulatory sequences, which control the production of protein in cells. One of the primary forms of regulation occurs through interactions of proteins called transcription factors with binding sites in the DNA sequence, and the strength of these interactions influences the individual's fitness in the population. What makes this an ideal model system for quantitative analysis of genomic evolution is the possibility of inferring this relationship. Compared to prokaryotes and yeast, gene regulation is much more complex in higher eukaryotes. Regulatory information is organized in modules with multiple binding sites that are linked to a common function. In Chapter 2, we show that binding site complexes are commonly formed by local sequence duplications, as opposed to forming from scratch by single point mutations. We also show that the underlying regulatory grammar is in tune with this mechanism such that the duplication events confer an adaptive advantage. Regulatory complexes resemble a many-particle system whose function emerges from the collective dynamics of its elements. In Chapter 3, we develop a thermodynamic framework to characterize the effective affinity of site complexes to multiple transcription factors with cooperative binding.
These affinities are the phenotype, or trait, of binding complexes on which selection acts, and we characterize their evolution. From the yeast genome polymorphism data, we infer a fitness landscape as a function of binding affinity by using the novel method developed in Chapter 4. This method of quantitative trait analysis can deal with long-range correlations between sites, which arise in asexual populations. Our fitness landscape quantitatively predicts the amount of conservation of the phenotype, as well as the amount of compensatory changes between sites. Our results open a new avenue to understand the regulatory "grammar" of eukaryotic genomes based on quantitative evolution models. They prove that a combination of theoretical models, high-throughput experimental measurements, and analysis of genomic variation is necessary for a proper quantitative understanding of biological systems.
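The idea of an effective affinity for a complex of cooperatively bound sites can be sketched with a textbook two-site partition function. This is a toy illustration under stated assumptions, not the framework developed in the thesis: a single factor species, two sites, energies measured in units of kT, and a hypothetical cooperativity factor `omega` that multiplies the weight of the doubly bound state.

```python
import math


def occupancy_two_sites(e1, e2, mu, omega=1.0):
    """Probability that at least one of two binding sites is occupied.

    Assumptions (illustrative, not from the abstract):
      e1, e2 : binding energies of the two sites in units of kT
               (lower means stronger binding)
      mu     : chemical potential of the factor in units of kT
      omega  : cooperativity factor; 1.0 means independent sites,
               values above 1.0 favour the doubly bound state
    """
    q1 = math.exp(mu - e1)  # Boltzmann weight, site 1 bound
    q2 = math.exp(mu - e2)  # Boltzmann weight, site 2 bound
    z = 1.0 + q1 + q2 + omega * q1 * q2  # partition function over 4 states
    return 1.0 - 1.0 / z  # one minus the weight of the empty state


independent = occupancy_two_sites(0.0, 0.0, 0.0)
cooperative = occupancy_two_sites(0.0, 0.0, 0.0, omega=3.0)
```

With identical sites at half occupancy, the cooperative case gives a higher combined occupancy than the independent case, which is the sense in which a site complex behaves as one collective unit.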

    Mapping and Functional Analysis of cis-Regulatory Elements in Mouse Photoreceptors

    Photoreceptors are light-sensitive neurons that mediate vision, and they are the most commonly affected cell type in genetic forms of blindness. In mice, there are two basic types of photoreceptors, rods and cones, which mediate vision in dim and bright environments, respectively. The transcription factors (TFs) that control rod and cone development have been studied in detail, but the cis-regulatory elements (CREs) through which these TFs act are less well understood. To comprehensively identify photoreceptor CREs in mice and to understand their relationship with gene expression, we performed open chromatin (ATAC-seq) and transcriptome (RNA-seq) profiling of FACS-purified rods and cones. We find that rods have significantly fewer regions of open chromatin than cones (as well as >60 additional cell types and tissues), and we demonstrate that this uniquely closed chromatin architecture depends on the rod master regulator Nrl. Finally, we find that regions of rod- and cone-specific open chromatin are enriched for distinct sets of TF binding sites, providing insight into the cis-regulatory grammar of these cell types. We also sought to understand how the regulatory activity of rod and cone open chromatin regions is encoded in DNA sequence. Cone-rod homeobox (CRX) is a paired-like homeodomain TF and master regulator of both rod and cone development, and CRX binding sites are by far the most enriched TF binding sites in photoreceptor CREs. The in vitro DNA binding preferences of CRX have been extensively characterized, but how well in vitro models of TF binding site affinity predict in vivo regulatory activity is not known. In addition, paired-class homeodomain TFs bind DNA as both monomers and dimers, but whether monomeric and dimeric CRX binding sites have distinct regulatory activities is not known.
To address these questions, we used a massively parallel reporter assay to quantify the activity of thousands of native and mutant CRX binding sites in explanted mouse retinas. These data reveal that dimeric CRX binding sites encode stronger enhancers than monomeric CRX binding sites. Moreover, the activity of half-sites within dimeric CRX binding sites is cooperative and spacing-dependent. In addition, saturating mutagenesis of 195 CRX binding sites reveals that, while TF binding site affinity and activity are moderately correlated across mutations within individual CREs, they are poorly correlated across mutations from distinct CREs. Accordingly, we show that accounting for baseline CRE activity improves the prediction of the effects of mutations in regulatory DNA from sequence-based models. Taken together, these data demonstrate that the activity of CRX binding sites depends on multiple layers of sequence context, providing insight into photoreceptor gene regulation and illustrating functional principles of homeodomain TF binding sites.

    On Identifying Signatures of Positive Selection in Human Populations: A Dissertation

    As sequencing technology continues to produce better quality genomes at decreasing costs, there has been a recent surge in the variety of data that we are now able to analyze. This is particularly true with regard to our understanding of the human genome, where the last decade has seen data advances in primate epigenomics, ancient hominid genomics, and a proliferation of human polymorphism data from multiple populations. In order to utilize such data, however, it has become critical to develop increasingly sophisticated tools spanning both bioinformatics and statistical inference. In population genetics particularly, new statistical approaches for analyzing population data are constantly being developed, unfortunately often without proper model testing and evaluation of type-I and type-II error. Because the common Wright-Fisher assumptions underlying such models are generally violated in natural populations, this statistical testing is critical. Thus, my dissertation has two distinct but related themes: 1) evaluating methods of statistical inference in population genetics, and 2) utilizing these methods to analyze the evolutionary history of humans and our closest relatives. The resulting collection of work has not only provided important biological insights (including some of the first strong evidence of selection on human-specific epigenetic modifications (Shulha, Crisci, Reshetov, Tushir et al. 2012, PLoS Bio), and a characterization of human-specific genetic changes distinguishing modern humans from Neanderthals (Crisci et al. 2011, GBE)), but also important insights into the performance of population genetic methodologies which will motivate the future development of improved approaches for statistical inference (Crisci et al., in review).

    Associative Pattern Recognition for Biological Regulation Data

    In the last decade, bioinformatics data has been accumulated at an unprecedented rate, thanks to the advancement in sequencing technologies. Such rapid development poses both challenges and promising research topics. In this dissertation, we propose a series of associative pattern recognition algorithms in biological regulation studies. In particular, we emphasize efficiently recognizing associative patterns between genes, transcription factors, histone modifications and functional labels using heterogeneous data sources (numeric, sequences, time series data and textual labels). In protein-DNA associative pattern recognition, we introduce an efficient algorithm for affinity testing that searches for over-represented DNA sequences using a hash function and modulo addition calculation. This substantially improves the efficiency of next-generation sequencing data analysis. In gene regulatory network inference, we propose a framework for refining weak networks based on transcription factor binding sites, thus improving the precision of predicted edges by up to 52%. In histone modification code analysis, we propose an approach to genome-wide combinatorial pattern recognition for histone-code-to-function associative pattern recognition, and achieve an improvement of up to 38.1%. We also propose a novel shape-based modification pattern analysis approach, and use it to successfully predict sub-classes of genes in the flowering-time category. We also propose a combination-to-combination associative pattern recognition approach, which achieves better performance than multi-label classification and bidirectional associative memory methods. Our proposed approaches recognize associative patterns from different types of data efficiently, and provide a useful toolbox for biological regulation analysis. This dissertation presents a road-map to associative pattern recognition at the genome-wide level.
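The abstract mentions searching for over-represented DNA sequences with a hash function and modulo arithmetic. The sketch below shows one common way such a search can be organised, a rolling 2-bit hash over k-mers; it is an assumption-laden illustration and not the algorithm of the dissertation. The names `kmer_counts` and `decode` are hypothetical.

```python
from collections import Counter

# 2-bit code per base; any other character resets the rolling window.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_counts(seq, k):
    """Count k-mers by hash value using a rolling base-4 hash.

    Each new base updates the hash as (h * 4 + code) mod 4**k, which drops
    the oldest base of the window, so every k-mer costs O(1) to hash.
    """
    counts = Counter()
    modulus = 4 ** k
    h = 0
    valid = 0  # consecutive A/C/G/T bases seen since the last reset
    for base in seq.upper():
        code = CODE.get(base)
        if code is None:
            h, valid = 0, 0
            continue
        h = (h * 4 + code) % modulus
        valid += 1
        if valid >= k:
            counts[h] += 1
    return counts


def decode(h, k):
    """Turn a hash value back into its k-mer string."""
    letters = "ACGT"
    return "".join(letters[(h >> (2 * (k - 1 - i))) & 3] for i in range(k))


# Most frequent 2-mers in a short hypothetical sequence.
top = [(decode(h, 2), n) for h, n in kmer_counts("ACGTACGT", 2).most_common()]
```

Deciding whether a frequent k-mer is truly over-represented would additionally require a background model, which this sketch leaves out.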

    Computational Characterization of Genome-wide DNA-binding Profiles

    The work and data presented in this thesis are part of a collaborative project funded by the Berlin Center for Regenerative Therapies. A number of people have contributed to this work, and for clarity I will now mention the individual contributions. Stefan Mundlos, Peter N. Robinson and Jochen Hecht designed this project with the purpose of studying bone development using ChIP-seq in a chicken model. Jochen Hecht and Asita Stiege established the ChIP-seq protocol and, together with Daniel Ibrahim, Hendrikje Hein, and Catrin Janetzky, carried out the immunoprecipitations and sequencing. Peter Krawitz was responsible for the data processing, which involved base calling and basic quality control. Daniel Ibrahim contributed to the analysis of the Hox proteins, identifying the Q317K mutant as related to Pitx1 and Obox family members. Sebastian Kohler and Sebastian Bauer carried out the computation of the Gene Ontology similarity data and random walk distances that I used for the target gene assignments in chapter 5. The EMSA experiments shown in chapter three were carried out by Asita Stiege. The work on target gene assignment presented in chapter 5 has been published in Nucleic Acids Research [1]. All the remaining methods, data and experimental results will be partially included in future publications by Ibrahim et al. and Hein et al.

    Understanding the Evolution of Gene Expression from a Regulatory Network Perspective

    The evolution of transcriptional regulation has been demonstrated to be a major contributor to phenotypic evolution. One important step in transcriptional regulation is the interaction between transcription factors and their target genes, the organization of which is represented by the Transcriptional Regulatory Network (TRN). Recent studies have shown that structural properties within a TRN provide important information for understanding how different transcriptional patterns are formed in many biological systems. However, it is less clear whether or not those structural properties are also informative in understanding the evolution of transcriptional patterns. In this thesis, I examined the question of whether the number of connections for a gene in a TRN was associated with observed gene expression differences by combining published datasets from multiple related Drosophila species. Specifically, I found that increasing number of regulators (in-degree) for a gene was associated with decreasing differences in gene expression and cis regulation. Meanwhile, I found no significant relationship between the number of targets (out-degree) for a transcription factor and differences in gene expression. To assess the generality of the conclusions from Drosophila species, I inferred a whole-genome transcriptional regulatory network in Saccharomyces cerevisiae and combined it with published gene expression datasets involving multiple Saccharomyces species to examine the relationship between in-degree/out-degree and differences in gene expression. I found that increasing in-degree was associated with increasing differences in gene expression between two strains of S. cerevisiae, but no significant relationship between in-degree and differences in gene expression was detected in all comparisons between two diverged Saccharomyces species. 
These two studies suggest that whether and how the number of interactions for a gene within a TRN could impact the evolution of the transcription level might depend on the biological system under consideration. Finally, I examined whether and how existing genetic variants that disrupted transcriptional regulation of the yeast gene TDH3 could influence how random mutations change its expression, by introducing random mutations into 8 yeast strains, each carrying a single genetic variant responsible for altering the expression level of TDH3, and quantifying both the mean expression level and expression noise for the resulting mutagenized cells in each of the 8 genetic backgrounds. I found that the lab strain BY was less sensitive to the effects of random mutations on the mean expression level than the other genotypes carrying genetic variants. Also, I found that relationships between the effects of random mutations on the mean level of expression and on expression noise depend on the existing genetic variants. In addition, I found that the sensitivity of the mean level of expression to random mutations was positively correlated with the expression noise for strains carrying genetic variants in the TDH3 promoter. This study demonstrates that various aspects of how random mutations alter the expression of a single gene are modified by existing genetic changes that disrupt the transcriptional regulation. Taken together, my thesis work demonstrates that the transcriptional regulatory network provides an informative context to study the evolution of gene expression, in the sense that both the process of the accumulation of genetic variations and the formation of the ultimate evolutionary patterns are potentially affected by the interactions within the network. PHD, Molecular, Cellular, and Developmental Biology, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/138702/1/ypauling_1.pd
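The in-degree and out-degree used in this abstract can be computed directly from a list of regulator-to-target edges. The following is a minimal sketch, assuming the network is available as (regulator, target) pairs; the function name and the gene labels are hypothetical and do not come from the thesis.

```python
from collections import Counter


def degrees(edges):
    """Return (in_degree, out_degree) for a regulatory network.

    edges: iterable of (regulator, target) pairs. Duplicate pairs are
    collapsed so that each interaction is counted once.
    in_degree[gene]  = number of distinct regulators of that gene
    out_degree[tf]   = number of distinct targets of that factor
    """
    unique = set(edges)
    in_degree = Counter(target for _, target in unique)
    out_degree = Counter(regulator for regulator, _ in unique)
    return in_degree, out_degree


# Hypothetical toy network with one repeated edge.
toy_edges = [("TF1", "g1"), ("TF2", "g1"), ("TF1", "g2"), ("TF1", "g2")]
in_deg, out_deg = degrees(toy_edges)
```

Per-gene in-degree values obtained this way could then be compared against a measure of expression divergence, which is the kind of association the abstract reports.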