Non-coding genome contributions to the development and evolution of mammalian organs
Protein-coding sequences only cover 1-2% of a typical mammalian genome. The remaining non-coding space hides thousands of genomic elements, some of which act via their DNA sequence while others are transcribed into non-coding RNAs. Many well-characterized non-coding elements are involved in the regulation of other genes, a process essential for the emergence of different cell types and organs during development. Changes in the expression of conserved genes during development are in turn thought to facilitate evolutionary innovation in form and function. Thus, non-coding genomic elements are hypothesized to play important roles in developmental and evolutionary processes. However, challenges related to the identification and characterization of these elements, in particular in non-model organisms, have limited the study of their overall contributions to mammalian organ development and evolution. During my dissertation work, I addressed this gap by studying two major classes of non-coding elements, long non-coding RNAs (lncRNAs) and cis-regulatory elements (CREs).
In the first part of my thesis, I analyzed the expression profiles of lncRNAs during the development of seven major organs in six mammals and a bird. I showed that, unlike protein-coding genes, only a small fraction of lncRNAs is expressed in reproducibly dynamic patterns during organ development. These lncRNAs are enriched for a series of features associated with functional relevance, including increased evolutionary conservation and regulatory complexity, highlighting them as candidates for further molecular characterization. I then associated these lncRNAs with specific genes and functions based on their spatiotemporal expression profiles. My analyses also revealed differences in lncRNA contributions across organs and developmental stages, identifying a developmental transition from broadly expressed and conserved lncRNAs towards an increasing number of lineage- and organ-specific lncRNAs.
Following up on these global analyses, I then focused on a newly identified lncRNA in the marsupial opossum, Female Specific on chromosome X (FSX). The broad and likely autonomous female-specific expression of FSX suggests a role in marsupial X-chromosome inactivation (XCI). I showed that FSX shares many expression and sequence features with another lncRNA, RSX, a known regulator of XCI in marsupials. Comparisons to other marsupials revealed that both RSX and FSX emerged in the common marsupial ancestor and have since been preserved in marsupial genomes, while their broad and female-specific expression has been retained for at least 76 million years of evolution. Taken together, my analyses highlighted FSX as a novel candidate for regulating marsupial XCI.
In the third part of this work, I shifted my focus to CREs and their cell type-specific activities in the developing mouse cerebellum. After annotating cerebellar cell types and states based on single-cell chromatin accessibility data, I identified putative CREs and characterized their spatiotemporal activity across cell types and developmental stages. Focusing on progenitor cells, I described temporal changes in CRE activity that are shared between early germinal zones, supporting a model of cell fate induction through common developmental cues. By examining chromatin accessibility dynamics during neuronal differentiation, I revealed a gradual divergence in the regulatory programs of major cerebellar neuron types.
In the final part, I explored the evolutionary histories of CREs and their potential contributions to gene expression changes between species. By comparing mouse CREs to vertebrate genomes and chromatin accessibility profiles from the marsupial opossum, I identified a temporal decrease in CRE conservation, which is shared across cerebellar cell types. However, I also found differences in constraint between cell types, with microglia having the fastest evolving CREs in the mouse cerebellum. Finally, I used deep learning models to study the regulatory grammar of cerebellar cell types in human and mouse, showing that the sequence rules determining CRE activity are conserved across mammals. I then used these models to retrace the evolutionary changes leading to divergent CRE activity between species.
Collectively, my PhD work provides insights into the evolutionary dynamics of non-coding genes and regulatory elements, the processes associated with their conservation, and their contributions to the development and evolution of mammalian cell types and organs.
Deciphering Regulatory Networks in the Mouse Genome
Despite the major achievements in the field of genomics and in-depth studies of protein-coding genes, our knowledge of non-coding regions and their contribution to disease remains incomplete. Large-scale projects such as ENCODE have produced a wealth of sequencing data which can be utilised to study epigenetic features associated with gene regulation. These studies have comprehensively identified regulatory elements such as enhancers in the human genome, but numerous questions remain about their effect on gene function and disease causation.
The aim of this thesis is to identify enhancer regulatory networks in the mouse genome and investigate their effect on mouse models of human diseases. To study enhancer regulation, I have taken two approaches. First, I have produced a catalogue of multiple well-defined enhancer types in a diverse range of mouse tissues and cell types. By systematically comparing different enhancer types, I found that super- and typical-enhancers have different effects on gene expression, but both are preferentially associated with relevant tissue-type phenotypes. In addition, genes associated with super- and typical-enhancers exhibit no difference in phenotype effect size or pleiotropy. Second, by utilising publicly available regulatory annotations, my enhancer catalogue and omics data, I have investigated regulatory mechanisms associated with metabolic and circadian mouse models. Here I identified novel regulatory networks, enhancers, or transcription factor binding sites pertaining to the mutant mice.
In conclusion, my research has shown the usefulness of integrating enhancer annotations with an array of molecular data and has for the first time shown how different enhancer architectures influence gene function in the mouse genome. This study provides a valuable dataset to further characterise the mechanisms of gene regulation by enhancers in the mouse genome.
Entropy-based machine learning algorithms applied to genomics and pattern recognition
Transcription factors (TFs) are proteins that interact with DNA to regulate the transcription of DNA to RNA and play key roles in both healthy and cancerous cells. Thus, gaining a deeper understanding of the biological factors underlying TF binding specificity is important for understanding the mechanism of oncogenesis. As large biological datasets become more readily available, machine learning (ML) algorithms have proven to be an important and useful set of tools for cancer researchers. However, there remain many areas for potential improvement in these ML models, including a higher degree of model interpretability and overall accuracy. In this thesis, we present decision tree (DT) methods applied to DNA sequence analysis that result in highly interpretable and accurate predictions.
We propose a boosted decision tree (BDT) model using the binary counts of important DNA motifs to predict the binding specificity of TFs that belong to the same protein family and bind similar DNA sequences. We then introduce a novel application of Convolutional Decision Trees (CDT) and demonstrate that this approach has distinct advantages over the BDT model while still accurately predicting the binding specificity of TFs. The CDT models are trained using the Cross Entropy (CE) optimization method, a Monte Carlo optimization method based on concepts from information theory related to statistical mechanics. We then further study the CDT model as a general pattern recognition and transfer learning technique and demonstrate that this approach can learn translationally invariant patterns that lead to high classification accuracy while remaining more interpretable and learning higher quality convolutional filters compared to convolutional neural networks (CNN).
Using genome-wide data to model signalling-responsive gene regulatory mechanisms in blood development
The control of gene expression driving developmental haematopoiesis crucially depends on distal cis-regulatory elements such as enhancers, which directly interact with promoters in the nucleus. However, no global experiments have been conducted which identify the cell type and cell stage-specific activity of enhancers in a chromatin context. It is through these elements that lineage-specific transcription factors orchestrate cell fate decisions and direct haematopoietic lineage development emerging from the mesoderm. The roles of transcriptional regulators are beginning to be understood; however, it is still unclear how the myriad of extracellular signals modulate their activity. In this work, we report a global method which enables the identification of thousands of tissue-specifically active cis-regulatory elements able to stimulate a minimal promoter in cells representing five stages of haematopoietic specification derived from embryonic stem cells. Using serum-free differentiation culture, we demonstrate that our method can identify signalling-responsive enhancer elements and we highlight that it can be adapted to any embryonic stem cell differentiation system generating different cell types. We demonstrate that thousands of cell stage-specific sets of cis-elements are responsive to cytokine signals terminating at signalling-responsive transcription factors. Integrating these data with chromatin accessibility and single-cell RNA-seq data provided important new insights into the regulatory dynamics of the gene regulatory network transitions driving haematopoiesis. Our work identified the cytokine signalling-responsive transcription factors mediating responsiveness of enhancers at each developmental stage. We validated enhancers for Sparc, Pxn, Hspg2, Cdh5, Dlk1 and Mrpl15 as being responsive to VEGF signalling.
We found that the cytokine VEGF is a crucial factor that regulates the balance between endothelial and haematopoietic development, and our scRNA-seq analysis revealed that in the presence of VEGF, Sox17 fails to be downregulated and Runx1 fails to be upregulated in the haemogenic endothelium and progenitor cells. For two Runx1 enhancers (the +23kb and +3.7kb), we studied the transcription factor motifs mediating the responsiveness of the enhancers to VEGF by mutating these sites. Taken together, our work generated an important novel resource for future studies of haematopoietic differentiation and provides insights into how and where in the genome extrinsic signals program the cell type-specific chromatin landscape driving this process.
Isolation and characterization of an RNA polymerase III encoded gene of Pinus radiata and its use in pine transformation
Several promoters such as the cauliflower mosaic virus 35S promoter (CaMV 35S) and its enhanced version, pEMU, the maize ubiquitin promoter, the alcohol dehydrogenase promoter (Adh) and rice actin promoter (Act1) are currently used in Pinus radiata (pine) transformation. These heterologous promoters were adopted for pine transformation for want of an endogenous promoter tailored specifically for the needs of pine. These promoters may not perform to the same extent in pine as in their homologous systems due to differences in quality and/or quantity of regulatory factors. Secondly, because of their heterologous origin, these promoters are open to silencing mechanisms that operate in plants against invasive DNA [Matzke & Birchler, 2005]. This could result in inactivation of these promoters at any time during the 30-year growth period of transformed pine, which poses a real threat to a forestry industry based on transgenic pine.
A pine promoter on the other hand, being endogenous, is less prone to silencing. In addition, confidence in its longevity (continued expression) can be easily established even before using it in transformation. The aim of this study was to isolate and validate pine promoters that can be used in pine transformation. As only a few pine sequences were available in the public domain for gene discovery in pine (at the beginning of this study), heterologous sequence information was used to screen the pine genome or its transcriptome for orthologs with desirable expression features.
The investigation proceeded along two lines. In the first approach, a putatively desirable gene was isolated and the expression profile of its promoter was then validated. This led to the characterization of 5Spr20, a pine 5S rDNA paralog. 5Spr20 differs from all published 5S rDNA sequences and is therefore a novel pine gene. Analyses of its sequence using bioinformatics revealed that it is capable of initiating biologically active transcripts and 5Spr20 is therefore a functional gene. A recombinant 5Spr20 promoter consisting of the coding region and the immediately upstream region downregulated gus reporter activity by 90% by antisense activity in transient expression studies in pine embryogenic cells. In stable expression studies, a 5Spr20 promoter-driven shDNA construct targeting gus completely silenced reporter activity in the model plant Nicotiana benthamiana. The 5Spr20 promoter appears to hold great promise for use in pine functional genomics and in gene downregulation applications.
In the second line of investigation, the expression profiles of pine orthologs of known heterologous genes were validated prior to gene isolation. Two pine genes that were identified as promising candidates are pine tDNAMet-l and a pine actin paralog, ActX. Both genes were strongly expressed in all vegetative tissues of pine. Several PCR-based methods were used to clone the upstream regions (containing putative promoter elements) but all attempts ended in failure, which is attributed to the presence of pseudogenes and regions homologous to walking/sequencing primers among paralogs.
The pine transcriptome was also screened unsuccessfully for orthologs of desirable heterologous candidate genes like the ribosomal protein genes MsRL5 of Medicago sativa and AtL 18 of Arabidopsis thaliana and genes for the second largest subunit of RNA polymerase II, gene T13794 and actin-2 of A. thaliana. Sequence heterogeneity, cell-specific expression and low transcript abundance are possible reasons for not being able to detect pine orthologs of these candidate genes in expression screens.
RNA Exosome & Chromatin: The Yin & Yang of Transcription: A Dissertation
Eukaryotic genomes can produce two types of transcripts: protein-coding and non-coding RNAs (ncRNAs). Cryptic ncRNA transcripts are bona fide RNA Pol II products that originate from bidirectional promoters, yet they are degraded by the RNA exosome. Such pervasive transcription is prevalent across eukaryotes, yet its regulation and function are poorly understood.
We hypothesized that chromatin architecture at cryptic promoters may regulate ncRNA transcription. Nucleosomes that flank promoters are highly enriched in two histone marks: H3-K56Ac and the variant H2A.Z, which make nucleosomes highly dynamic. These histone modifications are present at a majority of promoters and their stereotypic pattern is conserved from yeast to mammals, suggesting their evolutionary importance. Although required for inducing a handful of genes, their contribution to steady-state transcription has remained elusive. In this work, we set out to understand if dynamic nucleosomes regulate cryptic transcription and how this is coordinated with the RNA exosome.
Remarkably, we find that H3-K56Ac promotes RNA polymerase II occupancy at a large number of protein-coding and non-coding loci, yet neither histone mark has a significant impact on steady-state mRNA levels in budding yeast. Instead, broad effects of H3-K56Ac or H2A.Z on levels of both coding and ncRNAs are only revealed in the absence of the nuclear RNA exosome. We show that H2A.Z functions with H3-K56Ac in chromosome folding, facilitating formation of Chromosomal Interaction Domains (CIDs). Our study suggests that H2A.Z and H3-K56Ac work in concert with the RNA exosome to control mRNA and ncRNA levels, perhaps in part by regulating higher order chromatin structures. Together, these chromatin factors achieve a balance of RNA exosome activity (yin; negative) and Pol II (yang; positive) to maintain transcriptional homeostasis.
Ca2+-sensitive Mef2c protein interactions and chromatin function in microglia-like cells
Alzheimer’s disease (AD) is a progressive neurodegenerative disorder for which there are no disease-modifying therapies. Genetic studies have identified over 50 susceptibility loci for sporadic AD including the locus encoding the transcription factor MEF2C. Most of the genes implicated in AD-risk are exclusively or preferentially expressed in microglia. Furthermore, AD-risk variants are enriched in microglial open chromatin regions that contain DNA binding motifs for MEF2C. Therefore, genetic variants that disrupt MEF2C binding to DNA in microglia may alter cis-gene expression, contributing to AD-risk. Understanding how MEF2C functions in microglia may provide valuable insights into the genetic basis of AD-risk.
To investigate the role of Mef2c in AD, mass spectrometry was used to identify proteins that co-purify with the endogenous protein in BV2 microglia-like cells. Two major Mef2c isoforms exist in BV2 cells that associate with 110 putative interactors including the transcriptional repressors Hdac4, Hdac5, and Cabin1. Ionomycin treatment, which raises intracellular [Ca2+], caused the partial dissociation of these repressors from Mef2c and resulted in recruitment of the microglial amyloid-β response proteins Yes1 and Smpdl3b to the Mef2c complex. However, no Mef2c-activating proteins were identified in the remodelled complex. Having demonstrated that ionomycin treatment remodels the Mef2c interactome, the effect of this treatment on chromatin accessibility in BV2 cells was investigated using ATAC-seq. This revealed that while the motifs for three transcription factors, Atf4, NFATC3 and p53, were enriched at Ca2+-dependent differentially accessible sites, Mef2c sites were not similarly enriched. However, Mef2c motif-containing differentially accessible regions were associated with genes that control the microglial inflammatory response. This thesis investigated two mechanisms by which [Ca2+] levels potentially influence gene regulation, altered protein interactions and chromatin accessibility, and further contributes to our understanding of the transcription factor Mef2c, Ca2+ signalling, and chromatin function in BV2 cells. In conclusion, Ca2+ dysregulation in AD may result in remodelling of the Mef2c interactome, leading to abnormal Mef2c-mediated inflammatory responses in microglia.
- …