908 research outputs found
Advanced sequencing technologies applied to human cytomegalovirus
The betaherpesvirus human cytomegalovirus (HCMV) is a ubiquitous viral pathogen. It is the most common cause of congenital infection in infants and of opportunistic infections in immunocompromised patients worldwide. The large double-stranded DNA genome of HCMV (236 kb) contains several genes that exhibit a high degree of variation among strains within an otherwise highly conserved sequence. These hypervariable genes encode immune escape, tropism or regulatory factors that may affect virulence. Variation arising from these genes and from an evolutionary history of recombination between strains has been hypothesised to be linked to disease severity. To investigate this, the HCMV genome has been scrutinised in detail over the years using a variety of molecular techniques, most looking only at one or a few of these genes at a time. The advent of high-throughput sequencing (HTS) technology 20 years ago then started to enable more in-depth whole-genome analyses. My study extends this field by using both HTS and the more recently developed long-read nanopore technology to determine HCMV genome sequences directly from clinical samples. Firstly, I used an Illumina HTS pipeline to sequence HCMV strains directly from formalin-fixed, paraffin-embedded (FFPE) tissues. FFPE samples are a valuable repository for the study of relatively rare diseases, such as congenital HCMV (cCMV). However, formalin fixation induces DNA fragmentation and cross-linking, making this a challenging sample type for DNA sequencing. I successfully sequenced five whole HCMV genomes from FFPE tissues. Next, I developed a pipeline utilising the single-molecule, long-read sequencer from Oxford Nanopore Technologies (ONT) to sequence HCMV initially from high-titre cell-cultured laboratory strains and then from clinical samples with high HCMV loads.
Finally, I utilised a direct RNA sequencing protocol with the ONT sequencer to characterise novel HCMV transcripts produced during infection in cell culture, demonstrating the existence of transcript isoforms with multiple splice sites. Overall, my findings demonstrate how advanced sequencing technologies can be used to characterise the genome and transcriptome of a large DNA virus, and will facilitate future studies on HCMV prognostic factors, novel antiviral targets and vaccine development.
Specificity of the innate immune responses to different classes of non-tuberculous mycobacteria
Mycobacterium avium is the most common nontuberculous mycobacterium (NTM) species causing infectious disease. Here, we characterized an M. avium infection model in zebrafish larvae, and compared it to M. marinum infection, a model of tuberculosis. M. avium bacteria are efficiently phagocytosed and frequently induce granuloma-like structures in zebrafish larvae. Although macrophages can respond to both mycobacterial infections, their migration speed is faster in infections caused by M. marinum. Tlr2 has a conserved role in most aspects of the defense against both mycobacterial infections. However, Tlr2 has a function in the migration speed of macrophages and neutrophils to infection sites with M. marinum that is not observed with M. avium. Using RNAseq analysis, we found a distinct transcriptome response in cytokine-cytokine receptor interaction for M. avium and M. marinum infection. In addition, we found differences in gene expression in metabolic pathways, phagosome formation, matrix remodeling, and apoptosis in response to these mycobacterial infections. In conclusion, we characterized a new M. avium infection model in zebrafish that can be further used in studying pathological mechanisms for NTM-caused diseases.
Systems view of neuronal mRNA localisation - comparative analysis and determinants of the sub-cellular neuronal transcriptome
Control over the intracellular localisation of RNA is an important aspect of post-transcriptional regulation, especially for highly polarised cells like neurons. The presence of specific transcripts in axons and dendrites, collectively termed neurites, is determined by cis-active elements called zipcodes and ultimately allows neurons to locally synthesise required proteins and adapt quickly to cues of the local environment. This capability is important for the correct function of synapse remodelling and memory formation, and a disruption of RNA localisation in neurons has been associated with several neurodegenerative diseases.
Using RNA sequencing, a vast number of transcripts can be detected in neurites of neuronal model systems. With over 7500 transcripts, even the neurite core transcriptome, which I summarised from the published datasets generated in the last decade, contains at least a third to a half of the full neuronal transcriptome. Whether all of these transcripts should be considered localised to neurites or if this designation is better determined by differential expression analysis between compartments is still difficult to answer. For one thing, there is no strong overlap between individual datasets in the transcripts called localised on the basis of significant enrichment, and my integrated analysis utilising batch correction also generated only a relatively small set of differentially expressed genes, which however tend to have more conserved enrichment. Secondly, several transcripts that are generally considered classical localised transcripts, like the Actb mRNA, are not relatively enriched in neurites, even if they are strongly expressed there.
Relying on a set of transcripts with consistent neurite enrichment based on datasets from primary murine neurons, I designed Nzip, a massively parallel reporter assay (MPRA) aimed at the identification of unknown zipcodes. Based on 16 candidate sequences determined from the first experiment, it was possible to identify 2 new zipcode motifs utilising a secondary library with a mutational analysis approach: the let-7 miRNA seed sequence CUACCUC and an (AU) repeat motif. The compartmental quantification of miRNAs and associated protein machinery indicates a stronger activity of let-7 in soma, providing a potential mechanism for its zipcode activity. Additionally, the (AU) motif is also associated with lower read counts in soma and several identified binding proteins have known effects on RNA stability, indicating that it likely also affects RNA localisation through stability regulation. Building on my observations that assigning RNA localisation state based on either detection or enrichment in neurites is problematic and that Nzip mainly identified motifs conferring neurite enrichment by RNA stability, I argue that a clear distinction between localised and not-localised transcripts may not be an accurate description of the biological system. Instead, zipcodes likely affect the probability of a given transcript to reach neurites and there may also be different mechanisms that affect the tendency for localisation as measured by enrichment or detection. Whether this is a more accurate description of RNA localisation mechanics as well as the exact functions of the zipcodes I identified should be further investigated in future studies.
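As an illustration of how the two motifs named above could be located in a candidate sequence, the following is a minimal sketch only; it is not the Nzip analysis itself. The function name `find_motifs`, the example sequence, and the minimum length of three consecutive AU units are assumptions made for illustration, since the abstract does not specify how long the (AU) repeat must be.

```python
import re

# Motif as quoted in the abstract.
LET7_SITE = "CUACCUC"
# Assumption: treat three or more consecutive AU units as an (AU) repeat.
AU_REPEAT = re.compile(r"(?:AU){3,}")


def find_motifs(rna):
    """Return 0-based positions of both motifs in an RNA (or DNA) sequence."""
    # Normalise case and convert DNA-style input to RNA letters.
    rna = rna.upper().replace("T", "U")
    # Lookahead so that overlapping occurrences would also be reported.
    let7 = [m.start() for m in re.finditer(f"(?={LET7_SITE})", rna)]
    # Each repeat is reported as a (start, end) span, end exclusive.
    au = [(m.start(), m.end()) for m in AU_REPEAT.finditer(rna)]
    return {"let7": let7, "au_repeat": au}


# Hypothetical sequence, not taken from any dataset.
example = "GGCUACCUCAAGAUAUAUAUGG"
hits = find_motifs(example)
print(hits)
```

A scan of this kind only reports where a motif occurs; whether an occurrence acts as a zipcode would still have to be tested experimentally, as in the mutational analysis described above.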
As a second part of my work I have contributed to studying the human neurodegenerative disease amyotrophic lateral sclerosis (ALS). This affliction is mainly characterised by the degradation of motor neurons usually starting at the synapses between axons and skeletal muscle, the neuromuscular junction (NMJ), and in many cases is known to be caused by mutations in several RNA binding proteins affecting RNA localisation. Among these is the FUS protein, whose mutations often disrupt its exclusive nuclear localisation and thus can lead both to a loss-of-function as well as a toxic gain-of-function through availability of new RNA targets in the cytoplasm. To study a disease like ALS a cellular model system for human neurons is needed, which replicates the relevant molecular signatures of affected motor neurons, specifically including the axon-containing neurite compartment. I have characterised the transcriptome and proteome of induced motor neurons (iMN) generated by expression of NGN2, ISL1 and LHX3 transcription factors. This system showed expected expression of marker genes throughout motor neuron differentiation as well as proper specification of neurite compartment and similarity with signatures of electrophysiological maturity. Using the iMN model system I performed investigative analysis for the effect of ALS patient derived FUS mutations on the proteome and transcriptome, specifically including effects pronounced in the neurite compartment. With this I identified many differentially expressed genes already associated with ALS or FUS mutations, which, however, span a very wide field of functional associations and to my understanding are more likely linked to a disruption of normal FUS activity. However, I also observed a more consistent and rarely reported pattern of down-regulation of genes building the extracellular matrix around the NMJ, which was specifically notable in the neurites of cells with cytoplasmic localised P525L FUS.
Additionally, I found a very similar pattern of down-regulation in neurites for genes passing the secretory pathway, known target transcripts of FUS, as well as those with a G-quadruplex motif, which has been identified as a potential binding site for both FUS and other ALS associated RBPs. This highlights a potential toxic gain-of-function for FUS as well as a particular pathway which may be important in the axonal degeneration in ALS. Validation of this observation including any potential significance of an overlap between the affected gene groups I identified should be the focus of further work.
Control over the intracellular localisation of RNA is an important aspect of post-transcriptional regulation, especially for highly polarised cells such as neurons. The presence of particular RNA molecules in axons or dendrites, collectively neurites, is determined by cis-acting elements called 'zipcodes' and ultimately allows neurons to respond quickly to stimuli and to synthesise required proteins locally. This capability is essential for synapse remodelling and memory formation to function correctly; furthermore, disruption of RNA localisation has been linked to the pathophysiology of several neurodegenerative diseases.
RNA sequencing can identify a large number of transcripts in the neurites of neuronal model systems. With over 7500 distinct transcripts, even the core transcriptome, which I compiled from the datasets published in recent years, contains nearly a third to a half of the complete neuronal transcriptome. Whether all of these transcripts should actually be regarded as localised, or whether this designation is better determined by differential expression between cell fractions, is difficult to judge. On the one hand, there is no clear agreement between individual datasets on which transcripts are differentially localised, and my integrative data analysis including batch correction also identified only a relatively small set of differentially expressed transcripts. On the other hand, some transcripts that are generally regarded as localised, such as the Actb mRNA, nevertheless show no relative enrichment in neurites, even though they are strongly expressed there.
Using a set of transcripts consistently enriched in the neurites of primary murine neurons, I constructed 'Nzip', a massively parallel reporter assay aimed at identifying unknown zipcodes. With 16 candidate sequences discovered in a first experiment and a mutational analysis approach, it was possible to determine 2 new zipcodes: the let-7 miRNA target sequence CUACCUC and an (AU) motif. Compartment-specific quantification of miRNAs and their associated protein machinery additionally suggests that let-7 activity is stronger in the cell body, which provides a potential mechanism for this zipcode activity. Furthermore, a similar reduction of RNA molecules in the cell body was observed for the (AU) motif, and some of the binding proteins identified for it can influence RNA stability, so that this motif presumably also controls RNA localisation through stability regulation.
Building on my observations that designating RNA localisation status on the basis of either detection or enrichment in neurites is problematic, and that Nzip primarily identified motifs that control RNA enrichment through stability, I take the view that a clear distinction between localised and non-localised transcripts is not necessarily an accurate description. Instead, I consider it more accurate that zipcodes influence the probability that a given transcript leaves the cell body, and that potentially different mechanisms control the presence and the enrichment of transcripts in neurites. Whether this model of RNA localisation is correct, as well as the exact mode of action of the zipcodes discovered, still needs to be tested in further studies.
The second part of my work is devoted to research on the neurodegenerative disease amyotrophic lateral sclerosis (ALS). ALS is characterised above all by the degeneration of motor neurons, beginning at the synapses between axons and muscles, the neuromuscular junction (NMJ). In addition, there are RNA-binding proteins whose mutations can cause the disease. Among these is the FUS protein, mutation of which often impairs its localisation to the nucleus and can thereby lead both to a loss of function and to new toxic functions through the binding of other RNA molecules in the cytoplasm.
Studying a disease like ALS requires a cellular model system for human neurons that reproduces relevant molecular signatures of the affected motor neurons, and in particular of the axon and neurite fraction. I characterised the transcriptome and proteome of induced motor neurons (iMN), which are generated by expression of NGN2, ISL1 and LHX3. This system shows the expression patterns expected during motor neuron differentiation, as well as clearly specified neurites and similarity to signatures of electrical activity.
Using the iMN system, I carried out an initial investigation of the effects of ALS patient-derived FUS mutations on the proteome and transcriptome, and in particular on the neurite fraction. In doing so I found a number of differentially expressed genes that have already been associated with ALS or FUS mutations, but which also span a very broad functional spectrum and are therefore presumably related to the disruption of normal FUS function. In addition, however, I observed a consistent and so far less noted reduction in the expression of extracellular matrix genes near the NMJ, which is present particularly in the neurites of cells with cytoplasmic P525L FUS. Furthermore, I found similar patterns of reduced expression for genes that pass through the secretory pathway, are known binding targets of FUS, or possess a G-quadruplex motif, the latter having been identified as a potential binding site for FUS and other ALS-associated proteins. Although these observations represent a potential toxic gain of function for FUS and highlight a particular molecular pathway that could be important for the course of ALS, they still need to be verified by further studies, since in particular the significance of an overlap between the affected gene groups I identified is not yet clear.
The Ins and outs of the chromatoid body
Spermatogenesis is a complex differentiation process that produces millions of genetically unique sperm cells every day. During spermatogenesis the developing germ cells undergo metamorphic changes as they transform from primitive spermatogonial stem cells to large meiotic spermatocytes, divide into smaller round spermatids and finally become streamlined, compact sperm. Each cell type has a unique transcriptional profile. Early cells of spermatogenesis, especially meiotic spermatocytes, express massive amounts of transcripts, while transcription is completely halted later due to the nuclear condensation of spermatids. To cope with these transcriptomic challenges, large cytoplasmic ribonucleoprotein granules called germ granules appear and provide dynamic platforms for the transcripts and their regulators to come together.
Here, two proteins of the largest germ granule, the Chromatoid Body (CB), were selected for investigation: the autophagosome-transporting FYCO1 and the RNA-degrading endonuclease SMG6. Two mouse lines were created to reveal the roles of these germ granule components in spermatogenesis. The results show that FYCO1 is needed for the integrity of germ granules. CB morphology was disrupted in the absence of FYCO1, a phenotype that worsened under stress conditions. Nonetheless, FYCO1-depleted mice were fertile. Conversely, the deletion of the second component, endonuclease SMG6, led to infertility. The results showed that SMG6 is required for the transcriptional balance of developing germ cells, which it regulates together with the piRNA pathway. Both studies highlight the importance of germ granules in spermatogenesis.
Overall, this thesis comprises three studies. First, a simple BSA-gradient method to isolate round spermatids and spermatocytes from mice using standard laboratory equipment was developed to facilitate the two main studies of this thesis work. In the first of these studies FYCO1 was identified as a link between autophagy and the CB, while the second revealed the role of the endonuclease SMG6 in spermatogenesis and in the transcriptional integrity of male germ cells. Together these two studies contribute to revealing the functions of the enigmatic germ granules and the pivotal roles they play for the maintenance of male fertility.
Chromatoid body: A dive beneath the surface
Involuntary childlessness is increasing worldwide, and disorders of male fertility in particular remain poorly understood. Spermatogenesis is a unique, complex and tightly regulated developmental process. First, stem cells divide mitotically to increase their number. In meiosis they shuffle their genetic material, creating new combinations and enabling evolution. Finally, they undergo a morphological transformation from a round cell to the streamlined shape typical of sperm. Germ cells express their genome actively, and their RNA profile is exceptionally diverse compared with other differentiated cells. RNA regulation plays a very important role, and successful development requires a wide range of different mechanisms. An important part is played by so-called germ granules, in which RNA molecules and RNA-binding proteins accumulate.
In my doctoral work I focused on the function of a germ granule important for fertility, the chromatoid body (CB), and created two knockout mouse models in which one CB component, either FYCO1 or SMG6, was deleted in male mice. Surprisingly, no defects in sperm formation were found in FYCO1-knockout males, even though the appearance and function of the CB were disrupted. Nevertheless, a connection was revealed between the CB and autophagy, the cells' own recycling and quality-control mechanism. The study also showed that the FYCO1 protein has an important role in maintaining the structure of germ granules. The second study focused on the role of the RNA-degrading SMG6 protein in germ cells. Sperm formation in Smg6-knockout male mice was severely disrupted and the males were infertile. In particular, Smg6-knockout male mice had problems maintaining the normal RNA profile of developing sperm, which points to possible functions of the SMG6 protein in germ cells. In addition to these studies, my thesis includes a third study, in which a new method was developed that makes it possible to isolate different germ cells from the testis even with a very small amount of starting material, with which we hope to help reduce the number of laboratory animals used. This method also makes cell-type-specific germ cell research accessible to a wider research community. As a whole, my thesis sheds light on the important function of germ granules in sperm formation and reveals molecular-level mechanisms in this vital process.
Genetic Conditions Affecting the Skeleton
In this Special Issue of Genes entitled “Genetic Conditions Affecting the Skeleton: Congenital, Idiopathic Scoliosis and Arthrogryposis”, evidence is presented that suggests that congenital scoliosis, idiopathic scoliosis, and arthrogryposis share overlapping, but also distinct, etiopathogenic mechanisms, including connective tissue and neuromuscular mechanisms. Congenital scoliosis (CS) is defined by the presence of an abnormal spinal curvature, due to an underlying vertebral bony malformation (VM). Idiopathic scoliosis (IS) is defined by the presence of an abnormal structural spinal curvature of ≥10 degrees in the sagittal plane, in the absence of an underlying VM. Arthrogryposis is defined by the presence of congenital contractures in two or more joints of the appendicular skeleton. All three conditions have complex genetic causes. This Special Issue highlights the complex nature of these conditions and current concepts in our approach to better understand their genetics.
The causes of retinal dystrophy and the development of more comprehensive screening approach
Inherited retinal diseases (IRDs) are a group of genetically and phenotypically heterogenous disorders caused by variants in around 280 genes. Additional loci have also been localised to chromosomal regions, though the causative genes remain unknown. Recent improvements in screening technologies have increased the detection of pathogenic variants in IRD. This thesis describes the use of next generation sequencing (second (short-read) and third (long-read) generation sequencing) to find missing or hard to find pathogenic variants in IRD patients.
The first results chapter describes use of whole exome sequencing to screen 24 individuals with syndromic and non-syndromic IRDs. This identified pathogenic variants in known genes in eight cases: CDHR1 (c.1527T>G, p.Y509*), RHO (c.284T>C, p.L95P), PRPF31 (c.797delC, p.S266*), CNGA3 (c.1088T>C, p.L363P), BBS10 (c.728-731delAAGA, p.K243Ifs*15), USH2A (c.252T>G, p.C84W), ABCA4 (c.2588G>C, p.G863A and c.6089G>A, p.R2030Q), and SLC25A46 (c.670A>G, p.T224A). In addition, several candidate variants were highlighted for further investigation.
In the second results chapter, seven patients with late onset macular dystrophy and one with age related macular degeneration were found to carry the same heterozygous ~126 kb deletion encompassing CRX, TPRX1 and SULT2A1. This phenotype has already been documented in patients with heterozygous variants in the gene encoding retinal transcription factor CRX, while there is no known functional or phenotypic link with variants in TPRX1 or SULT2A1. This therefore confirms that CRX haploinsufficiency is pathogenic, a finding that had previously been debated in the ophthalmic literature. The deletion was characterized using a PCR assay followed by cloning and Sanger sequencing or direct Sanger sequencing. Haplotype analysis was done by microsatellite genotyping.
The third results chapter describes use of SMRT PacBio and nanopore long-read sequencing to screen the hard-to-sequence mutation hotspot RPGR-ORF15. Both approaches were effective in reading through ORF15 and allowed sequencing of indexed pooled samples; 218 IRD patients were screened, detecting known and new variants. Nanopore sequencing on the smaller Flongle flowcell allowed low-cost optimisation, but pores rapidly blocked, probably due to ORF15 secondary structures. Repeated DNase I washes reopened the pores but required use of the more expensive MinION flowcells. Ultimately, the PacBio sequencer proved simpler to use, cheaper, and more scalable.
Transcriptional dynamics of the Sonic hedgehog gene
Enhancers are capable of driving gene expression over linearly vast
distances, allowing precise patterns of spatiotemporal gene expression. They
are able to do this independent of orientation to the promoter, and a single
gene often has multiple enhancers. There is still limited understanding of how
developmental enhancers drive transcription. It must be a highly regulated
process, as previous evidence has shown that alterations in expression levels
can result in developmental malformations. Furthermore, there is debate
surrounding the mechanisms of how enhancers interact with their distal
promoters. The models currently most popular in the field are looping and
transcriptional hubs.
Sonic hedgehog (Shh) gene expression is a good model to further our
understanding of both developmental transcriptional regulation and distal
enhancer-promoter interactions. Shh expression is regulated by many tissue-
and domain-specific enhancers with, in some instances, single enhancers
driving expression in single embryonic domains. The enhancers are all
located within a single TAD, at a range of distances from the Shh promoter.
Over the years, my lab has taken a special interest in the limb enhancer,
ZRS. The ZRS drives transcription in the distal posterior mesenchyme of the
developing limb bud in a domain called the ZPA. We have been able to
identify a network of activator and repressor binding sites within the ZRS that
restricts transcription in the absence of histological boundaries providing an
interesting model for me to explore mechanisms for how a developmental
enhancer drives transcription.
Throughout this thesis I will address three main aims. Firstly, I will establish
transcriptional characteristics at the wild-type Shh locus. Before I start
exploring how an enhancer drives transcription using different mouse
models, I first need a strong understanding of what transcription looks like in
wild-type animals. I addressed this using nascent RNA-FISH. Using nascent
RNA-FISH I have been able to determine the bursting frequency of Shh in
the ZPA. Furthermore, this technique has allowed me to ascertain if an active
enhancer can be transcribed through. Meanwhile, the use of RNAscope has
allowed me to establish the overall pattern of expression across the ZPA,
determining whether there is a clear-cut boundary or a gradient-type pattern.
Secondly, I will decipher the role of discrete functional elements of the ZRS in
transcription. Previous work from members of my lab has identified different
binding sites located throughout the ZRS. For example, there are four known
Hox binding sites. Mutations in these sites are known to cause down-regulation of Shh. I used mutants for these sites to determine the action of
HOXD proteins on the ZRS and how this impacts transcription. Furthermore,
I have investigated how pioneer factor binding of the ZRS influences
transcriptional characteristics. This was done by investigating how the
mutation of a LIM homeodomain binding site impacted transcriptional
characteristics of Shh. In contrast, I then explored how up-regulatory
mutations of the ZRS affect transcription by using mice where a ZRS
repressor site, the WMS, was disrupted. This work revealed that HOXD
proteins, LIM homeodomain proteins and proteins binding the WMS all have
different influences on Shh transcription, revealing a range of different roles.
Finally, I will explore long-range regulation of distal enhancer-promoter
interactions. There have been multiple models proposed to explain how
enhancers interact with their target promoters. Two of these models for
explaining long range regulation are the looping model and the transcriptional
hub model. These two models can be differentiated by the action of a single
enhancer on multiple promoters: the looping model predicts promoter choice,
while the transcriptional hub model predicts simultaneous promoter activation. To
test these models, I looked at the ability of Shh enhancers to drive
transcription of multiple promoters in different contexts. Firstly, I used a
mouse line carrying a LacZ reporter integrated within the Shh TAD. This
provided a second internal promoter where expression is driven by Shh
enhancers in their cognate tissues. Secondly, I performed experiments
examining activation of two endogenous genes in adjacent TADs, Shh and
Mnx1.
Differential evolution of non-coding DNA across eukaryotes and its close relationship with complex multicellularity on Earth
Here, I elaborate on the hypothesis that complex multicellularity (CM, sensu Knoll) is a major evolutionary transition (sensu Szathmary), which has convergently evolved a few times in Eukarya only: within red and brown algae, plants, animals, and fungi. Paradoxically, CM seems to correlate with the expansion of non-coding DNA (ncDNA) in the genome rather than with genome size or the total number of genes. Thus, I investigated the correlation between genome and organismal complexities across 461 eukaryotes under a phylogenetically controlled framework. To that end, I introduce the first formal definitions and criteria to distinguish ‘unicellularity’, ‘simple’ (SM) and ‘complex’ multicellularity. Rather than using the limited available estimations of unique cell types, the 461 species were classified according to our criteria by reviewing their life cycle and body plan development from literature. Then, I investigated the evolutionary association between genome size and 35 genome-wide features (introns and exons from protein-coding genes, repeats and intergenic regions) describing the coding and ncDNA complexities of the 461 genomes. To that end, I developed ‘GenomeContent’, a program that systematically retrieves massive multidimensional datasets from gene annotations and calculates over 100 genome-wide statistics. R-scripts coupled to parallel computing were created to calculate >260,000 phylogenetically controlled pairwise correlations. As previously reported, both repetitive and non-repetitive DNA are found to scale strongly and positively with genome size across most eukaryotic lineages. In contrast to previous studies, I demonstrate that changes in the length and repeat composition of introns are only weakly or moderately associated with changes in genome size at the global phylogenetic scale, while changes in intron abundance (within and across genes) are either not or only very weakly associated with changes in genome size.
Our evolutionary correlations are robust to different phylogenetic regression methods, uncertainties in the tree of eukaryotes, variations in genome size estimates, and randomly reduced datasets. Then, I investigated the correlation between the 35 genome-wide features and the cellular complexity of the 461 eukaryotes with phylogenetic Principal Component Analyses. Our results endorse a genetic distinction between SM and CM in Archaeplastida and Metazoa, but not so clearly in Fungi. Remarkably, complex multicellular organisms and their closest ancestral relatives are characterized by high intron-richness, regardless of genome size. Finally, I argue why and how a vast expansion of non-coding RNA (ncRNA) regulators rather than of novel protein regulators can promote the emergence of CM in Eukarya. As a proof of concept, I co-developed a novel ‘ceRNA-motif pipeline’ for the prediction of “competing endogenous” ncRNAs (ceRNAs) that regulate microRNAs in plants. We identified three candidate ceRNA motifs: MIM166, MIM171 and MIM159/319, which were found to be conserved across land plants and to be potentially involved in diverse developmental processes and stress responses. Collectively, the findings of this dissertation support our hypothesis that CM on Earth is a major evolutionary transition promoted by the expansion of two major ncDNA classes, introns and regulatory ncRNAs, which might have boosted the irreversible commitment of cell types in certain lineages by canalizing the timing and kinetics of the eukaryotic transcriptome.
Cover page
Abstract
Acknowledgements
Index
1. The structure of this thesis
1.1. Structure of this PhD dissertation
1.2. Publications of this PhD dissertation
1.3. Computational infrastructure and resources
1.4. Disclosure of financial support and information use
1.5. Acknowledgements
1.6. Author contributions and use of impersonal and personal pronouns
2. Biological background
2.1. The complexity of the eukaryotic genome
2.2. The problem of counting and defining “genes” in eukaryotes
2.3. The “function” concept for genes and “dark matter”
2.4. Increases of organismal complexity on Earth through multicellularity
2.5. Multicellularity is a “fitness transition” in individuality
2.6. The complexity of cell differentiation in multicellularity
3. Technical background
3.1. The Phylogenetic Comparative Method (PCM)
3.2. RNA secondary structure prediction
3.3. Some standards for genome and gene annotation
4. What is in a eukaryotic genome? GenomeContent provides a good answer
4.1. Background
4.2. Motivation: an interoperable tool for data retrieval of gene annotations
4.3. Methods
4.4. Results
4.5. Discussion
5. The evolutionary correlation between genome size and ncDNA
5.1. Background
5.2. Motivation: estimating the relationship between genome size and ncDNA
5.3. Methods
5.4. Results
5.5. Discussion
6. The relationship between non-coding DNA and Complex Multicellularity
6.1. Background
6.2. Motivation: How to define and measure complex multicellularity across eukaryotes?
6.3. Methods
6.4. Results
6.5. Discussion
7. The ceRNA motif pipeline: regulation of microRNAs by target mimics
7.1. Background
7.2. A revisited protocol for the computational analysis of Target Mimics
7.3. Motivation: a novel pipeline for ceRNA motif discovery
7.4. Methods
7.5. Results
7.6. Discussion
8. Conclusions and outlook
8.1. Contributions and lessons for the bioinformatics of large-scale comparative analyses
8.2. Intron features are evolutionarily decoupled among themselves and from genome size throughout Eukarya
8.3. “Complex multicellularity” is a major evolutionary transition
8.4. Role of RNA throughout the evolution of life and complex multicellularity on Earth
9. Supplementary Data
Bibliography
Curriculum Scientiae
Selbständigkeitserklärung (declaration of authorship)
- …