97 research outputs found

    Single-cell transcriptome analyses on developmental transitions in mouse pluripotent stem cells

    Böttcher M. Single-cell transcriptome analyses on developmental transitions in mouse pluripotent stem cells. Bielefeld: Universität Bielefeld; 2020.
    We used an in vitro model of Pluripotent Stem Cell (PSC) development in mice to analyze dynamic changes in transcriptomes of hundreds of individual cells which were undergoing an induced transition from naïve mouse Embryonic Stem Cells (mESC) towards primed pluripotent Epiblast Stem Cells (EpiSC). The differentiation of mESCs to EpiSC-like cells takes about five days after induction. We collected cell samples in 24-hour intervals for four days after induction as well as untreated mESCs and primed state EpiSCs. Single-cell isolation and scRNA-seq library preparation for each time point were done on the commercial Fluidigm C1 platform. In addition, we sampled C1-Cap Analysis of Gene Expression (C1-CAGE) libraries for the same set of time points to enable detection of non-coding RNAs (ncRNAs) such as anti-sense RNAs or enhancer RNAs. This C1-CAGE protocol was new and still undergoing optimization at the beginning of our experiments. C1-CAGE was first published by Kouno *et al.* (2019) and the author of this thesis contributed as a co-author. Throughout the work on this project a data management platform called SCPortalen was developed to share all data among project collaborators. SCPortalen’s publication was also co-authored by the author of this thesis (Abugessaisa et al., 2018). The combination of transcriptome datasets from two different protocols allowed the elucidation of expression dynamics of the naïve-to-primed stem cell conversion. We independently identified two subpopulations of cells during the transition process with both the Fluidigm scRNA-seq and C1-CAGE dataset. Pseudotime analysis revealed the developmental trajectory of cells and is a powerful tool to reliably identify developmental stages of cells without prior knowledge of their actual stage. Among these two transition phase subpopulations, one showed wide-spread repression of gene expression. The small nuclear RNA (snRNA) *Rn7sk* was identified as one potential regulator of this population specific phenomenon. The second subpopulation shared some characteristics with primed EpiSCs such as cell morphology and the expression of known primed state marker genes, but it could be shown that cells from this population were still undergoing Epithelial-Mesenchymal Transition (EMT). That is a clear sign that these cells have not yet fully transitioned to primed pluripotent stem cells. Interestingly, the characteristics of this subpopulation largely match a predicted third pluripotency state called “formative” (Smith, 2017). Therefore, we believe that our dataset not only contains naïve and primed pluripotent stem cells, but also formative pluripotent stem cells. Thus, our dataset represents a unique resource to compare and study this proposed formative pluripotency state. Last but not least, we found several marker gene candidates for all developmental stages of the naïve-to-primed transition, which will facilitate classification of cells in future experiments. For example, we propose *Cd59a* as a highly specific marker gene for primed EpiSCs. The results of this thesis project have also been compiled into a manuscript for publication in a peer reviewed journal and will be submitted soon after the submission of this thesis.

    Investigating the role of Schizophrenia-associated gene expression in the developing human brain using Machine Learning

    Schizophrenia is a debilitating condition that affects 1% of the population and causes significant hardship; although treatments are available, they have several limitations. It is a complex mental disorder in which some individuals show mild subclinical cognitive symptoms before psychosis onset in adolescence. The available treatments target only a portion of the symptoms and, although extensive research has been conducted, a comprehensive understanding of the nature of schizophrenia remains elusive. Unlike other neurodevelopmental disorders, schizophrenia symptoms do not typically present until adolescence. This study aimed to discover gene co-expression networks at multiple developmental stages in order to identify candidate therapeutic targets to better treat and manage schizophrenia. Recent genome-wide association studies have identified 145 genetic loci associated with schizophrenia. The Allen Brain Atlas’s BrainSpan resource provides brain development data from neurotypical brains. Using this resource, it was possible to study the expression of 316 schizophrenia-associated genes, identified previously in a large-scale GWAS, across each of the developmental stages available in the Allen Brain Atlas. K-means clustering and a systems biology approach (WGCNA) were applied to these schizophrenia-associated genes at each developmental stage, and modules within networks were created by grouping co-expressed genes. To facilitate biological interpretation of these modules, co-expressed genes were visualised using Cytoscape and gene ontology pathway enrichment analysis was applied. We identified 21 hub genes using WGCNA. Across the 316 schizophrenia-associated genes, 27 modules were identified, and three hub genes (GPR52, INA and SATB2) were common to multiple developmental stages. Our results suggest that GPR52, INA and SATB2 are candidate genes for future evaluation as therapeutic targets in schizophrenia. Additional hub genes included TRANK1 and ALMS1, which were previously identified as expression quantitative trait loci. Taken together, our results add further evidence that these genes could be good candidates for further research, as they may regulate several schizophrenia-related genes in their respective modules. Finally, our enrichment analysis implicated positive regulation of macrophage proliferation, cellular response to catecholamine stimulus, and cellular response to diacyl bacterial lipopeptide at each developmental stage. The immune system and catecholamines, including dopamine, have long been associated with schizophrenia, and our results provide further support for these hypotheses.
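The notion of a hub gene used in the abstract above can be illustrated with a minimal sketch. This is not the study's pipeline and does not use the WGCNA R package; it assumes only NumPy, an unsigned adjacency of the form |cor|^beta with an illustrative beta = 6, and hypothetical gene names G1 to G4 with made-up expression values.

```python
import numpy as np

def connectivity(expr, beta=6):
    """Per-gene connectivity in an unsigned co-expression network.

    expr: array of shape (samples, genes).
    beta: soft-threshold power (illustrative value, an assumption).
    """
    cor = np.corrcoef(expr, rowvar=False)   # gene-by-gene Pearson correlation
    adj = np.abs(cor) ** beta               # soft-thresholded adjacency
    np.fill_diagonal(adj, 0.0)              # ignore self-connections
    return adj.sum(axis=0)

def rank_by_connectivity(expr, genes, beta=6):
    """Return gene names ordered from most to least connected."""
    k = connectivity(expr, beta)
    order = np.argsort(-k, kind="stable")
    return [genes[i] for i in order]

# Hypothetical toy data: G1, G2, G3 are perfectly (anti-)correlated,
# G4 is uncorrelated with all of them.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, -1.0, -1.0, 1.0])
expr = np.column_stack([x, 2 * x, -x, y])
genes = ["G1", "G2", "G3", "G4"]

k = connectivity(expr)          # approximately [2, 2, 2, 0]
ranking = rank_by_connectivity(expr, genes)
```

In this toy network G1, G2 and G3 are tied as the most connected genes, while G4 has no connections; in a real analysis the genes with the highest within-module connectivity would be the hub candidates.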

    Genome-wide Determination Of Splicing Efficiency And Dynamics From RNA-Seq Data

    Eukaryotic genes are mostly composed of a series of exons intercalated by sequences with no coding potential called introns. These sequences are generally removed from primary transcripts to form mature RNA molecules in a post-transcriptional process called splicing. Efficient splicing of primary transcripts is an essential step in gene expression and its misregulation is related to numerous human diseases. Thus, to better understand the dynamics of this process and the perturbations that might be caused by aberrant transcript processing, it is important to quantify splicing efficiency. In this thesis, I introduce SPLICE-q, a fast and user-friendly Python tool for genome-wide SPLICing Efficiency quantification. It supports studies focusing on the implications of splicing efficiency in transcript processing dynamics. SPLICE-q uses aligned reads from RNA-Seq to quantify splicing efficiency for each intron individually and allows the user to select different levels of restrictiveness concerning the introns’ overlap with other genomic elements, such as exons from other genes. I demonstrate SPLICE-q’s application using three use cases including two different species and methodologies. These analyses illustrate that SPLICE-q can detect a progressive increase of splicing efficiency throughout a time course of nascent RNA-Seq and that it might be useful for understanding cancer progression beyond mere gene expression levels. Furthermore, I provide an in-depth study of time course nascent BrU-Seq data to address questions concerning differences in the speed of splicing and the underlying biological features that might be associated with it. SPLICE-q and its documentation are publicly available at: https://github.com/vrmelo/SPLICE-q.
    Eukaryotic genes consist essentially of a series of exons separated by non-coding sequences (so-called introns). In a post-transcriptional process known as splicing, these sequences are usually removed from the primary transcripts, yielding mature RNA molecules. Efficient splicing of primary transcripts is such an essential step in gene expression that its deregulation is the cause of numerous human diseases. It is therefore important to be able to quantify splicing efficiency robustly in order to better understand the dynamics of this process and the effects of aberrant transcript processing. In this manuscript I present SPLICE-q, an efficient and user-friendly Python program for genome-wide quantification of splicing efficiency (SPLICing Efficiency quantification). Among other things, it supports studies investigating the effect of splicing efficiency on the overall dynamics of transcript processing. SPLICE-q uses aligned reads from RNA-Seq experiments to quantify the splicing efficiency of each individual intron and allows the user to filter introns at several levels of restrictiveness according to their overlap with other genomic elements (e.g. exons from other genes). The use and robustness of SPLICE-q are demonstrated in three different use cases covering two different species and methodologies. These analyses demonstrate that SPLICE-q can both detect a progressive increase in splicing efficiency over time in data from a nascent RNA experiment and contribute to understanding the development of cancer cells beyond mere gene expression. In addition, this work examines a time series of nascent BrU-Seq data in detail to address questions about differences in splicing speed in connection with certain biological features. The source code of SPLICE-q and its documentation are publicly available at: https://github.com/vrmelo/SPLICE-q
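As a rough illustration of what a per-intron splicing efficiency score can look like, the sketch below computes the fraction of junction-covering reads that are split (spliced). This is an assumed, simplified definition for illustration only; the function name and arguments are hypothetical, and SPLICE-q's actual scoring and command-line interface are documented in its repository.

```python
def splicing_efficiency(split_5p, unsplit_5p, split_3p, unsplit_3p):
    """Fraction of reads at an intron's two splice junctions that are split.

    split_*:   reads spanning the exon-exon junction (intron removed)
    unsplit_*: reads crossing the exon-intron boundary (intron retained)
    Returns a value in [0, 1], or None when no read covers either junction.
    """
    split = split_5p + split_3p
    total = split + unsplit_5p + unsplit_3p
    if total == 0:
        return None  # no coverage, efficiency is undefined
    return split / total

# Hypothetical read counts for three introns
mostly_spliced = splicing_efficiency(30, 10, 30, 10)   # 60 split of 80 reads
fully_spliced = splicing_efficiency(5, 0, 5, 0)
uncovered = splicing_efficiency(0, 0, 0, 0)
```

A score near 1 indicates that almost all reads at the junctions come from spliced transcripts, while a score near 0 indicates that the intron is mostly retained in the sampled RNA.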

    Classification of tissues and disease subtypes using whole-genome signatures

    Development and application of microarray technology in biological research has led to compilation of expression and sequence data on a genome-wide scale. Given the volume of data produced and the complexity of gene regulatory mechanisms, it can be difficult to extract meaningful biological information. Classification can be used to reduce the complexity through the detection of genes, genetic loci or conditions that share common attributes and the identification of gene expression patterns or genotypes associated with phenotype. In the study of cancer, supervised classification has been applied to identify gene expression biomarkers of different disease states. Clinically validated biomarkers are valuable indicators for diagnosis and guiding therapeutic strategy. We developed an iterative machine learning algorithm to compare the predictive value of biomarker sets chosen by supervised classification against sets selected randomly from known disease-related genes. Both supervised classification and feature selection based on prior knowledge resulted in discriminative classification of molecular phenotypes in breast cancer and lymphoma. Compilation of gene expression data has led to the identification of genes with bimodal, or switch-like, expression patterns. We used unsupervised, supervised and model-based classification methods to investigate the biological relevance of bimodal expression patterns and to evaluate their potential for class discovery and prediction. Both model-based and supervised classification resulted in the accurate classification of samples by tissue phenotype or infectious disease. Functional enrichment analysis indicates switch-like genes are involved in tissue-specific or immune response functions. Taken together, this evidence supports the assertion that bimodal expression patterns are biologically relevant. 
Clinical relevance of bimodal expression patterns was investigated in an association study of genotypes of families affected by autism. A subset of neural-specific switch-like genes was used to identify candidate gene regions which may contain genetic variants associated with autism risk. A two-stage family-based association test detected an autism susceptibility locus in the q26 region of chromosome 10. The coding region of the fibroblast growth factor receptor 2 (FGFR2) gene is 80 kilobases downstream from the identified locus. Altered expression of FGFR2 may be a contributing genetic factor in development of autism. Identification of the susceptibility locus provides motivation for novel hypotheses concerning the molecular basis of autism. In addition, we provide a method for integration of gene expression and genotype data that may lead to the identification of disease-related polymorphisms in other disorders.
    Ph.D., Biomedical Engineering -- Drexel University, 200

    Network Analysis of Epidermal Growth Factor Signaling Using Integrated Genomic, Proteomic and Phosphorylation Data

    To understand how integration of multiple data types can help decipher cellular responses at the systems level, we analyzed the mitogenic response of human mammary epithelial cells to epidermal growth factor (EGF) using whole genome microarrays, mass spectrometry-based proteomics and large-scale western blots with over 1000 antibodies. A time course analysis revealed significant differences in the expression of 3172 genes and 596 proteins, including protein phosphorylation changes measured by western blot. Integration of these disparate data types showed that each contributed qualitatively different components to the observed cell response to EGF and that varying degrees of concordance in gene expression and protein abundance measurements could be linked to specific biological processes. Networks inferred from individual data types were relatively limited, whereas networks derived from the integrated data recapitulated the known major cellular responses to EGF and exhibited more highly connected signaling nodes than networks derived from any individual dataset. While cell cycle regulatory pathways were altered as anticipated, we found the most robust response to mitogenic concentrations of EGF was induction of matrix metalloprotease cascades, highlighting the importance of the EGFR system as a regulator of the extracellular environment. These results demonstrate the value of integrating multiple levels of biological information to more accurately reconstruct networks of cellular response

    The role of roX RNA in dosage compensation during Drosophila melanogaster embryogenesis

    Dosage compensation (DC) in male Drosophila melanogaster flies is achieved through hypertranscription of the X chromosome. This involves the dosage compensation complex (DCC), a ribonucleoprotein complex of five protein subunits, Male-specific-lethal 1 (MSL1), MSL2, MSL3, Males-absent-on-the-first (MOF) and Maleless (MLE), and a long noncoding RNA, RNA-on-the-X (roX), encoded by either the roX1 or the roX2 gene. DC is interlinked with the process of sex determination. A hypothesis suggests that upon hybridization of roX1 and roX2 RNAs, a miRNA is produced that is implicated in a feedback mechanism of sex determination. Different approaches were used to reproduce the hybridization and validate the putative miRNA; however, these observations could not be reproduced. As differential functions of roX RNAs have been proposed, roX1 and roX2 RNAs were characterized in fractionated extracts by RT-qPCR. Long isoforms of roX, roX1-RE and roX2-RB, tended to be polyadenylated and enriched in the cytoplasm, suggesting differential post-transcriptional processing and a possible shuttling mechanism. A preliminary direct-RNA nanopore sequencing experiment detected major parts of the roX RNAs important for DC. With an improved protocol for RNA preservation and library preparation, it may prove to be a potent tool to further characterize the lncRNAs and profile their isoforms. Additionally, a detailed study of the establishment of dosage compensation during early embryogenesis was carried out. MSL2 binding to DNA was evident 4 hours after egg laying, when only minimal compensation of X-linked genes is observed. Concurrent detection of MOF on the X chromosome signified assembly of the DCC in early development. This complex was active in its function of acetylating H4K16. Nevertheless, accumulation of H4K16ac on the X chromosome proceeded in a time- and space-dependent manner, coinciding with the progression of dosage compensation. Specifically, genes defined as constitutive were closer to DCC binding sites, more acetylated, and compensated first. Meanwhile, genes characterized as developmental were farther from DCC binding sites, lowly acetylated, and slowly compensated.
    Dosage compensation in male Drosophila melanogaster flies is achieved by hypertranscription of the X chromosome. This is mediated by the dosage compensation complex (DCC). This ribonucleoprotein complex consists of five protein subunits, Male-specific-lethal 1 (MSL1), MSL2, MSL3, Males-absent-on-the-first (MOF) and Maleless (MLE), and a long noncoding RNA, RNA-on-the-X (roX), encoded by either the roX1 or the roX2 gene. Dosage compensation is linked to the process of sex determination. This work tested the hypothesis that hybridization of roX1 and roX2 RNAs produces a miRNA involved in a feedback mechanism of sex determination. Unfortunately, earlier observations supporting the hypothesis could not be reproduced. A redundant role in dosage compensation, as well as additional functions outside this process, has been proposed for the roX RNAs. This dissertation includes the characterization of roX1 and roX2 RNAs in fractionated embryo extracts by RT-qPCR. The results point to differential post-transcriptional processing of the RNAs. Long isoforms of roX, roX1-RE and roX2-RB, are polyadenylated. They are also enriched in the cytoplasm, which points to a possible exchange with the nucleus. Defining sections of the roX RNAs important for DC were detected in an exploratory experiment by direct-RNA nanopore sequencing. With an improved protocol for RNA extraction, preservation and library preparation, it could prove to be an effective tool for further characterizing the long noncoding RNA, including with respect to the selection of RNA isoforms. In addition, a detailed study of the establishment of dosage compensation during early embryogenesis was carried out as part of this work. Binding of MSL2 to DNA was already measurable 4 hours after egg laying. At this time point there is only slight dosage compensation of X-linked genes. At the same time, MOF could also already be detected in the same region on the X chromosome. This showed the formation of the DCC in early embryogenesis. The complex was already active and acetylated H4K16. Nevertheless, the subsequent accumulation of H4K16ac on the X chromosome proceeded in a time- and position-dependent manner, in line with the progression of dosage compensation. Thus, genes defined as constitutive and located closer to DCC binding sites were more strongly acetylated and compensated first. Genes defined as developmental, by contrast, lay farther from DCC binding sites, were only weakly acetylated and were compensated slowly.

    A system for genome-wide histone variant dynamics in ES cells reveals dynamic MacroH2A2 replacement at promoters

    Dynamic exchange of a subset of nucleosomes in vivo plays important roles in epigenetic inheritance of chromatin states, chromatin insulator function, chromosome folding, and the maintenance of the pluripotent state of embryonic stem cells. Here, we extend a pulse-chase strategy for carrying out genome-wide measurements of histone dynamics to several histone variants in murine embryonic stem cells and somatic tissues, recapitulating expected characteristics of the well characterized H3.3 histone variant. We extended this system to the less-studied MacroH2A2 variant, commonly described as a repressive histone variant whose accumulation in chromatin is thought to fix the epigenetic state of differentiated cells. Unexpectedly, we found that while large intergenic blocks of MacroH2A2 were stably associated with the genome, promoter-associated peaks of MacroH2A2 exhibited relatively rapid exchange dynamics in ES cells, particularly at highly-transcribed genes. Upon differentiation to embryonic fibroblasts, MacroH2A2 was gained primarily in additional long, stably associated blocks across gene-poor regions, while overall turnover at promoters was greatly dampened. Our results reveal unanticipated dynamic behavior of the MacroH2A2 variant in pluripotent cells, and provide a resource for future studies of tissue-specific histone dynamics in vivo

    Signal processing techniques for the interpretation of microarray measurements

    Microarray technology allows the measurement of gene transcription on a genome wide scale. Signal processing approaches to the analysis of data from microarray time course experiments are the focus of this thesis. Firstly, spectral estimation methods are explored as a method for the detection of cell-cyclic elements within microarray data. High resolution data-dependent filterbank methods are proposed as an improvement to the traditional periodogram approach. A spectral estimator is then designed specifically to deal with the errors in the sampling times inherent in microarray experiments, which is based on the robust Capon beamformer. A beamforming inspired approach is shown to yield a more robust, and higher resolution, estimate of the magnitude spectrum of the whole data set than the previous spectral estimation approaches. Blind source separation is examined as a method for recovering sources which represent fundamental cellular processes. The linear mixing model is compared to its transpose form, and a dual form, in terms of their finite sample performance with real microarray data. Second order methods are proposed to recover sources which are spatio-temporally uncorrelated and may be more suitable with microarray data. Both the spectral and blind source separation techniques are shown to yield useful feature extraction measures for microarray data clustering. The spectral feature extraction allows the clustering of cell-cyclic genes into a single functional group. Finally, sparse source separation is introduced as a possible blind separation technique with microarray data
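The "traditional periodogram approach" that the abstract above uses as its baseline can be sketched in a few lines. This is a minimal illustration assuming NumPy and evenly spaced sampling times; it does not implement the thesis's filterbank or robust Capon estimators, and the synthetic expression profile is made up.

```python
import numpy as np

def periodogram(x):
    """Classical periodogram of an evenly sampled time course.

    Returns (freqs, power), with freqs in cycles per sample.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                          # remove the constant offset
    power = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0)
    return freqs, power

# Synthetic profile: 32 time points containing exactly 3 cycles
n = 32
t = np.arange(n)
profile = np.sin(2 * np.pi * 3 * t / n)

freqs, power = periodogram(profile)
peak = freqs[np.argmax(power)]                # dominant frequency, 3/32
```

A gene whose periodogram has a strong peak near the known cell-cycle frequency is a candidate cell-cyclic gene. The periodogram's resolution is limited by the short length of typical microarray time courses, which is the motivation the abstract gives for higher-resolution estimators.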