204 research outputs found

    Toward Modeling Context-Specific EMT Regulatory Networks Using Temporal Single Cell RNA-Seq Data.

    Epithelial-mesenchymal transition (EMT) is well established as playing a crucial role in cancer progression and as a potential therapeutic target. To elucidate the gene regulation that drives EMT decision making, many previous studies have modeled EMT gene regulatory circuits (GRCs) using interactions from the literature. While this approach can depict generic regulatory interactions, it falls short of capturing context-specific features. Here, we explore the effectiveness of a combined bioinformatics and mathematical modeling approach to construct context-specific EMT GRCs directly from transcriptomics data. Using time-series single-cell RNA-sequencing data from four different cancer cell lines treated with three EMT-inducing signals, we identify context-specific activity dynamics of common EMT transcription factors. In particular, we observe distinct paths during the forward and backward transitions, as is evident from the dynamics of major regulators such as NF-κB (e.g., NFKB2 and RELB) and AP-1 (e.g., FOSL1 and JUNB). For each experimental condition, we systematically sample a large set of network models and identify the optimal GRC capturing context-specific EMT states using a mathematical modeling method named Random Circuit Perturbation (RACIPE). The results demonstrate that the approach can build high-quality GRCs in some cases but not in others, and they elucidate the role of common bioinformatics parameters and properties of network structures in determining the quality of GRCs. We expect the integration of top-down bioinformatics and bottom-up systems biology modeling to be a powerful and generally applicable approach to elucidate gene regulatory mechanisms of cellular state transitions.
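
The idea of sampling many models for one circuit can be illustrated with a minimal sketch, loosely in the spirit of random circuit perturbation. It is not the RACIPE implementation: the two-gene mutual-inhibition topology, the parameter ranges, the plain Hill function, and all function names here are illustrative assumptions, and RACIPE's own parameter-sampling rules are omitted.

```python
import random

def hill_inhibition(x, threshold, n):
    """Inhibitory Hill function: near 1 when x << threshold, near 0 when x >> threshold."""
    return 1.0 / (1.0 + (x / threshold) ** n)

def sample_params(rng):
    """Draw one random kinetic parameter set (ranges are assumed, not taken from the paper)."""
    return {
        "g_a": rng.uniform(1.0, 100.0),   # maximal production rate of gene A
        "g_b": rng.uniform(1.0, 100.0),   # maximal production rate of gene B
        "k_a": rng.uniform(0.1, 1.0),     # degradation rate of A
        "k_b": rng.uniform(0.1, 1.0),     # degradation rate of B
        "thr_a": rng.uniform(1.0, 50.0),  # level of A at which it half-represses B
        "thr_b": rng.uniform(1.0, 50.0),  # level of B at which it half-represses A
        "n": rng.randint(1, 6),           # Hill coefficient
    }

def integrate(params, a0, b0, dt=0.05, steps=4000):
    """Forward-Euler integration of a toggle switch (A represses B, B represses A)."""
    a, b = a0, b0
    for _ in range(steps):
        da = params["g_a"] * hill_inhibition(b, params["thr_b"], params["n"]) - params["k_a"] * a
        db = params["g_b"] * hill_inhibition(a, params["thr_a"], params["n"]) - params["k_b"] * b
        a, b = a + dt * da, b + dt * db
    return a, b

def sample_ensemble(n_models, seed=0):
    """Simulate the same topology under many random parameter sets and initial conditions."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_models):
        params = sample_params(rng)
        a0, b0 = rng.uniform(0.0, 100.0), rng.uniform(0.0, 100.0)
        results.append(integrate(params, a0, b0))
    return results

ensemble = sample_ensemble(20)
```

In a fuller workflow, the end states in `ensemble` would be clustered and compared against the expression states seen in the data; that comparison step is not shown here.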

    The continuum of Drosophila embryonic development at single-cell resolution.

    Drosophila melanogaster is a powerful, long-standing model for metazoan development and gene regulation. We profiled chromatin accessibility in almost 1 million nuclei and gene expression in half a million nuclei from overlapping windows spanning the entirety of embryogenesis. Leveraging developmental asynchronicity within embryo collections, we applied deep neural networks to infer the age of each nucleus, resulting in continuous, multimodal views of molecular and cellular transitions in absolute time. We identify cell lineages; infer their developmental relationships; and link dynamic changes in enhancer usage, transcription factor (TF) expression, and the accessibility of TFs' cognate motifs. With these data, the dynamics of enhancer usage and gene expression can be explored within and across lineages at the scale of minutes, including for precise transitions like zygotic genome activation.

    Computational approaches for single-cell omics and multi-omics data

    Single-cell omics and multi-omics technologies have enabled the study of cellular heterogeneity with unprecedented resolution and the discovery of new cell types. Identifying heterogeneous cell types, both existing and novel ones, relies on efficient computational approaches, especially cluster analysis. Additionally, gene regulatory network analysis and various integrative approaches are needed to combine data across studies and different multi-omics layers. This thesis comprehensively compared Bayesian clustering models for single-cell RNA-sequencing (scRNA-seq) data, and selected integrative approaches were used to study the cell-type-specific gene regulation of the uterus. Additionally, single-cell multi-omics data integration approaches for cell heterogeneity analysis were investigated. Article I investigated analytical approaches for cluster analysis in scRNA-seq data, in particular latent Dirichlet allocation (LDA) and hierarchical Dirichlet process (HDP) models. The comparison of LDA and HDP with existing state-of-the-art methods revealed that topic-modeling-based models can be useful in scRNA-seq cluster analysis. Evaluation of the cluster quality of LDA and HDP with intrinsic and extrinsic cluster quality metrics indicated that the clustering performance of these methods is dataset dependent. Article II and Article III focused on cell-type-specific integrative analysis of uterine or decidual stromal (dS) and natural killer (dNK) cells, which are important for successful pregnancy. Article II integrated existing preeclampsia RNA-seq studies of the decidua with recent scRNA-seq datasets in order to investigate cell-type-specific contributions to early-onset preeclampsia (EOP) and late-onset preeclampsia (LOP). It was discovered that the dS marker genes were enriched for LOP downregulated genes and the dNK marker genes were enriched for upregulated EOP genes. 
Article III presented a gene regulatory network analysis for the subpopulations of dS and dNK cells. This study identified novel subpopulation-specific transcription factors that promote decidualization of stromal cells and dNK-mediated maternal immunotolerance. In Article IV, different strategies and methodological frameworks for data integration in single-cell multi-omics data analysis were reviewed in detail. Data integration methods were grouped into early, late, and intermediate data integration strategies. The specific stage and order of data integration can have a substantial effect on the results of the integrative analysis. The central details of the approaches were presented, and potential future directions were discussed. Computational methods for the analysis of single-cell sequencing and multi-omics results. Single-cell sequencing technologies enable the study of cellular heterogeneity at unprecedented resolution and the discovery of new cell types. Grouping, that is, cluster analysis, plays a central role in identifying cell types. Combining gene regulatory networks and different molecular data layers is also central to the analysis. The dissertation compares Bayesian clustering methods and combines data collected with different methods in a cell-type-specific gene regulation analysis of the uterus. In addition, single-cell data integration methods are surveyed comprehensively. Publication I focuses on investigating analytical methods, in particular models based on latent Dirichlet allocation (LDA) and the hierarchical Dirichlet process (HDP), in the cluster analysis of single-cell data. A comprehensive comparison of these two models with existing methods revealed that topic-modeling-based methods can be useful in the cluster analysis of single-cell data. The performance of the methods also depended on the properties of each dataset analyzed. 
Publications II and III focus on the cell-type-specific analysis of uterine stromal cells and NK immune cells, which are important for female reproductive health. Article II combined existing results on preeclampsia with the most recent single-cell sequencing results and found cell-type-specific effects of early-onset preeclampsia (EOP) and late-onset preeclampsia (LOP). It was observed that the expression of marker genes of differentiated stroma decreased in LOP and the expression of NK marker genes increased in EOP. Publication III analyzes the subpopulation-specific gene regulatory networks of stromal and NK cells and their transcription factors. The study identified new subpopulation-specific regulators that promote stromal differentiation and NK-cell-mediated immunotolerance. Publication IV examines in detail strategies and methods for integrating different single-cell data layers (multi-omics). The integration methods were grouped into early, late, and intermediate strategies, and the methods of each approach were presented in more detail. In addition, possible future directions were discussed.

    The Architecture And Dynamics Of Gene Regulatory Networks Directing Cell-Fate Choice During Murine Hematopoiesis

    Mammals produce hundreds of billions of new blood cells every day through a process known as hematopoiesis. Hematopoiesis starts with stem cells that develop into all the different types of cells found in blood by changing their genome-wide gene expression. The remodeling of genome-wide gene expression can be primarily attributed to a special class of proteins called transcription factors (TFs) that can activate or repress other genes, including genes encoding TFs. TFs and their targets therefore form recurrent networks called gene regulatory networks (GRNs). GRNs are crucial during physiological developmental processes, such as hematopoiesis, while abnormalities in the regulatory interactions of GRNs can be detrimental to the organism. To this day we do not know all the key components that comprise hematopoietic GRNs or the complete set of their regulatory interactions. Inference of GRNs directly from genetic experiments is low throughput and labor intensive, while computational inference of comprehensive GRNs is challenging due to high processing times. This dissertation focuses on deriving the architecture and the dynamics of hematopoietic GRNs from genome-wide gene expression data obtained from high-resolution time-series experiments. The dissertation also aims to address the technical challenge of speeding up the process of GRN inference. Here GRNs are inferred and modeled using gene circuits, a data-driven method based on Ordinary Differential Equations (ODEs). In gene circuits, the rate of change of a gene product depends on regulatory influences from other genes, encoded as a set of parameters that are inferred from time-series data. A twelve-gene GRN comprising genes encoding key TFs and cytokine receptors involved in erythrocyte-neutrophil differentiation was inferred from a high-resolution time-series dataset of the in vitro differentiation of a multipotential cell line. 
The inferred GRN architecture agreed with prior empirical evidence and predicted novel regulatory interactions. The inferred GRN model was also able to predict the outcome of perturbation experiments, suggesting an accurate inference of GRN architecture. The dynamics of the inferred GRN suggested an alternative explanation to the currently accepted sequence of regulatory events during neutrophil differentiation. The analysis of the model implied that two TFs, C/EBPα and Gfi1, initiate cell-fate choice in the neutrophil lineage, while PU.1, believed to be a master regulator of all white blood cells, is activated only later. This inference was confirmed in a single-cell RNA-Seq dataset from mouse bone marrow, in which PU.1 upregulation was preceded by C/EBPα and Gfi1 upregulation. This dissertation also presents an analysis of a high-temporal-resolution genome-wide gene expression dataset of in vitro macrophage-neutrophil differentiation. Analysis of these data reveals that genome-wide gene expression during differentiation is highly dynamic and complex. A large-scale transition is observed around 8 hours and shown to be related to widespread physiological remodeling of the cells. The genes associated with myeloid differentiation mainly change during the first 4 hours, implying that the cell-fate decision takes place in the first 4 hours of differentiation. The dissertation also presents a new classification-based model-training technique that addresses the challenge of the high computational cost of inferring GRNs. This method, called Fast Inference of Gene Regulation (FIGR), is demonstrated to be two orders of magnitude faster than global non-linear optimization techniques, and its computational complexity scales much better with GRN size. 
This work has demonstrated the feasibility of simulating relatively large, realistic GRNs using a dynamical and mechanistically accurate model coupled to high-resolution time-series data, and that such models can yield novel biological insight. Taken together with the macrophage-neutrophil dataset and the computationally efficient GRN inference methodology, this work should open up new avenues for modeling more comprehensive GRNs in hematopoiesis and the broader field of developmental biology.
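
The gene-circuit description above (each gene product's rate of change depends on parameterized regulatory inputs from other genes) can be made concrete with a small sketch. It assumes the commonly used gene-circuit form dx_i/dt = R_i * g(sum_j T_ij * x_j + h_i) - lambda_i * x_i with a sigmoid g; the abstract does not state the exact equations, so this form, the two-gene example, and every parameter value below are assumptions for illustration only, not the dissertation's model.

```python
import math

def sigmoid(u):
    """Saturating regulation-expression function with values in (0, 1)."""
    return 0.5 * (u / math.sqrt(u * u + 1.0) + 1.0)

def gene_circuit_rhs(x, T, h, R, lam):
    """Right-hand side: dx_i/dt = R_i * g(sum_j T_ij * x_j + h_i) - lam_i * x_i."""
    n = len(x)
    return [
        R[i] * sigmoid(sum(T[i][j] * x[j] for j in range(n)) + h[i]) - lam[i] * x[i]
        for i in range(n)
    ]

def simulate(x0, T, h, R, lam, dt=0.01, steps=5000):
    """Forward-Euler integration of the circuit from initial state x0."""
    x = list(x0)
    for _ in range(steps):
        dx = gene_circuit_rhs(x, T, h, R, lam)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x

# Hypothetical two-gene circuit with mutual repression (negative T entries).
T = [[0.0, -4.0],
     [-4.0, 0.0]]      # T[i][j]: regulatory effect of gene j on gene i
h = [1.0, 1.0]         # basal input to each gene
R = [1.0, 1.0]         # maximum synthesis rates
lam = [1.0, 1.0]       # degradation rates

final = simulate([0.9, 0.1], T, h, R, lam)
```

With these assumed values the gene that starts higher stays dominant, which is the kind of cell-fate-like behavior such models are used to study. Inferring `T`, `h`, `R`, and `lam` from time-series data is the hard part and is not shown.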

    Modelling gene expression in terms of DNA sequence

    Understanding the gene regulatory networks that control gene expression remains one of the most important questions in molecular biology. Much of gene expression is controlled through transcription initiation, whose regulation is ultimately encoded in the constellations of small sequence motifs in the DNA that are bound by transcription factors (TFs) in a sequence-specific manner. In this thesis, we addressed the task of understanding gene regulation on two levels. Firstly, we present a computational pipeline for inferring a set of gene regulatory elements in a given organism, which includes identifying genes that encode DNA-binding domains (DBDs), mapping them to known binding motifs by leveraging similarity in DBDs between species, annotating promoter regions genome-wide, aligning promoters with orthologous regions from related genomes, and predicting genome-wide transcription factor binding sites (TFBSs). We demonstrated the use of our pipeline by applying it to zebrafish. Furthermore, we integrated these results into our previously developed Integrated System for Motif Activity Response Analysis (ISMARA), which models gene expression data in terms of predicted regulatory sites. Using ISMARA, we predicted known and novel key regulatory TFs in zebrafish using a number of RNA-seq datasets. Secondly, we zoom in to the scale of a single TF regulating a set of constitutive promoters in Escherichia coli. We analyzed an artificially evolved set of synthetic promoter sequences, selected for expression, that act as constitutive promoters regulated by the σ70 transcription factor. We looked closely into promoter sequences and TF binding dynamics and investigated the predictive power of TF binding affinity on gene expression.

    scTIGER: A Deep-Learning Method for Inferring Gene Regulatory Networks from Single-Cell Gene Expression Data

    Inferring gene regulatory networks (GRNs) from single-cell RNA-sequencing (scRNA-seq) data is an important computational problem for revealing fundamental regulatory mechanisms. Although many computational methods have been designed to predict GRNs, none infer condition-specific GRNs by directly using paired datasets from case-versus-control experiments, which are common in diverse biological research projects. We present a novel deep-learning-based method, scTIGER, for GRN detection using the co-dynamics of gene expression. scTIGER also employs cell-type-based pseudotiming, an attention-based convolutional neural network method, and permutation-based significance testing to infer GRNs from gene modules. We first applied scTIGER to scRNA-seq datasets of prostate cancer cells and detected potential AR-mediated GRNs. We then applied it to mouse neurons with and without fear memory and detected CREB-mediated GRNs. The results show that scTIGER can be applied to general case-versus-control scRNA-seq datasets with high performance.

    Context matters: the power of single-cell analyses in identifying context-dependent effects on gene expression in blood immune cells

    The human immune system is a complex system that we still do not fully understand. No two humans react in the same way to attacks by bacteria, viruses or fungi. Factors such as genetics, the type of pathogen or previous exposure to the pathogen may explain this diversity in response. Single-cell RNA sequencing (scRNA-seq) is a new technique that enables us to study the gene expression of each cell individually, allowing us to study immune diversity in much greater detail. This increased resolution helps us discern how disease-associated genetic variants actually contribute to disease. In this thesis, I studied the relation between disease-associated genetic variants and gene expression levels in the context of different cell types and pathogen exposures in order to gain insight into the working mechanisms of these variants. For many variants we learnt in which cell types and under which pathogen exposures they affect gene expression, and we were even able to identify changes in gene co-expression, suggesting that disease-associated variants change how our genes interact with each other. With the single-cell field being so new, much of my work consisted of showing the feasibility of using scRNA-seq to study the interplay between genetics and gene expression. To set up future research, we created guidelines for these analyses and established a consortium that brings together many major scientists in the field to enable large-scale studies across an even wider variety of contexts. This final work helps inform current and future large-scale scRNA-seq research.

    Non-coding genome contributions to the development and evolution of mammalian organs

    Protein-coding sequences cover only 1-2% of a typical mammalian genome. The remaining non-coding space hides thousands of genomic elements, some of which act via their DNA sequence while others are transcribed into non-coding RNAs. Many well-characterized non-coding elements are involved in the regulation of other genes, a process essential for the emergence of different cell types and organs during development. Changes in the expression of conserved genes during development are in turn thought to facilitate evolutionary innovation in form and function. Thus, non-coding genomic elements are hypothesized to play important roles in developmental and evolutionary processes. However, challenges related to the identification and characterization of these elements, in particular in non-model organisms, have limited the study of their overall contributions to mammalian organ development and evolution. During my dissertation work, I addressed this gap by studying two major classes of non-coding elements, long non-coding RNAs (lncRNAs) and cis-regulatory elements (CREs). In the first part of my thesis, I analyzed the expression profiles of lncRNAs during the development of seven major organs in six mammals and a bird. I showed that, unlike protein-coding genes, only a small fraction of lncRNAs are expressed in reproducibly dynamic patterns during organ development. These lncRNAs are enriched for a series of features associated with functional relevance, including increased evolutionary conservation and regulatory complexity, highlighting them as candidates for further molecular characterization. I then associated these lncRNAs with specific genes and functions based on their spatiotemporal expression profiles. My analyses also revealed differences in lncRNA contributions across organs and developmental stages, identifying a developmental transition from broadly expressed and conserved lncRNAs towards an increasing number of lineage- and organ-specific lncRNAs. 
Following up on these global analyses, I then focused on a newly-identified lncRNA in the marsupial opossum, Female Specific on chromosome X (FSX). The broad and likely autonomous female-specific expression of FSX suggests a role in marsupial X-chromosome inactivation (XCI). I showed that FSX shares many expression and sequence features with another lncRNA, RSX — a known regulator of XCI in marsupials. Comparisons to other marsupials revealed that both RSX and FSX emerged in the common marsupial ancestor and have since been preserved in marsupial genomes, while their broad and female-specific expression has been retained for at least 76 million years of evolution. Taken together, my analyses highlighted FSX as a novel candidate for regulating marsupial XCI. In the third part of this work, I shifted my focus to CREs and their cell type-specific activities in the developing mouse cerebellum. After annotating cerebellar cell types and states based on single-cell chromatin accessibility data, I identified putative CREs and characterized their spatiotemporal activity across cell types and developmental stages. Focusing on progenitor cells, I described temporal changes in CRE activity that are shared between early germinal zones, supporting a model of cell fate induction through common developmental cues. By examining chromatin accessibility dynamics during neuronal differentiation, I revealed a gradual divergence in the regulatory programs of major cerebellar neuron types. In the final part, I explored the evolutionary histories of CREs and their potential contributions to gene expression changes between species. By comparing mouse CREs to vertebrate genomes and chromatin accessibility profiles from the marsupial opossum, I identified a temporal decrease in CRE conservation, which is shared across cerebellar cell types. However, I also found differences in constraint between cell types, with microglia having the fastest evolving CREs in the mouse cerebellum. 
Finally, I used deep learning models to study the regulatory grammar of cerebellar cell types in human and mouse, showing that the sequence rules determining CRE activity are conserved across mammals. I then used these models to retrace the evolutionary changes leading to divergent CRE activity between species. Collectively, my PhD work provides insights into the evolutionary dynamics of non-coding genes and regulatory elements, the processes associated with their conservation, and their contributions to the development and evolution of mammalian cell types and organs.