71 research outputs found

    A descriptive marker gene approach to single-cell pseudotime inference

    Motivation: Pseudotime estimation from single-cell gene expression data allows the recovery of temporal information from otherwise static profiles of individual cells. Conventional pseudotime inference methods emphasize an unsupervised transcriptome-wide approach and use retrospective analysis to evaluate the behaviour of individual genes. However, the resulting trajectories can only be understood in terms of abstract geometric structures and not in terms of interpretable models of gene behaviour. Results: Here we introduce an orthogonal Bayesian approach termed ‘Ouija’ that learns pseudotimes from a small set of marker genes that might ordinarily be used to retrospectively confirm the accuracy of unsupervised pseudotime algorithms. Crucially, we model these genes in terms of switch-like or transient behaviour along the trajectory, allowing us to understand why the pseudotimes have been inferred and learn informative parameters about the behaviour of each gene. Since each gene is associated with a switch or peak time, the genes are effectively ordered along with the cells, allowing each part of the trajectory to be understood in terms of the behaviour of certain genes. We demonstrate that this small panel of marker genes can recover pseudotimes that are consistent with those obtained using the entire transcriptome. Furthermore, we show that our method can detect differences in the regulation timings between two genes and identify ‘metastable’ states (discrete cell types along the continuous trajectories) that recapitulate known cell types. Availability and implementation: An open source implementation is available as an R package at http://www.github.com/kieranrcampbell/ouija and as a Python/TensorFlow package at http://www.github.com/kieranrcampbell/ouijaflow. Supplementary information: Supplementary data are available at Bioinformatics online.
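The switch-like and transient gene behaviours described in this abstract can be pictured with two simple mean functions. The following is a minimal sketch only: the function names, parameter names, and the sigmoid and Gaussian forms are illustrative assumptions, not the Ouija package's API or its exact parameterisation.

```python
import math


def switch_mean(t, peak, strength, t0):
    """Hypothetical switch-like mean expression at pseudotime t.

    A sigmoid that rises (strength > 0) or falls (strength < 0)
    around the switch time t0 and saturates at `peak`.
    """
    return peak / (1.0 + math.exp(-strength * (t - t0)))


def transient_mean(t, peak, width, t_peak):
    """Hypothetical transient mean expression at pseudotime t.

    A Gaussian-shaped bump of height `peak` centred on t_peak.
    """
    return peak * math.exp(-((t - t_peak) ** 2) / (2.0 * width ** 2))


def order_genes(event_times):
    """Order gene names by their switch or peak time (earliest first)."""
    return sorted(event_times, key=event_times.get)


# Made-up event times for three placeholder genes.
gene_order = order_genes({"geneA": 0.7, "geneB": 0.2, "geneC": 0.5})
```

Because every gene carries its own switch or peak time, sorting by that time orders the genes along the same axis as the cells, which is the interpretability argument the abstract makes.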

    Developmental scRNAseq Trajectories in Gene- and Cell-State Space—The Flatworm Example

    Single-cell RNA sequencing has become a standard technique for characterizing tissue development. Computational methods based on similarity measures that distinguish cell phenotypes transform cross-sectional snapshots of cell transcriptome diversity into (pseudo-)longitudinal trajectories of cell differentiation. Cell development is driven by alterations of transcriptional programs, e.g., by differentiation from stem cells into various tissues or by adaptation to micro-environmental requirements. Here we complement developmental trajectories in cell-state space with trajectories in gene-state space to address this latter aspect more clearly. Such trajectories can be generated using self-organizing map machine learning. The method transforms multidimensional gene expression patterns into two-dimensional data landscapes, which resemble the metaphoric Waddington epigenetic landscape. Trajectories in this landscape visualize the transcriptional programs that cells pass through along their developmental paths from stem cells to differentiated tissues. In addition, we generated developmental “vector fields” using RNA velocities to forecast changes of RNA abundance in the expression landscapes. We applied the method to tissue development of planarians as an illustrative example. Gene-state space trajectories complement our data portrayal approach with (pseudo-)temporal information about the changing transcriptional programs of the cells. Future applications can be seen in the fields of tissue and cell differentiation, ageing, and tumor progression, and also with other data types such as genome, methylome, clinical, and epidemiological phenotype data.

    A comprehensive single-cell map of T cell exhaustion-associated immune environments in human breast cancer

    Immune checkpoint therapy in breast cancer remains restricted to triple-negative patients, and long-term clinical benefit is rare. The primary aim of immune checkpoint blockade is to prevent or reverse exhausted T cell states, but T cell exhaustion in breast tumors is not well understood. Here, we use single-cell transcriptomics combined with imaging mass cytometry to systematically study immune environments of human breast tumors that either do or do not contain exhausted T cells, with a focus on luminal subtypes. We find that the presence of a PD-1high exhaustion-like T cell phenotype is associated with an inflammatory immune environment with a characteristic cytotoxic profile, increased myeloid cell activation, evidence for elevated immunomodulatory, chemotactic, and cytokine signaling, and accumulation of natural killer T cells. Tumors harboring exhausted-like T cells show increased expression of MHC-I on tumor cells and of CXCL13 on T cells, as well as altered spatial organization with more immature rather than mature tertiary lymphoid structures. Our data reveal fundamental differences between immune environments with and without exhausted T cells within luminal breast cancer, and show that expression of PD-1 and CXCL13 on T cells, and of MHC-I (but not PD-L1) on tumor cells, are strong distinguishing features between these environments.

    Dissecting regional heterogeneity and modeling transcriptional cascades in brain organoids

    Over the past decade, there has been a rapid expansion in the development and utilization of brain organoid models, enabling three-dimensional in vivo-like views of fundamental neurodevelopmental features of corticogenesis in health and disease. Nonetheless, the methods used for generating cortical organoid fates exhibit widespread heterogeneity across different cell lines. Here, we show that a combination of dual SMAD and WNT inhibition (Triple-i protocol) establishes a robust cortical identity in brain organoids, while other widely used derivation protocols are inconsistent with respect to regional specification. To measure this heterogeneity, we employ single-cell RNA sequencing (scRNA-Seq), which samples the gene expression profiles of thousands of cells in an individual sample. However, to draw meaningful conclusions from scRNA-Seq data, technical artifacts must be identified and removed. In this thesis, we present a method to detect one such artifact: empty droplets that do not contain a cell and consist mainly of free-floating mRNA in the sample. Furthermore, from their expression profiles, cells can be ordered along a developmental trajectory that recapitulates the progression of cells as they differentiate. Based on this ordering, we model gene expression using a Bayesian inference approach to measure transcriptional dynamics within differentiating cells. This enables the ordering of genes along transcriptional cascades, statistical testing for differences in gene expression changes, and measurement of potential regulatory gene interactions. We apply this approach to the differentiation of cortical neural stem cells into cortical neurons via an intermediate progenitor cell type in brain organoids to provide a detailed characterization of the endogenous molecular processes underlying neurogenesis.

    Unraveling disease mechanisms of different lung pathologies with single-cell RNA sequencing

    The respiratory system is composed of different tissues whose respective cell types work in concert to perform air conductance and gas exchange. With the advent of single-cell RNA sequencing (scRNA-seq), it is now possible to comprehensively interrogate the function of each individual cell in homeostatic and diseased states. In this dissertation, various roles of epithelial, mesenchymal, and immune cell types of the respiratory system in idiopathic pulmonary fibrosis (IPF) and coronavirus disease 2019 (COVID-19) were investigated with scRNA-seq. IPF is a chronic interstitial lung disease characterized by the progressive scarring of the lung parenchyma. Previous studies that surveyed the cellular landscape of IPF lungs utilized explant lungs that reflect end-stage fibrosis. To uncover disease mechanisms of airway cell types in early-stage fibrosis, air-liquid interface (ALI) cultures of primary cells taken from newly diagnosed IPF patients were used. This identified proinflammatory epithelial cells, profibrotic basal cells, and primed fibroblasts as early-stage drivers of IPF. Treatment with the antifibrotic compounds nintedanib, pirfenidone, and saracatinib failed to completely ameliorate the identified signatures. With the emergence of the COVID-19 pandemic and its extensive public health burden, it was imperative to understand the molecular mechanisms of viral entry and disease pathology to identify potential risk factors and therapeutic targets. In the early stages of the pandemic, the viral entry factors ACE2, TMPRSS2, and FURIN were found to be expressed by a transient secretory cell type (differentiating from secretory to ciliated cell) of the airway mucosa and by alveolar type 2 cells of the alveolar epithelium. Further investigation of severe COVID-19 showed that the early stage of infection was characterized by a hyperactivated immune response mediated by proinflammatory macrophages.
On the other hand, late-stage COVID-19, especially in patients with acute respiratory distress syndrome (ARDS), was characterized by an accumulation of profibrotic macrophages and activated myofibroblasts that drove pulmonary scarring and fibrosis. Although IPF and COVID-19 are different diseases in their own right, they share aberrant wound healing responses. Both diseases are characterized by tissue inflammation that is followed by a profibrotic phase. Unlike in IPF, where the tissue remodeling is progressive and chronic, COVID-19 ARDS-associated fibrosis undergoes a resolution phase. Future studies comparing the cellular and transcriptional landscape of both conditions in early and late stages of disease will uncover pathogenic mechanisms and therapeutic targets of lung fibrosis. The application of high-resolution transcriptomic profiling techniques such as scRNA-seq permits the interrogation of individual cell types and their direct contribution to the development of diseases. Moreover, it allows the comparison and transfer of identified pathomechanisms across different pulmonary diseases and, in doing so, provides deeper and generalizable insights. As this field continues to evolve, it will undoubtedly continue to provide a deeper understanding of respiratory diseases.

    Bayesian statistical learning for big data biology

    Bayesian statistical learning provides a coherent probabilistic framework for modelling uncertainty in systems. This review describes the theoretical foundations underlying Bayesian statistics and outlines the computational frameworks for implementing Bayesian inference in practice. We then describe the use of Bayesian learning in single-cell biology for the analysis of high-dimensional, large data sets.
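For orientation, the probabilistic framework this abstract refers to rests on Bayes' theorem. The notation below is an assumption made for illustration (θ for model parameters, y for observed data) and is not taken from the review itself.

```latex
p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)},
\qquad
p(y) = \int p(y \mid \theta)\, p(\theta)\, \mathrm{d}\theta
```

The posterior p(θ | y) combines the likelihood p(y | θ) with the prior p(θ). The normalising integral p(y) is usually intractable for high-dimensional models, which is why practical Bayesian inference relies on computational approximation schemes.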

    Transient and heterogeneous YAP1 activity drives self-organization in intestinal organoid development

    Recent years have seen an explosion in the ability to grow organoids that phenocopy diverse organs ranging from intestinal epithelium to complex cerebral structures. All organoid models emerge from the potential of individual cells to self-organize into higher-order structures under homogeneous conditions. They can be established by extracting adult stem cells from healthy or diseased tissue or by directed differentiation of pluripotent stem cells. Protocols have been established to culture them in well-defined conditions and use them for any standard biological or molecular technology. In addition, they are more amenable to imaging approaches, allowing researchers to gain access to early developmental processes. Despite the exciting promises of organoid technologies and the hope that they will result in new human therapies, little is known about self-organization into complex organ-like structures. This type of basic knowledge about the underlying process is required for applied breakthroughs to occur. In this work, we used the enormous regenerative capacity of the small intestine to study how cells with stem and non-stem cell identity self-organize into organoids. A quantitative study identified a YAP1-driven transient dedifferentiation, occurring independently of the starting population, into proliferative, homogeneous cysts able to reconstitute all cell types of the mature tissue. In contrast to the prevalent view of organoid development, this intermediate state exhibits not intestinal stem cell but fetal-like characteristics. By addressing how asymmetries emerge within homogeneous cysts to specify Paneth cells, the first symmetry-breaking event in this system, we identified large degrees of cell-to-cell variability in YAP1 activity preceding symmetry breaking. This cell-to-cell variability in the subcellular localization of YAP1 is essential to drive a Notch-Delta lateral inhibition event that specifies Paneth cell fate.
In conclusion, this work shows how combining live and 4i multiplexed imaging, sequencing, and perturbation approaches can bridge decision making at the single-cell level, through lateral-inhibition-driven cell-fate decisions, to different phenotypic outcomes at the tissue level: the occurrence of budding organoids or, when symmetry breaking fails, enterocysts. This study gives a first glance into the complex interaction networks endowing individual cells with the capacity to self-organize into organoids.

    ClockOME: searching for oscillatory genes in early vertebrate development

    Embryo development is a dynamic process regulated in space and time. Cells must integrate biochemical and mechanical signals to generate fully functional organisms, and oscillatory gene expression plays a key role in this process. The embryo molecular clock (EMC) is the best-known genetic oscillator active in embryo segmentation, involving genes from the Notch, FGF, and WNT pathways. However, the list of cyclic genes is still incomplete, mostly due to the challenges involved in studying periodic systems. Recently, such studies have become more feasible with the development of pseudo-time ordering algorithms that search for candidate oscillatory genes using large transcriptomics datasets sampled without explicit time measurements. This study aims at finding candidate oscillatory genes, the ClockOME, active in early chick embryo development. Two Gallus gallus microarray transcriptomics datasets from presomitic mesoderm (PSM) and one dataset from limb segmentation were gathered from GEO and ArrayExpress. To normalize these data from different experiments, an R data package, FrozenChicken, was developed to apply frozen Robust MultiArray (fRMA) normalization to the data. Next, the datasets were processed with Oscope (a pseudo-time ordering algorithm) to search for candidate periodic genes clustered by similar oscillatory behaviour. The clusters of predicted oscillators were then subjected to functional enrichment and interaction network analyses to highlight the biological functions associated with these genes. Oscope predicted three clusters of oscillators: two in PSM (106 and 32 genes) and one in Limb (162 genes). Overall, the genes are associated with regulatory, morphological, and developmental processes. Mesp2, a gene involved in the EMC, was found in this dataset, validating the approach; however, the majority of genes are novel oscillatory candidates, associated with chromatin and transcriptional regulation, as well as protein and oxygen metabolism.
The list of candidate oscillators represents a valuable resource for guided experimental validation to discover additional members of the chick EMC. Six genes have been proposed for high-priority experimental validation: SRC, PTCH1, NOTCH2, YAP1, KDR, CTR9.
Embryonic development is a dynamic process involving molecular changes in space and time. Embryonic cells are constantly exposed to biochemical and mechanical stimuli and respond to their environment by altering their genetic program. When correctly integrated, these cellular responses culminate in the successful development of a functional organism. Embryogenesis thus involves strictly regulated molecular processes, and oscillatory gene expression is one of the possible ways of regulating cell behaviour over time. The embryonic molecular clock is a well-known genetic oscillator involved in the segmentation of embryonic paraxial tissue. The molecular clock concept was first proposed in 1976 by Cooke and Zeeman, who called it the Clock and Wavefront model [1]. This model was conceived to describe theoretically the rhythmic formation of somites on both sides of the paraxial mesoderm (PSM) in vertebrates, and it is based on the existence of genetic oscillators that regulate this PSM segmentation process over time. Besides the clock, as the name indicates, the model includes a wavefront that spatially determines the behaviour of the cells in the presomitic mesoderm (PSM). Together, the two mechanisms guide the differentiation of PSM cells, which consequently undergo genetic transformations that precede somite formation. The basis of this molecular clock is the periodic expression of genes belonging to the Notch, FGF, and WNT pathways.
However, the list of genes involved in the embryonic clock is not yet complete, mainly because of the experimental difficulties of studying periodic systems when the periodicity or rhythm of expression of the genes involved is not known in advance. New transcriptomics techniques that measure the expression values of all genes simultaneously, namely microarrays or, more recently, sequencing methods such as RNA sequencing or single-cell RNA sequencing, offer the opportunity to extend the list of genes with oscillatory expression. These methods, however, require extracting RNA from the sampled cells, which results in cell death. This processing therefore prevents the same cells from being studied over time and yields static molecular data; that is, the expression levels obtained represent a single time sample. Studying periodic processes would then require a time series sampling different individuals throughout developmental time, greatly increasing the number of biological samples needed to resolve the oscillation cycle for each gene studied. Thus, without explicitly measured temporal information, oscillatory gene expression can only be studied using appropriate mathematical models, namely by applying pseudo-time ordering algorithms. These methods order the samples along the time of one oscillation so as to obtain the pattern of cyclic behaviour for all genes whose expression oscillates concomitantly. It thus becomes possible to infer bioinformatically the oscillatory potential of genes measured by these transcriptomics techniques, without explicit temporal information.
The aim of this study is therefore to find new oscillatory genes, which we collectively call the ClockOME, that are active during the first stages of chick embryonic development (somitogenesis) in the tissues of the presomitic mesoderm (PSM) and the upper limb (Limb), tissues in which the molecular clock has been described as acting as a temporal regulator of the underlying genetic changes. To this end, three microarray transcriptomics datasets were collected from two public data repositories: GEO (from the American institution NCBI) and ArrayExpress (from the European institution EMBL-EBI). Two datasets contained data from the paraxial mesoderm (PSM), the tissue where somitogenesis occurs, and one dataset contained data obtained from the upper limb of the chick embryo. To normalize the three datasets and make them comparable (since they come from different experimental procedures), an R package named “FrozenChicken: Promoting the meta-analysis of chicken microarray data” was developed (published in 2021) (https://doi.org/10.1101/2021.02.25.432894). This package contains summarized data from 472 chick embryo microarray datasets, making fRMA (frozen Robust MultiArray) normalization of Gallus gallus microarrays possible. After normalization and quality control of the gene expression values, the PSM and limb data were processed with Oscope (a pseudo-time ordering algorithm) in order to predict oscillatory genes. This algorithm evaluates all combinations of gene pairs and groups those with similar expression patterns, that is, those whose expression values across samples follow similar trajectories, indicating a potentially similar oscillation period.
The gene clusters predicted by Oscope were subsequently subjected to a functional enrichment analysis and a functional interaction analysis in order to understand their potential biological role and underlying molecular functions. Oscope reported three lists of potentially oscillatory genes: two groups were found in the PSM data (with 106 and 32 genes, respectively) and a third group of 162 genes was found in the upper limb data. In total, the list of genes we call ClockOME comprises 296 potentially oscillatory genes involved in several regulatory mechanisms important for embryonic development and morphogenesis. Most of the genes in this list are not described in the literature as oscillatory (novel candidates) and therefore represent an asset for the scientific community studying the embryonic molecular clock. These genes appear to be associated with functions such as chromatin remodelling, transcriptional regulation, protein metabolism, and oxygen metabolism, making them good candidates for future experimental validation. Notably, Oscope successfully identified Mesp2, an oscillatory gene well described in the literature, demonstrating the validity and potential of this theoretical approach. In short, this work produced a list of 296 potentially oscillatory genes. Based on their novelty and annotated molecular function, a list of six candidate genes of particular relevance was proposed for experimental validation in the near future: SRC, PTCH1, NOTCH2, YAP1, KDR, CTR9. The lists resulting from this thesis can now guide future laboratory experiments capable of adding new molecular interactors to the current model of the embryonic molecular clock.

    Generating and characterizing primate iPSCs for evolutionary analyses

    The similarities and differences between us and our closest relatives, the primates, have fascinated researchers for decades and evoked various approaches to better understand the underlying genotype-phenotype relationship. Starting with early comparisons of protein sequences between humans and chimpanzees, substantial technological advances in genomics have led to a deeper understanding of the complexities in this relationship, ranging from cataloging genetic differences to modeling genetic differences in cellular and animal systems. Furthermore, the lack of genetic differences - sequence conservation - is crucial to annotate the human genome and interpret biomedically relevant variants within humans. Charting differences and similarities in molecular and cellular properties can take such a comparative approach to the next phenotypic level. In particular, similar to the information obtained from DNA conservation, expression conservation could help annotating and interpreting human gene expression patterns and thus also provide biomedically relevant information. However, the major limiting factor in this venture is the availability of comparable samples of different primates, mainly due to ethical constraints. Induced pluripotent stem cells (iPSCs) are used in humans to overcome such limitations, as they can be propagated indefinitely and differentiated to many different cell types. Thus, they can provide a valuable and unique resource for functional primate genomics. In this context, I established a method to generate iPSCs from primates. One of the major challenges in generating iPSCs from non-model organisms is the acquisition of the somatic cells for reprogramming. Therefore, I focused on urine as a non-invasive cell source and could show that cells can be isolated from very small amounts of primate urine samples, which were collected in an unsterile manner. 
These cells can be efficiently reprogrammed into iPSCs using the footprint-free Sendai virus reprogramming method. Utilizing this approach, we generated four iPSC lines from two orangutans, three iPSC lines from one gorilla, and nine lines from five humans. We validated the pluripotency of these lines using immunocytochemistry and differentiation assays, and also classified the cells as pluripotent using bulk RNA sequencing. We further showed that expression differences among clones are comparable to those among individuals and considerably larger than technical sources of variation, suggesting that these cells are a suitable resource for functional primate genomics. As RNA sequencing (RNA-seq) is a decisive assay to classify cells and to study gene expression in a comparative context, a robust and affordable method to quantify RNA expression levels is indispensable. I contributed to the development of prime-seq, a sensitive bulk RNA-seq protocol that we showed to perform equivalently to standard bulk RNA-seq methods, but at a fourfold higher efficiency due to almost 50-fold cheaper library costs. This is highly useful, e.g., to classify generated iPSCs as described above. However, to compare heterogeneous cell populations, as they arise for example during the differentiation of iPSCs, RNA-seq with single-cell resolution (scRNA-seq) is crucial. I contributed to the development of mcSCRB-seq, a sensitive, powerful, and efficient single-cell RNA-seq method that is plate-based and hence can be used for scRNA-seq on sorted single cells. Finally, I utilized mcSCRB-seq to compare gene expression trajectories during differentiation of our primate iPSCs towards neural precursor cells (NPCs). We sampled single cells of nine different clones from three species at six different time points during early neural differentiation and thus generated a comprehensive dataset to study this process in a comparable manner.
We identify genes with a conserved constant up-regulation throughout the trajectory and find that these genes have a higher probability of being mutation intolerant and a higher probability of being associated with neurodevelopmental disorders. This strengthens the hypothesis that identifying conserved expression patterns in primate iPSCs could carry unique functional information to annotate and interpret the human genome. In summary, within my thesis I describe the basis for comparative research settings by providing a non-invasive and footprint-free method to generate iPSCs from various primates. Additionally, I contributed to efficient methods to characterize these cells and showcase in an encompassing study how expression conservation can help to better understand the human genome.