64 research outputs found

    Development of model systems to reconstruct the unicellular prehistory of animals : an emphasis on the cell cycle

    The origin of animal multicellularity has its roots in the process of cell division. Understanding the molecular basis of cell division in animals and their unicellular relatives has the potential to elucidate what changes in the control of cell division played a role, if any, in the transition to multicellularity. However, the experimental amenability of the closest relatives of animals is still very limited. This thesis contributes to the development of Capsaspora owczarzaki, a close unicellular relative of animals, as a model organism, by developing genetic tools for DNA transfection and culture synchronization tools to study the cell cycle. Our characterization of the Capsaspora cell cycle revealed that many genes important in the cell cycle of animal cells are also transcriptionally regulated in Capsaspora, including the main orthologs of animal cyclins and CDKs present in Capsaspora. Likewise, the development of genetic tools opens the door to new functional studies in this species, which will make it possible to understand the role of many genes related to multicellularity in the context of a unicellular species.

    Seventh Biennial Report : June 2003 - March 2005

    No full text

    Scalable Machine Learning Methods For The Analysis Of Single-Cell Transcriptomics And Multiomics Data

    Transcriptomics and proteomics-based expression profiling technologies have become increasingly popular, more affordable, and more accurate in recent years. Expression profiling at single-cell resolution allows investigators to identify rare cell subtypes in human tissue which would otherwise be confounded in lower-resolution, bulk sequencing technologies. Previously, investigators studied human cell populations by profiling RNA expression in single cells using single-cell RNA sequencing (scRNA-seq) technologies. More recently, multi-modality sequencing technologies such as Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE-seq) have emerged, which allow investigators to profile multiple forms of biological expression (in this case RNA and protein expression) simultaneously in the same cells. Investigators can now study human biology in greater detail than ever before, but challenges remain. (1) Cell subpopulations are not always neatly separated from one another, which makes cell type classification difficult. (2) Technical batch effects also often plague scRNA-seq studies and confound real biological signals. (3) Multi-modality technologies are excellent but remain expensive to do at scale. In this work, we seek to address these challenges associated with scRNA-seq and CITE-seq analyses. To address challenge (1), we propose a smooth pseudotemporal modeling approach which characterizes a cell’s identity as a mixture of two discrete identities, allowing for a continuous sliding-scale cell type rather than requiring cells to separate into discrete types. To address challenge (2), we propose an augmented autoencoder which uses a self-supervised Kullback–Leibler divergence, along with a specialized branching architecture to correct for batch effects in the full gene expression feature space. 
Lastly, to address challenge (3), we develop a hybrid feedforward-recurrent neural network approach which supports protein prediction, imputation, embedding, uncertainty quantification, and cell type label transfer, allowing the user to use reference CITE-seq datasets to predict and study protein expression in larger single-modality RNA-only data. We validate the utility of each of our approaches using real datasets with gold standard true expression and experimentally validated cell type labels. We also demonstrate real use cases for our methods, such as improving downstream pseudotime analyses using batch correction and identifying immune response biomarkers to an H1N1 vaccine.

    Synthesising executable gene regulatory networks in haematopoiesis from single-cell gene expression data

    A fundamental challenge in biology is to understand the complex gene regulatory networks which control tissue development in the mammalian embryo, and maintain homoeostasis in the adult. The cell fate decisions underlying these processes are ultimately made at the level of individual cells. Recent experimental advances in biology allow researchers to obtain gene expression profiles at single-cell resolution over thousands of cells at once. These single-cell measurements provide snapshots of the states of the cells that make up a tissue, instead of the population-level averages provided by conventional high-throughput experiments. The aim of this PhD was to investigate the possibility of using this new high resolution data to reconstruct mechanistic computational models of gene regulatory networks. In this thesis I introduce the idea of viewing single-cell gene expression profiles as states of an asynchronous Boolean network, and frame model inference as the problem of reconstructing a Boolean network from its state space. I then give a scalable algorithm to solve this synthesis problem. In order to achieve scalability, this algorithm works in a modular way, treating different aspects of a graph data structure separately before encoding the search for logical rules as Boolean satisfiability problems to be dispatched to a SAT solver. Together with experimental collaborators, I applied this method to understanding the process of early blood development in the embryo, which is poorly understood due to the small number of cells present at this stage. The emergence of blood from Flk1+ mesoderm was studied by single cell expression analysis of 3934 cells at four sequential developmental time points. A mechanistic model recapitulating blood development was reconstructed from this data set, which was consistent with known biology and the bifurcation of blood and endothelium. 
Several model predictions were validated experimentally, demonstrating that HoxB4 and Sox17 directly regulate the haematopoietic factor Erg, and that Sox7 blocks primitive erythroid development. A general-purpose graphical tool was then developed based on this algorithm, which can be used by biological researchers as new single-cell data sets become available. This tool can deploy computations to the cloud in order to scale up to larger high-throughput data sets. The results in this thesis demonstrate that single-cell analysis of a developing organ coupled with computational approaches can reveal the gene regulatory networks that underpin organogenesis. Rapid technological advances in our ability to perform single-cell profiling suggest that my tool will be applicable to other organ systems and may inform the development of improved cellular programming strategies. Microsoft Research PhD Scholarship
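
    As a rough illustration of the state-space view described in the abstract above, the following Python sketch enumerates the asynchronous transitions of a tiny Boolean network. The three genes (A, B, C) and their update rules are hypothetical and chosen only for illustration; they are not taken from the thesis. The sketch runs in the forward direction (rules to state space), whereas the thesis addresses the inverse problem of recovering rules from observed states with a SAT solver, which is not attempted here.

```python
from itertools import product

# Hypothetical three-gene network; these rules are illustrative assumptions,
# not the rules inferred in the thesis.
RULES = {
    "A": lambda s: s["A"],                 # A keeps its current value (acts as an input)
    "B": lambda s: s["A"] and not s["C"],  # B is activated by A and repressed by C
    "C": lambda s: s["B"],                 # C follows B
}
GENES = tuple(RULES)


def async_successors(state):
    """Return the states reachable by updating exactly one gene.

    Under asynchronous semantics only one gene changes per step, so a state
    can have several successors, or none if it is a fixed point.
    """
    current = dict(zip(GENES, state))
    successors = set()
    for i, gene in enumerate(GENES):
        new_value = bool(RULES[gene](current))
        if new_value != state[i]:
            nxt = list(state)
            nxt[i] = new_value
            successors.add(tuple(nxt))
    return successors


def state_space():
    """Map every Boolean state to its set of asynchronous successors."""
    all_states = product((False, True), repeat=len(GENES))
    return {state: async_successors(state) for state in all_states}


if __name__ == "__main__":
    for state, successors in sorted(state_space().items()):
        print(state, "->", sorted(successors))
```

    In this picture, each binarized single-cell expression profile corresponds to one key of the mapping, and an edge connects two states that differ in exactly one gene.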

    Eight Biennial Report : April 2005 – March 2007

    No full text

    Integrated Spatial Genomics Reveals Organizational Principles of Single-Cell Nuclear Architecture

    Three-dimensional (3D) nuclear architecture plays key roles in many cellular processes such as gene regulation and genome replication. Recent sequencing-based and imaging-based single-cell studies have characterized a high variability of nuclear features in individual cells across a wide range of measurement modalities, such as chromosome structures, subnuclear structures, chromatin states, and nascent transcription. However, the lack of technologies that allow us to interrelate those nuclear features simultaneously in the same single cells limits our understanding of nuclear architecture. To overcome this limitation, a technology that can examine 3D nuclear features across modalities from the same single cells is required. Here, we demonstrate integrated spatial genomics approaches, which enable genome-wide investigation of chromosome structures, subnuclear structures, chromatin states, and transcriptional states in individual cells. In Chapter 2, we introduce the "track first and identify later" approach, which enables multiplexed tracking of genomic loci in live cells by combining CRISPR/Cas9 live imaging and DNA sequential fluorescence in situ hybridization (DNA seqFISH) technologies. We demonstrate our approach by resolving the dynamics of 12 unique subtelomeric loci in mouse embryonic stem (ES) cells. In Chapter 3, we present the intron seqFISH technology, which enables transcriptome-scale profiling of gene expression at nascent transcription active sites in individual nuclei in mouse ES cells and fibroblasts, along with mRNA and lncRNA seqFISH and immunofluorescence. We show that the transcription active sites are positioned at the surfaces of chromosome territories with variable inter-chromosomal organization in individual nuclei. 
By building upon those technologies, in Chapter 4, we demonstrate integrated spatial genomics in mouse ES cells, which enables imaging of thousands of genomic loci by DNA seqFISH+, along with sequential immunofluorescence and RNA seqFISH in individual cells. We show "fixed loci" that are invariably associated with specific subnuclear structures across hundreds of single cells that can constrain nuclear architecture in individual nuclei. In addition, we find that individual genomic loci appear to be pre-positioned to specific nuclear compartments with different frequencies, which are independent of the nascent transcriptional states of single cells. Lastly, in Chapter 5, we demonstrate the integrated spatial genomics technology in the mouse brain cortex, enabling the investigation of single-cell nuclear architecture in a cell-type specific fashion as well as the exploration of common organizational principles of nuclear architecture across cell types. We reveal that inter-chromosomal organization and radial positioning of chromosomes are arranged with cell-type specific chromatin fixed loci and subnuclear structure organization in diverse cell types. We also uncover the variable organization of chromosome domain structures at the sub-megabase scale in individual cells, which can be obscured with bulk measurements. Together, these results demonstrate the ability of integrated spatial genomics to advance our overall understanding of single-cell nuclear architecture in various biological systems.

    Characterization of the spontaneous EEG activity in the Alzheimer's disease continuum: from local activation to network organization

    This doctoral thesis is presented as a compendium of four publications indexed in the Journal Citation Reports. The aim of these publications is to characterize the neural changes underlying the different stages of Alzheimer's disease (AD) and its prodromal stage, mild cognitive impairment (MCI), at three levels of analysis: local activation, interaction between pairs of sensors, and network organization. The main changes found as the disease progresses are: (i) a slowing, and a loss of complexity and irregularity, of spontaneous EEG activity; (ii) a significant decrease in connectivity in the high-frequency bands and an increase in the low-frequency bands; and (iii) a loss of integration and segregation in neural networks. These findings have provided additional information on the brain alterations of AD at its different stages, which is useful for better understanding its pathophysiological mechanisms. Departamento de Teoría de la Señal y Comunicaciones e Ingeniería Telemática. Doctorado en Tecnologías de la Información y las Telecomunicaciones

    Manipulation of global chromatin architecture in the human cell nucleus and critical assessment of current model views

    In spite of strong evidence that the mammalian cell nucleus is a highly organized organelle, a consensus on basic principles of global nuclear architecture has not so far been achieved. The existence of major architectural features such as an organized interchromatin compartment and higher order organization of chromatin postulated by some of the models is questioned or even rejected by others. This study was set up to test predictions of the various model views after manipulating nuclear architecture by applying the induced formation of hypercondensed chromatin (HCC). This method leads to massive but completely reversible conformational changes of chromatin arrangements in living cell nuclei, but does not affect cell survival. Nuclear functions like transcription, replication and cell cycling were immediately stalled when HCC formation was induced, but rapidly resumed once normal chromatin configurations were restored. The emerging pattern of HCC revealed a 3D network of interconnected chromosome territories. The surface of the emerging HCC bundles was the site of preceding activity like RNA transcription or DNA replication, which confirmed the existence of a distinct topological arrangement of functional processes with respect to the architecture of chromatin. This arrangement could further be demonstrated by analyzing the topography of defined chromatin modifications, showing that active chromatin is preferentially located at the HCC bundle surfaces, whereas inactive chromatin regions are preferentially found in the HCC bundle interior. The emerging patterns of HCC were, furthermore, strikingly similar in consecutively repeated cycles of HCC formation and recovery, demonstrating a non-random but pre-existing and defined chromatin and interchromatin topography. All results of this study were obtained using confocal laser scanning microscopy. 
A protocol for deconvolution of confocal images was established to enhance confocal image quality to an extent sufficient for subsequent image analysis. As a contribution to the present model views, this study demonstrates: [1] that most chromatin exists in the form of higher-order sub-compartments ('~1 Mb chromatin domains') above the level of extended 30 nm fibers and [2] that an interchromatin compartment exists as a dynamic, structurally distinct nuclear compartment, which is functionally linked with the chromatin compartment. An updated chromosome territory-interchromatin compartment model on the basis of the gained results is presented at the end of this thesis together with an attempt to provide a comprehensive view linking ultrastructural with light microscopic insights.

    Dissection of Complex Genetic Correlations into Interaction Effects

    Living systems are overwhelmingly complex and consist of many interacting parts. Even the quantitative characterization of a single human cell type at the genetic level requires the measurement of at least 20000 gene expressions. It remains a big challenge for theoretical approaches to discover patterns in these signals that represent specific interactions in such systems. A major problem is that available standard procedures summarize gene expressions in a hard-to-interpret way. For example, principal components represent axes of maximal variance in the gene vector space and thus often correspond to a superposition of multiple different gene regulation effects (e.g. I.1.4). Here, a novel approach to analyze and interpret such complex data is developed (Chapter II). It is based on an extremum principle that identifies an axis in the gene vector space to which as many samples as possible are correlated as highly as possible (II.3). This axis is maximally specific and thus most probably corresponds to exactly one gene regulation effect, making it considerably easier to interpret than principal components. To stabilize and optimize effect discovery, axes in the sample vector space are identified simultaneously. Genes and samples are always handled symmetrically by the algorithm. While sufficient for effect discovery, effect axes can only linearly approximate regulation laws. To represent a broader class of nonlinear regulations, including saturation effects or activity thresholds (e.g. II.1.1.2), a bimonotonic effect model is defined (II.2.1.2). A corresponding regression is realized that is monotonic over projections of samples (or genes) onto discovered gene (or sample) axes. Resulting effect curves can approximate regulation laws precisely (II.4.1). This enables the dissection of exclusively the discovered effect from the signal (II.4.2). Signal parts from other potentially overlapping effects remain untouched. This continues iteratively. 
In this way, the high-dimensional initial signal (II.2.1.1) can be dissected into highly specific effects. Method validation demonstrates that superposed effects of various size, shape and signal strength can be dissected reliably (II.6.2). Simulated laws of regulation are reconstructed with high correlation. Detection limits, e.g. for signal strength or for missing values, lie above practical requirements (II.6.4). The novel approach is systematically compared with standard procedures such as principal component analysis. Signal dissection is shown to have clear advantages, especially for many overlapping effects of comparable size (II.6.3). An ideal test field for such approaches is cancer cells, as they may be driven by multiple overlapping gene regulation networks that are largely unknown. Additionally, quantification and classification of cancer cells by their particular set of driving gene regulations is a prerequisite for precision medicine. To validate the novel method against real biological data, it is applied to gene expressions of over 1000 tumor samples from Diffuse Large B-Cell Lymphoma (DLBCL) patients (Chapter III). Two already known subtypes of this disease (cf. I.1.2.1) with significantly different survival following the same chemotherapy were originally also discovered as a gene expression effect. These subtypes can only be precisely determined by this effect at the molecular level. Such previous results offer a possibility for method validation and indeed, this effect has been rediscovered in an unsupervised manner (III.3.2.2). Several additional biologically relevant effects have been discovered and validated across four patient cohorts. Multivariate analyses (III.2) identify combinations of validated effects that can predict significant differences in patient survival. One novel effect possesses an even higher predictive value (cf. III.2.5.1) than the rediscovered subtype effect and is genetically more specific (cf. III.3.3.1). 
A trained and validated Cox survival model (III.2.5) can predict significant survival differences within known DLBCL subtypes (III.2.5.6), demonstrating that they are genetically heterogeneous as well. Detailed biostatistical evaluations of all survival effects (III.3.3) may help to clarify the molecular pathogenesis of DLBCL. Furthermore, the applicability of signal dissection is not limited to biological data. For instance, dissecting spectral energy distributions of stars observed in astrophysics might be useful to discover laws of light emission.