18 research outputs found

    Computational methods for single-cell omics across modalities

    MultiMAP: dimensionality reduction and integration of multimodal data.

    Multimodal data is rapidly growing in many fields of science and engineering, including single-cell biology. We introduce MultiMAP, a novel algorithm for dimensionality reduction and integration. MultiMAP can integrate any number of datasets, leverages features not present in all datasets, is not restricted to a linear mapping, allows the user to specify the influence of each dataset, and is extremely scalable to large datasets. We apply MultiMAP to single-cell transcriptomics, chromatin accessibility, methylation, and spatial data and show that it outperforms current approaches. On a new thymus dataset, we use MultiMAP to integrate cells along a temporal trajectory. This enables quantitative comparison of transcription factor expression and binding site accessibility over the course of T cell differentiation, revealing patterns of expression versus binding site opening kinetics.

    Deep Learning in Single-Cell Analysis

    Single-cell technologies are revolutionizing the entire field of biology. The large volumes of data generated by single-cell technologies are high-dimensional, sparse, heterogeneous, and have complicated dependency structures, making analyses using conventional machine learning approaches challenging and impractical. In tackling these challenges, deep learning often demonstrates superior performance compared to traditional machine learning methods. In this work, we give a comprehensive survey on deep learning in single-cell analysis. We first introduce background on single-cell technologies and their development, as well as fundamental concepts of deep learning including the most popular deep architectures. We present an overview of the single-cell analytic pipeline pursued in research applications while noting divergences due to data sources or specific applications. We then review seven popular tasks spanning different stages of the single-cell analysis pipeline, including multimodal integration, imputation, clustering, spatial domain identification, cell-type deconvolution, cell segmentation, and cell-type annotation. Under each task, we describe the most recent developments in classical and deep learning methods and discuss their advantages and disadvantages. Deep learning tools and benchmark datasets are also summarized for each task. Finally, we discuss the future directions and the most recent challenges. This survey will serve as a reference for biologists and computer scientists, encouraging collaborations. Comment: 77 pages, 11 figures, 15 tables, deep learning, single-cell analysis.

    The Human Tumor Atlas Network: Charting Tumor Transitions across Space and Time at Single-Cell Resolution

    Crucial transitions in cancer, including tumor initiation, local expansion, metastasis, and therapeutic resistance, involve complex interactions between cells within the dynamic tumor ecosystem. Transformative single-cell genomics technologies and spatial multiplex in situ methods now provide an opportunity to interrogate this complexity at unprecedented resolution. The Human Tumor Atlas Network (HTAN), part of the National Cancer Institute (NCI) Cancer Moonshot Initiative, will establish a clinical, experimental, computational, and organizational framework to generate informative and accessible three-dimensional atlases of cancer transitions for a diverse set of tumor types. This effort complements both ongoing efforts to map healthy organs and previous large-scale cancer genomics approaches focused on bulk sequencing at a single point in time. Generating single-cell, multiparametric, longitudinal atlases and integrating them with clinical outcomes should help identify novel predictive biomarkers and features as well as therapeutically relevant cell types, cell states, and cellular interactions across transitions. The resulting tumor atlases should have a profound impact on our understanding of cancer biology and have the potential to improve cancer detection, prevention, and therapeutic discovery for better precision-medicine treatments of cancer patients and those at risk for cancer.

    Leveraging single-cell genomics to uncover clinical and preclinical responses to cancer immunotherapy

    Immune checkpoint inhibitors (ICIs) provide durable clinical responses in about 20% of cancer patients, but have been largely ineffective for non-immunogenic cancers that lack intratumoral T cells. Most tumors have somatic mutations that encode mutant proteins that are tumor-specific and not expressed on normal cells (termed neoantigens). Cancers with the highest mutational burdens, such as melanoma, are more likely to respond to single-agent ICIs. However, most cancers, including pancreatic ductal adenocarcinoma (PDAC), have lower mutational loads, resulting in fewer T cells infiltrating the tumor. Studies have previously demonstrated that an allogeneic GM-CSF-based vaccine enhances T cell infiltration into human pancreatic cancer. Recent work with Panc02 cells, which express around 60 neoantigens similar to human PDAC, showed that PancVAX, a neoantigen-targeted vaccine, cleared tumors in Panc02-bearing mice when paired with immune modulators. These data suggest that cancer vaccines targeting tumor neoantigens induce neoepitope-specific T cells, which can be further activated by ICIs, leading to tumor rejection. Currently, the impact of ICIs and neoantigen-targeted vaccines on immune cell expression states and the underlying mechanism of therapeutic response remain poorly defined. Comprehensive characterization of responding immune cells, particularly T cells, will be critical in understanding mechanisms of response and providing a rationale for combinatorial therapies. In this work, we develop innovative computational methods and analysis pipelines to analyze the tumor-immune microenvironment at single-cell resolution. We establish an algorithm to quantify differential heterogeneity in single-cell RNA-seq data, demonstrate the use of non-negative matrix factorization and transfer learning algorithms to identify previously unknown and conserved ICI responses between species, and develop a novel algorithm to physicochemically compare single-cell T cell receptor sequences. We leverage these methods in various contexts to yield new insight into the biological mechanisms underlying positive immunotherapeutic responses in diverse tumor types, including PDAC.

    Context matters: the power of single-cell analyses in identifying context-dependent effects on gene expression in blood immune cells

    The human immune system is a complex system that we still do not fully understand. No two humans react in the same way to attacks by bacteria, viruses or fungi. Factors such as genetics, the type of pathogen or previous exposure to the pathogen may explain this diversity in response. Single-cell RNA sequencing (scRNA-seq) is a new technique that enables us to study the gene expression of each cell individually, allowing us to study immune diversity in much greater detail. This increased resolution helps us discern how disease-associated genetic variants actually contribute to disease. In this thesis, I studied the relation between disease-associated genetic variants and gene expression levels in the context of different cell types and pathogen exposures in order to gain insight into the working mechanisms of these variants. For many variants we learnt in which cell types and under which pathogen exposures they affect gene expression, and we were even able to identify changes in gene co-expression, suggesting that disease-associated variants change how our genes interact with each other. With the single-cell field being so new, much of my work was showing the feasibility of using scRNA-seq to study the interplay between genetics and gene expression. To set up future research, we created guidelines for these analyses and established a consortium that brings together many major scientists in the field to enable large-scale studies across an even wider variety of contexts. This final work helps inform current and future large-scale scRNA-seq research.

    Probabilistic modelling of single cell multi-omics data

    Multicellular organisms possess a diverse set of cells exhibiting unique properties and functions. Regardless of its physiology and role, each cell carries the same copy of the genetic instructions encoded in its DNA. The ability of cells to differentiate into various shapes and forms stems from a careful orchestration of gene expression through various regulatory mechanisms. Recent developments in single-cell multi-omics protocols offer unprecedented opportunities to simultaneously quantify phenomena in the epigenome and gene expression at single-cell resolution. Advances in cell isolation and barcoding eliminated various confounding phenomena, shedding light on the regulatory role of the epigenome in gene expression over diverse tissues and cells. Yet, combining omics modalities introduces serious statistical and computational challenges. Limitations of single-omics assays are exacerbated when combined into multi-modal assays, making result interpretation hard. In this thesis, we argue that inconsistent treatment of technical variability offered by classical statistical tools can corrupt statistical analyses and produce misleading results. Within a Bayesian framework, we introduce probabilistic models that explicitly and transparently decouple technical variability from biological signal. These methods are then used to investigate how epigenetic regulatory mechanisms interact with gene expression, both at a genomic and at a cellular level. Single-cell sequencing technologies are notoriously affected by high sparsity, leaving scientists to wonder whether the data are a product of sample handling or whether some genes are simply not expressed. As a result, even simple correlative tools (e.g., Pearson’s correlation) seeking to identify regions with strong regulatory patterns between molecular layers routinely pinpoint only a handful of associations. To overcome some of these limitations, we introduce SCRaPL (Single Cell Regulatory Pattern Learning), a Bayesian hierarchical model to infer correlation between different omics components. SCRaPL’s uncertainty quantification allows for accurate results and good control over false positives, compared to its counterparts. Existing limitations force practitioners to partially or fully discard molecular modalities from cell observations, significantly under-powering subsequent downstream analysis. An alternative solution for scaling datasets is to post-experimentally address protocol limitations using a generative model. We introduce single cell Multi View Inference (scMVI), a deep learning model designed to accommodate analyses on both partially and fully observed data. Using jointly quantified data, scMVI builds a low-dimensional joint latent space by aligning omics representations for each cell. In similar cells, scMVI can match individual modalities, creating more complex sets. Subsequently, this manifold is used to approximate the data-generating process. Hence, in partially quantified cells, missing observations can be imputed, realizing the full potential of the data. To summarize, this thesis proposes novel statistical tools to interpret the regulatory interactions between the epigenome and gene expression using data from modern multi-omics sequencing experiments. Their flexible design, along with robust uncertainty quantification, allows these methods to unlock the immense potential of existing and future sequencing protocols. We hope that with the increased adoption of these methods, SCRaPL and scMVI will become an integral part of downstream analysis.

    Understanding Gene Regulation In Development And Differentiation Using Single Cell Multi-Omics

    Transcriptional regulation is a major determinant of tissue-specific gene expression during development. My thesis research leverages powerful single-cell approaches to address this fundamental question in two developmental systems, C. elegans embryogenesis and mouse embryonic hematopoiesis. I have also developed much-needed computational algorithms for single-cell data analysis and exploration. C. elegans is an animal with few cells, but a striking diversity of cell types. In this thesis, I characterize the molecular basis for their specification by analyzing the transcriptomes of 86,024 single embryonic cells. I identified 502 terminal and pre-terminal cell types, mapping most single-cell transcriptomes to their exact position in C. elegans’ invariant lineage. Using these annotations, I find that: 1) the correlation between a cell’s lineage and its transcriptome increases from mid to late gastrulation, then falls dramatically as cells in the nervous system and pharynx adopt their terminal fates; 2) multilineage priming contributes to the differentiation of sister cells at dozens of lineage branches; and 3) most distinct lineages that produce the same anatomical cell type converge to a homogeneous transcriptomic state. Next, I studied the development of hematopoietic stem cells (HSCs). All HSCs come from a specialized type of endothelial cells in the major arteries of the embryo called hemogenic endothelium (HE). To examine the cellular and molecular transitions underlying the formation of HSCs, we profiled nearly 40,000 rare single cells from the caudal arteries of embryonic day 9.5 (E9.5) to E11.5 mouse embryos using single-cell RNA-Seq and single-cell ATAC-Seq. I identified a continuous developmental trajectory from endothelial cells to early precursors of HSCs, and several critical transitional cell types during this process. The intermediate stage most proximal to HE, which we termed pre-HE, is characterized by increased accessibility of chromatin enriched for SOX, FOX, GATA, and SMAD binding motifs. I also found that a developmental bottleneck separates pre-HE from HE, and that RUNX1 dosage regulates the efficiency of the pre-HE to HE transition. A distal enhancer of Runx1 shows high accessibility in pre-HE cells at the bottleneck, but loses accessibility thereafter. Once cells pass the bottleneck, they follow distinct developmental trajectories leading to an initial wave of lympho-myeloid-biased progenitors, followed by precursors of HSCs. During the course of both projects, I have developed novel computational methods for analyzing single-cell multi-omics data, including VERSE, PIVOT and VisCello. Together, these tools constitute a comprehensive single-cell data analysis suite that facilitates the discovery of novel biological mechanisms.