21 research outputs found

    Barcode, UMI, Set format and BUStools

    We introduce the Barcode-UMI-Set format (BUS) for representing pseudoalignments of reads from single-cell RNA-seq experiments. The format can be used with all single-cell RNA-seq technologies, and we show that BUS files can be efficiently generated. BUStools is a suite of tools for working with BUS files and facilitates rapid quantification and analysis of single-cell RNA-seq data. The BUS format therefore makes possible the development of modular, technology-specific and robust workflows for single-cell RNA-seq analysis
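
As a rough illustration of what a Barcode-UMI-Set record might look like in memory, the sketch below models each record as a barcode, a UMI, an integer id for the set of compatible transcripts, and a read count, then merges identical records. The field names, the `count` field, and the `collapse` helper are assumptions made for this example; it is not the binary layout or the API of BUStools.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class BusRecord(NamedTuple):
    """Hypothetical in-memory view of one record (names are assumptions)."""
    barcode: str  # cell barcode sequence
    umi: str      # unique molecular identifier sequence
    ec: int       # id of the set of transcripts the read is compatible with
    count: int    # number of reads merged into this record


def collapse(records: Iterable[BusRecord]) -> list[BusRecord]:
    """Merge records sharing (barcode, umi, ec) and sum their counts."""
    totals: Counter = Counter()
    for r in records:
        totals[(r.barcode, r.umi, r.ec)] += r.count
    # Sort by key so the output order is deterministic.
    return [BusRecord(b, u, e, n) for (b, u, e), n in sorted(totals.items())]


# Toy input: the first two reads are identical and should be merged.
reads = [
    BusRecord("AAAC", "GGT", 7, 1),
    BusRecord("AAAC", "GGT", 7, 1),
    BusRecord("AAAC", "TTA", 3, 1),
    BusRecord("CCGT", "GGT", 7, 1),
]
collapsed = collapse(reads)
```

Keeping barcode, UMI, and set together in one flat record is what lets downstream steps (sorting, merging, counting) be written as independent passes over the same data.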

    Modular and efficient pre-processing of single-cell RNA-seq

    Analysis of single-cell RNA-seq data begins with the pre-processing of reads to generate count matrices. We investigate algorithm choices for the challenges of pre-processing, and describe a workflow that balances efficiency and accuracy. Our workflow is based on the kallisto and bustools programs, and is near-optimal in speed and memory. The workflow is modular, and we demonstrate its flexibility by showing how it can be used for RNA velocity analyses

    Protocol for fast scRNA-seq raw data processing using scKB and non-arbitrary quality control with COPILOT

    We describe a protocol to perform fast and non-arbitrary quality control of single-cell RNA sequencing (scRNA-seq) raw data using scKB and COPILOT. scKB is a wrapper script of kallisto and bustools for accelerated alignment and transcript count matrix generation, which runs significantly faster than the popular tool Cell Ranger. COPILOT then offers non-arbitrary background noise removal by comparing distributions of low-quality and high-quality cells. Together, this protocol streamlines the processing workflow and provides an easy entry for new scRNA-seq users. For complete details on the use and execution of this protocol, please refer to Shahan et al. (2022)

    Quantifying the tradeoff between sequencing depth and cell number in single-cell RNA-seq

    The allocation of a sequencing budget when designing single cell RNA-seq experiments requires consideration of the tradeoff between number of cells sequenced and the read depth per cell. One approach to the problem is to perform a power analysis for a univariate objective such as differential expression. However, many of the goals of single-cell analysis, such as clustering, require consideration of the multivariate structure of gene expression. We introduce an approach to quantifying the impact of sequencing depth and cell number on the estimation of a multivariate generative model for gene expression that is based on error analysis in the framework of a variational autoencoder. We find that at shallow depths, the marginal benefit of deeper sequencing per cell significantly outweighs the benefit of increased cell numbers. Above about 15,000 reads per cell the benefit of increased sequencing depth is minor. Code for the workflow reproducing the results of the paper is available at https://github.com/pachterlab/SBP_2019/
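
The tradeoff described above comes down to simple arithmetic: for a fixed total number of reads, raising the depth per cell lowers the number of cells that can be sequenced. The sketch below only illustrates that bookkeeping; the budget of 150 million reads and the helper name `cells_for_budget` are assumptions for the example, and it does not reproduce the paper's variational autoencoder analysis.

```python
def cells_for_budget(total_reads: int, reads_per_cell: int) -> int:
    """Number of cells affordable at a given depth under a fixed read budget."""
    if reads_per_cell <= 0:
        raise ValueError("reads_per_cell must be positive")
    return total_reads // reads_per_cell


# Hypothetical budget, chosen only to make the numbers easy to read.
budget = 150_000_000

# Compare a shallow, an intermediate, and a deep design.
for depth in (5_000, 15_000, 50_000):
    print(f"{depth:>6} reads/cell -> {cells_for_budget(budget, depth):>6} cells")
```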

    A single cell Arabidopsis root atlas reveals developmental trajectories in wild type and cell identity mutants

    Cell fate acquisition is a fundamental developmental process in all multicellular organisms. Yet, much is unknown regarding how a cell traverses the pathway from stem cell to terminal differentiation. Advances in single cell genomics [1] hold promise for unraveling developmental mechanisms [2–3] in tissues [4], organs [5–6], and organisms [7–8]. However, lineage tracing can be challenging for some tissues [9] and integration of high-quality datasets is often necessary to detect rare cell populations and developmental states [10,11]. Here, we harmonized single cell mRNA sequencing data from over 110,000 cells to construct a comprehensive atlas for a stereotypically developing organ with indeterminate growth, the Arabidopsis root. To test the utility of the atlas to interpret new datasets, we profiled mutants for two key transcriptional regulators at single cell resolution, shortroot and scarecrow. Although both transcription factors are required for early specification of cell identity [12], our results suggest the existence of an alternative pathway acting in mature cells to specify endodermal identity, for which SHORTROOT is required. Uncovering the architecture of this pathway will provide insight into specification and stabilization of the endodermis, a tissue analogous to the mammalian epithelium. Thus, the atlas is a pivotal advance for unraveling the transcriptional programs that specify and maintain cell identity to regulate organ development in space and time

    Gene dynamics of maturation in endogenous and pluripotent stem cell-derived cardiomyocytes

    A primary limitation in the clinical application of pluripotent stem cell-derived cardiomyocytes (PSC-CMs) is the failure of these cells to achieve full functional maturity. In vivo, cardiomyocytes undergo numerous adaptive changes during perinatal maturation. By contrast, PSC-CMs fail to fully undergo these developmental processes, instead remaining arrested at an embryonic stage of maturation. To date, however, the precise mechanisms by which directed differentiation differs from endogenous development, leading to consequent PSC-CM maturation arrest, are unknown. The advent of single cell RNA-sequencing (scRNA-seq) has offered great opportunities for studying CM maturation at single cell resolution. However, postnatal cardiac scRNA-seq has been limited owing to technical difficulties in the isolation of single CMs. Additionally, cross-study comparison is limited by dataset-specific batch effects. In this dissertation, I first established large particle fluorescence-activated cell sorting (LP-FACS) for isolation of viable single adult CMs. I then developed transcriptomic entropy as a robust, batch effect-resistant approach to quantifying CM maturation. With these and other computational tools, I investigated gene expression trends in endogenous and PSC-derived CMs. I first generated an scRNA-seq reference of mouse in vivo CM maturation with extensive sampling of perinatal time periods. I subsequently generated isogenic embryonic stem cells and created an in vitro scRNA-seq reference of PSC-CM directed differentiation. Through computational analysis, I identified a perinatal maturation program in endogenous CMs that is poorly recapitulated in vitro. By comparison of these trajectories with previously published human datasets, I identified a network of nine transcription factors (TFs) whose targets are consistently dysregulated in PSC-CMs across species. Notably, I demonstrated that these TFs are only partially activated in common ex vivo approaches to engineer PSC-CM maturation. This dissertation represents the first direct comparison of CM maturation in vivo and in vitro at the single cell level. Moreover, the findings and tools developed here can be leveraged towards improving the clinical viability of PSC-CMs

    Utilization of single-cell RNA-Seq and genome-scale modeling for investigating cancer metabolism

    Cancer remains a leading cause of death worldwide, and its dysregulated metabolism is a promising target for therapy. However, metabolism is complex to study – the metabolism of a cell involves the interplay of thousands of chemical reactions that are combined in different ways across tissues and cell types. Genome-scale metabolic models (GEMs), where the reaction networks of cells are described using a mathematical formulation, have been developed to help in such studies. In this thesis, methods were developed for determining the active metabolic network (the context-specific model) in individual cell types, followed by studies of cancer metabolism. To enable identification of the active metabolic network per cell type, single-cell RNA sequencing (scRNA-Seq) was employed to detect the presence of individual genes. However, the technical and biological variation in scRNA-Seq data poses a major challenge to the identification of the active reaction network in a cell type. The variability of gene expression due to technical and biological factors was therefore examined, concluding that data from thousands of cells is often required to provide enough stability for robust model generation. An improved quantification method for scRNA-Seq data, called BUTTERFLY, was also developed and implemented as part of the kallisto-bustools scRNA-Seq workflow. A new optimized version of tINIT, which enables generation of context-specific models, was also developed. It allowed for generation of models based on bootstrapped cell populations, which were used to estimate the statistical uncertainty of models generated from scRNA-Seq data. Finally, the method was applied to a lung cancer dataset, identifying both known and unknown features of cancer metabolism. To further explore cancer metabolism, a study was conducted to investigate the optimal metabolic behavior under different degrees of hypoxia. To this end, a diffusion-based model for estimating nutrient availability was developed, as well as a light-weight version of the tool GECKO that enables constraining the total enzyme usage in the model. The model could explain the glutamine addiction phenomenon in cancers and was used to show that metabolic collaboration between cell types in tumors is likely not important for growth

    Epithelial GPR35 protects from Citrobacter rodentium infection by preserving goblet cells and mucosal barrier integrity.

    Goblet cells secrete mucin to create a protective mucus layer against invasive bacterial infection and are therefore essential for maintaining intestinal health. However, the molecular pathways that regulate goblet cell function remain largely unknown. Although GPR35 is highly expressed in colonic epithelial cells, its importance in promoting the epithelial barrier is unclear. In this study, we show that epithelial Gpr35 plays a critical role in goblet cell function. In mice, cell-type-specific deletion of Gpr35 in epithelial cells but not in macrophages results in goblet cell depletion and dysbiosis, rendering these animals more susceptible to Citrobacter rodentium infection. Mechanistically, scRNA-seq analysis indicates that signaling of epithelial Gpr35 is essential to maintain normal pyroptosis levels in goblet cells. Our work shows that the epithelial presence of Gpr35 is a critical element for the function of goblet cell-mediated symbiosis between host and microbiota