25,146 research outputs found

COMPUTATIONAL ANALYSIS OF SINGLE-CELL TRANSCRIPTOMIC DATA FOR THE IDENTIFICATION AND CHARACTERIZATION OF CELL IDENTITY AND FATE POTENTIAL

Author: Noller Kathleen
Publication venue: 'The Busan Gyeongnam Mathematical Society'
Publication date: 03/01/2024

Single-cell transcriptomics and the tools developed for single-cell RNA sequencing (scRNA-seq) analysis have enabled the characterization of tissues and disease states of interest, identification of rare cell types, and reconstruction of developmental lineages given only a snapshot in time. The field of transcriptomics has progressed from the first single-cell analysis in 1990 to the development of high-throughput sequencing in the early 2000s. Advances in computational analysis have paralleled developments in the technology, though many analysis tools still have substantial shortcomings. This thesis addresses some of the challenges facing scRNA-seq data analysis, such as the development of a reliable method to quantify single-cell fate potential, the standardization of cell type annotation and downstream analysis, and the characterization of rare cell populations. The objective of this work is to develop a computational pipeline for single-cell transcriptomic data analysis that incorporates a novel tool to quantify single-cell fate potential, termed “stemFinder,” and to apply this pipeline to interrogate rare and clinically relevant cell types and to characterize key transformations in cell type identity throughout the course of physiologic differentiation or pathological transformation. In this work, I demonstrate the robustness of stemFinder to changes in input parameters and its superior performance over current methods of in silico potency quantification. I also use stemFinder to illustrate biological concepts regarding cell cycle gene expression patterns, cell fate commitment, and the relative potencies of distinct populations. stemFinder is readily incorporated into a scRNA-seq data analysis pipeline to elucidate the physiology of fallopian tube epithelial cell self-renewal and bone remodeling, as well as the pathophysiology of high-grade ovarian cancer and rheumatoid arthritis.
This thesis details the development and application of a scRNA-seq computational analysis pipeline, including a novel tool created as part of this pipeline, to create a census of the benign human ampulla and to implicate a rare cell type in fallopian tube epithelial regeneration. This thesis also leverages scRNA-seq analysis to understand osteoclastogenesis and to investigate two subpopulations of pre-osteoclasts, anabolic and catabolic respectively, with implications for bone erosion and rheumatoid arthritis.

Johns Hopkins University

Computational methods for large-scale single-cell RNA-seq and multimodal data

Author: Đỗ Văn Hoàn
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 09/11/2021

Emerging single-cell genomics technologies such as single-cell RNA-seq (scRNA-seq) and single-cell ATAC-seq provide new opportunities for discovering previously unknown cell types, studying biological processes such as tumor progression, and delineating molecular mechanism differences between species. Because the data produced by these technologies are high-dimensional, computation and mathematics have been the cornerstone of decoding meaningful information from them. Computational models have been challenged by the exponential growth of the data, driven by the continuing decrease in sequencing costs and the growth of large-scale genomic projects such as the Human Cell Atlas. In addition, recent single-cell technologies have enabled us to measure multiple modalities such as the transcriptome, proteome, and epigenome in the same cell. This requires new computational methods that can cope with multiple layers of data. To address these challenges, the main goal of this thesis was to develop computational methods and mathematical models for analyzing large-scale scRNA-seq and multimodal omics data. In particular, I have focused on fundamental single-cell analyses such as clustering and visualization. The most common task in scRNA-seq data analysis is the identification of cell types. Numerous methods have been proposed for this problem, with a current focus on methods for the analysis of large-scale scRNA-seq data. I developed Specter, a computational method that utilizes recent algorithmic advances in fast spectral clustering and ensemble learning. Specter achieves a substantial improvement in accuracy over existing methods and identifies rare cell types with high sensitivity. Specter can process a dataset comprising 2 million cells in just 26 minutes.
Moreover, the analysis of CITE-seq data, which simultaneously provide gene expression and protein levels, showed that Specter is able to incorporate multimodal omics measurements to resolve subtle transcriptomic differences between subpopulations of cells. Specter thus handles big data effectively for clustering analysis. The remaining question is how to cope with big data in other downstream analyses such as trajectory inference and data integration. The simplest scheme is to shrink the data by selecting a subset of cells (the sketch) that best represents the full data set. I therefore developed an algorithm called Sphetcher that uses a thresholding technique to efficiently pick representative cells that evenly cover the transcriptomic space occupied by the original data set. I showed that the sketch computed by Sphetcher constitutes a more accurate representation of the original transcriptomic landscape than those of existing methods, which leads to a more balanced composition of cell types and a large fraction of rare cell types in the sketch. Sphetcher bridges the gap between the scalability of computational methods and the volume of the data. Moreover, I demonstrated that Sphetcher can incorporate prior information (e.g. cell labels) to inform the inference of the trajectory of human skeletal muscle myoblast differentiation. Biological processes such as development, differentiation, and the cell cycle can be monitored by performing single-cell sequencing at different time points, each corresponding to a snapshot of the process. A class of computational methods called trajectory inference aims to reconstruct developmental trajectories from these snapshots. Trajectory inference (TI) methods such as Monocle can computationally infer a pseudotime variable that serves as a proxy for developmental time. In order to compare two trajectories inferred by TI methods, we need to align the pseudotime between the two trajectories.
Current methods for aligning trajectories are based on the concept of dynamic time warping, which is limited to simple linear trajectories. Since complex trajectories are common in developmental processes, I adopted arboreal matchings to compare and align complex trajectories with multiple branch points diverting cells into alternative fates. Arboreal matchings were originally proposed in the context of phylogenetic trees, and I theoretically linked them to dynamic time warping. A suite of exact and heuristic algorithms for aligning complex trajectories was implemented in the software Trajan. When aligning single-cell trajectories describing human muscle differentiation and myogenic reprogramming, Trajan automatically identifies the core paths, from which we are able to reproduce recently reported barriers to reprogramming. In a perturbation experiment, I showed that Trajan correctly maps identical cells in a global view of trajectories, as opposed to a pairwise application of dynamic time warping. Visualization using dimensionality reduction techniques such as t-SNE and UMAP is a fundamental step in the analysis of high-dimensional data. Visualization has played a pivotal role in discovering dynamic trends in single-cell genomics data. I developed j-SNE and j-UMAP as generalizations of these techniques to the joint visualization of multimodal omics data, e.g., CITE-seq data. The approach automatically learns the relative importance of each modality in order to obtain a concise representation of the data. In comparisons with conventional approaches, I demonstrated that j-SNE and j-UMAP produce unified embeddings that better agree with known cell types and that harmonize RNA and protein velocity landscapes.
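
The abstract above names dynamic time warping as the basis of current pseudotime alignment methods. As a rough illustration only, the textbook algorithm for two linear, pseudotime-ordered expression profiles can be sketched as follows. This is not Trajan or any other method from the thesis; the function name `dtw_distance` and the absolute-difference cost are assumptions made for the example.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook dynamic time warping cost between two 1-D sequences.

    `a` and `b` are hypothetical expression profiles, each already ordered
    by its own pseudotime. The local cost is the absolute difference.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]
```

Because the recurrence walks both sequences strictly forward, it can only align linear trajectories, which is the limitation the abstract points to for trajectories with branch points.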

Digitale Hochschulschriften der LMU

An Algorithm for Cellular Reprogramming

Author: Bloch Anthony
Brockett Roger
Brown Markus
Chen Haiming
Muir Lindsey
Patterson Geoff
Rajapakse Indika
Ronquist Scott
Publication venue
Publication date: 13/07/2017

The day we understand the time evolution of subcellular elements at a level of detail comparable to physical systems governed by Newton's laws of motion seems far away. Even so, quantitative approaches to cellular dynamics add to our understanding of cell biology, providing data-guided frameworks that allow us to develop better predictions about and methods for control over specific biological processes and system-wide cell behavior. In this paper we describe an approach to optimizing the use of transcription factors in the context of cellular reprogramming. We construct an approximate model for the natural evolution of a synchronized population of fibroblasts, based on data obtained by sampling the expression of some 22,083 genes at several times along the cell cycle. (These data are based on a colony of cells that have been cell cycle synchronized.) In order to arrive at a model of moderate complexity, we cluster gene expression based on the division of the genome into topologically associating domains (TADs) and then model the dynamics of the expression levels of the TADs. Based on this dynamical model and known bioinformatics, we develop a methodology for identifying the transcription factors that are the most likely to be effective toward a specific cellular reprogramming task. The approach used is based on a device commonly used in optimal control. From this data-guided methodology, we identify a number of validated transcription factors used in reprogramming and/or natural differentiation. Our findings highlight the immense potential of dynamical models, mathematics, and data-guided methodologies for improving methods for control over biological processes.

arXiv.org e-Print Archive

Recommended from our members

A Network of microRNAs Acts to Promote Cell Cycle Exit and Differentiation of Human Pancreatic Endocrine Cells.

Author: Carrano Andrea C
Chiou Joshua
Frazer Kelly A
Gaertner Bjoern
Jin Wen
Kaestner Klaus H
Matta Ileana
Mulas Francesca
Nguyen-Ngoc Kim-Vy
Sander Maike
Shih Hung-Ping
Sui Yinghui
Vinckier Nicholas
Wang Allen
Wang Jinzhao
Zeng Chun
Publication venue: eScholarship, University of California
Publication date: 01/11/2019

Pancreatic endocrine cell differentiation is orchestrated by the action of transcription factors that operate in a gene regulatory network to activate endocrine lineage genes and repress lineage-inappropriate genes. MicroRNAs (miRNAs) are important modulators of gene expression, yet their role in endocrine cell differentiation has not been systematically explored. Here we characterize miRNA-regulatory networks active in human endocrine cell differentiation by combining small RNA sequencing, miRNA over-expression, and network modeling approaches. Our analysis identified Let-7g, Let-7a, miR-200a, miR-127, and miR-375 as endocrine-enriched miRNAs that drive endocrine cell differentiation-associated gene expression changes. These miRNAs are predicted to target different transcription factors, which converge on genes involved in cell cycle regulation. When expressed in human embryonic stem cell-derived pancreatic progenitors, these miRNAs induce cell cycle exit and promote endocrine cell differentiation. Our study delineates the role of miRNAs in human endocrine cell differentiation and identifies miRNAs that could facilitate endocrine cell reprogramming.

eScholarship - University of California

MDC Repository

Trajectory-based differential expression analysis for single-cell sequencing data

Author: Cannoodt Robrecht
Clement Lieven
Dudoit Sandrine
Roux de Bézieux Hector
Saelens Wouter
Saeys Yvan
Street Kelly
Van den Berge Koen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

Trajectory inference has radically enhanced single-cell RNA-seq research by enabling the study of dynamic changes in gene expression. Downstream of trajectory inference, it is vital to discover genes that are (i) associated with the lineages in the trajectory, or (ii) differentially expressed between lineages, to illuminate the underlying biological processes. Current data analysis procedures, however, either fail to exploit the continuous resolution provided by trajectory inference, or fail to pinpoint the exact types of differential expression. We introduce tradeSeq, a powerful generalized additive model framework based on the negative binomial distribution that allows flexible inference of both within-lineage and between-lineage differential expression. By incorporating observation-level weights, the model can additionally account for zero inflation. We evaluate the method on simulated datasets and on real datasets from droplet-based and full-length protocols, and show that it yields biological insights through a clear interpretation of the data. Downstream of trajectory inference for cell lineages based on scRNA-seq data, differential expression analysis yields insight into biological processes. Here, Van den Berge et al. develop tradeSeq, a framework for the inference of within- and between-lineage differential expression, based on negative binomial generalized additive models.

Ghent University Academic Bibliography

1458 EMT-inhibiting transcription factor Ovol2 regulates directional cell migration and proliferation in adult skin epithelia

Author: Dai X
Haensel D
Jin S
Ma X
MacLean A
Nie Q
Sun P
Publication venue: eScholarship, University of California
Publication date: 01/05/2018

eScholarship - University of California

Defining Epidermal Basal Cell States during Skin Homeostasis and Wound Healing Using Single-Cell Transcriptomics.

Author: Cang Zixuan
Cinco Rachel
Dai Xing
Dragan Morgan
Gong Yanwen
Gratton Enrico
Haensel Daniel
Jin Suoqin
Kessenbrock Kai
MacLean Adam L
Nguyen Quy
Nie Qing
Sun Peng
Vu Remy
Publication venue: eScholarship, University of California
Publication date: 01/03/2020

Our knowledge of transcriptional heterogeneities in epithelial stem and progenitor cell compartments is limited. Epidermal basal cells sustain cutaneous tissue maintenance and drive wound healing. Previous studies have probed basal cell heterogeneity in stem and progenitor potential, but a comprehensive dissection of basal cell dynamics during differentiation is lacking. Using single-cell RNA sequencing coupled with RNAScope and fluorescence lifetime imaging, we identify three non-proliferative and one proliferative basal cell state in homeostatic skin that differ in metabolic preference and become spatially partitioned during wound re-epithelialization. Pseudotemporal trajectory and RNA velocity analyses predict a quasi-linear differentiation hierarchy in which basal cells progress from a Col17a1Hi/Trp63Hi state to an early-response state, proliferate at the juncture of these two states, or become growth arrested before differentiating into spinous cells. Wound healing induces plasticity manifested by dynamic basal-spinous interconversions at multiple basal transcriptional states. Our study provides a systematic view of epidermal cellular dynamics, supporting a revised "hierarchical-lineage" model of homeostasis.

eScholarship - University of California

Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage.

Author: Abate Adam R
Aran Dvir
Bhattacharya Mallar
Butte Atul J
Chak Suzanna
Fong Valerie
Hsu Austin
Liu Leqian
Looney Agnieszka P
Naikawadi Ram P
Wolters Paul J
Wu Esther
Publication venue: eScholarship, University of California
Publication date: 01/02/2019

Tissue fibrosis is a major cause of mortality that results from the deposition of matrix proteins by an activated mesenchyme. Macrophages accumulate in fibrosis, but the role of specific subgroups in supporting fibrogenesis has not been investigated in vivo. Here, we used single-cell RNA sequencing (scRNA-seq) to characterize the heterogeneity of macrophages in bleomycin-induced lung fibrosis in mice. A novel computational framework for the annotation of scRNA-seq by reference to bulk transcriptomes (SingleR) enabled the subclustering of macrophages and revealed a disease-associated subgroup with a transitional gene expression profile intermediate between monocyte-derived and alveolar macrophages. These CX3CR1+SiglecF+ transitional macrophages localized to the fibrotic niche and had a profibrotic effect in vivo. Human orthologs of genes expressed by the transitional macrophages were upregulated in samples from patients with idiopathic pulmonary fibrosis. Thus, we have identified a pathological subgroup of transitional macrophages that are required for the fibrotic response to injury.
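
The abstract above describes annotating single cells by reference to bulk transcriptomes. A minimal sketch of that general idea, assuming a rank (Spearman) correlation score and toy data, is given below. It is not the SingleR implementation; the function names, labels, and expression values are hypothetical and chosen only for illustration.

```python
def ranks(values):
    """1-based ranks of `values`; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of entries tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))


def annotate(cell, references):
    """Return the reference label whose bulk profile best matches `cell`.

    `cell` is one cell's expression vector; `references` maps a label to a
    bulk expression vector over the same genes in the same order.
    """
    return max(references, key=lambda label: spearman(cell, references[label]))
```

For example, with the toy references `{"alveolar": [9, 1, 0, 5], "monocyte": [0, 8, 7, 1]}`, the call `annotate([7, 2, 1, 4], references)` returns `"alveolar"`, because the cell's gene ranking matches that profile exactly. A cell with intermediate scores against both references would correspond to the transitional profile described in the abstract.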

Crossref

eScholarship - University of California

Analysis of nucleosome positioning landscapes enables gene discovery in the human malaria parasite Plasmodium falciparum.

Author: Bunnik Evelien M
Le Roch Karine G
Lonardi Stefano
Lu Xueqing Maggie
Nasseri Sara
Pokhriyal Neeti
Publication venue: eScholarship, University of California
Publication date: 01/11/2015

Background: Plasmodium falciparum, the deadliest malaria-causing parasite, has an extremely AT-rich (80.7%) genome. Because of high AT-content, sequence-based annotation of genes and functional elements remains challenging. In order to better understand the regulatory network controlling gene expression in the parasite, a more complete genome annotation as well as analysis tools adapted for AT-rich genomes are needed. Recent studies on genome-wide nucleosome positioning in eukaryotes have shown that nucleosome landscapes exhibit regular characteristic patterns at the 5'- and 3'-end of protein and non-protein coding genes. In addition, nucleosome depleted regions can be found near transcription start sites. These unique nucleosome landscape patterns may be exploited for the identification of novel genes. In this paper, we propose a computational approach to discover novel putative genes based exclusively on nucleosome positioning data in the AT-rich genome of P. falciparum.
Results: Using binary classifiers trained on nucleosome landscapes at the gene boundaries from two independent nucleosome positioning data sets, we were able to detect a total of 231 regions containing putative genes in the genome of Plasmodium falciparum, of which 67 highly confident genes were found in both data sets. Eighty-eight of these 231 newly predicted genes exhibited transcription signal in RNA-Seq data, indicative of active transcription. In addition, 20 out of 21 selected gene candidates were further validated by RT-PCR, and 28 out of the 231 genes showed significant matches using BLASTN against an expressed sequence tag (EST) database. Furthermore, 108 (47%) out of the 231 putative novel genes overlapped with previously identified but unannotated long non-coding RNAs. Collectively, these results provide experimental validation for 163 predicted genes (70.6%). Finally, 73 out of 231 genes were found to be potentially translated based on their signal in polysome-associated RNA-Seq representing transcripts that are actively being translated.
Conclusion: Our results clearly indicate that nucleosome positioning data contains sufficient information for novel gene discovery. As distinct nucleosome landscapes around genes are found in many other eukaryotic organisms, this methodology could be used to characterize the transcriptome of any organism, especially when coupled with other DNA-based gene finding and experimental methods (e.g., RNA-Seq).

Springer - Publisher Connector

PubMed Central

eScholarship - University of California