Analyzing Single Cell RNA Sequencing with Topological Nonnegative Matrix Factorization

Author: Hozumi Yuta
Wei Guo-Wei
Publication date: 24/10/2023

Single-cell RNA sequencing (scRNA-seq) is a relatively new technology that has stimulated enormous interest in statistics, data science, and computational biology due to the high dimensionality, complexity, and large scale associated with scRNA-seq data. Nonnegative matrix factorization (NMF) offers a unique approach due to its meta-gene interpretation of resulting low-dimensional components. However, NMF approaches suffer from the lack of multiscale analysis. This work introduces two persistent Laplacian regularized NMF methods, namely, topological NMF (TNMF) and robust topological NMF (rTNMF). By employing a total of 12 datasets, we demonstrate that the proposed TNMF and rTNMF significantly outperform all other NMF-based methods. We have also utilized TNMF and rTNMF for the visualization of popular Uniform Manifold Approximation and Projection (UMAP) and t-distributed stochastic neighbor embedding (t-SNE).

arXiv.org e-Print Archive

Control of VEGF-A transcriptional programs by pausing and genomic compartmentalization.

Author: Benner Christopher
Glass Christopher K
Heinz Sven
Kaikkonen Minna U
Kansanen Emilia
Kivelä Annukka M
Laitalainen Jarkko
Niskanen Henri
Romanoski Casey E
Ylä-Herttuala Seppo
Publication venue: eScholarship, University of California
Publication date: 28/10/2014

Vascular endothelial growth factor A (VEGF-A) is a master regulator of angiogenesis, vascular development and function. In this study we investigated the transcriptional regulation of VEGF-A-responsive genes in primary human aortic endothelial cells (HAECs) and human umbilical vein endothelial cells (HUVECs) using genome-wide global run-on sequencing (GRO-Seq). We demonstrate that half of VEGF-A-regulated gene promoters are characterized by a transcriptionally competent paused RNA polymerase II (Pol II). We show that transition into productive elongation is a major mechanism of gene activation of virtually all VEGF-regulated genes, whereas only ∼40% of the genes are induced at the level of initiation. In addition, we report a comprehensive chromatin interaction map generated in HUVECs using tethered conformation capture (TCC) and characterize chromatin interactions in relation to transcriptional activity. We demonstrate that sites of active transcription are more likely to engage in chromatin looping and that cell type-specific transcriptional activity reflects the boundaries of chromatin interactions. Furthermore, we identify large chromatin compartments with a tendency to be coordinately transcribed upon VEGF-A stimulation. We provide evidence that these compartments are enriched for clusters of regulatory regions such as super-enhancers and for disease-associated single nucleotide polymorphisms (SNPs). Collectively, these findings provide new insights into mechanisms behind VEGF-A-regulated transcriptional programs in endothelial cells.

PubMed Central

eScholarship - University of California

Modeling the Transcriptional Landscape of in vitro Neuronal Differentiation and ALS Disease

Author: Kandror Elena
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/2019

The spinal cord is a complex structure responsible for processing sensory inputs and motor outputs. As such, the developmental and spatial organization of cells is highly organized. Diseases affecting the spinal cord, such as Amyotrophic Lateral Sclerosis (ALS), result in the disruption of normal cellular function and intercellular interactions, culminating in neurodegeneration. Deciphering disease mechanisms requires a fundamental understanding of both the normal development of cells within the spinal cord and the homeostatic environment that allows for proper function. Biological processes such as cellular differentiation, maturation, and disease progression proceed in an asynchronous and cell type-specific manner. Until recently, bulk measurements of a mixed population of cells have been key in understanding these events. However, bulk measurements can obscure the molecular mechanisms involved in branched or coinciding processes, such as differential transcriptional responses occurring between subpopulations of cells. Measurements in individual cells have largely been restricted to four-color immunofluorescence assays, which provide a solid but limited view of molecular-level changes. Recently, developments in single-cell RNA sequencing (scRNA-seq) have provided an avenue for accurately profiling the RNA expression levels of thousands of genes concomitantly in an individual cell. With this increased experimental precision comes the ability to explore pathways that are differentially activated in subpopulations of cells, and to determine the transcriptional programs that underlie complex biological processes. In this dissertation, I will first review the key features of scRNA-seq and downstream analysis. I will then discuss two applications of scRNA-seq: 1) the in vitro differentiation of mouse embryonic stem cells into motor neurons, and 2) the effect of the ALS-associated gene SOD1G93A expression on cultured motor neurons in a cellular model of ALS.

Columbia University Academic Commons

K-Nearest-Neighbors Induced Topological PCA for scRNA Sequence Data Analysis

Author: Cottrell Sean
Hozumi Yuta
Wei Guo-Wei
Publication date: 22/10/2023

Single-cell RNA sequencing (scRNA-seq) is widely used to reveal heterogeneity in cells, which has given us insights into cell-cell communication, cell differentiation, and differential gene expression. However, analyzing scRNA-seq data is a challenge due to sparsity and the large number of genes involved. Therefore, dimensionality reduction and feature selection are important for removing spurious signals and enhancing downstream analysis. Traditional PCA, a main workhorse in dimensionality reduction, lacks the ability to capture geometrical structure information embedded in the data, and previous graph Laplacian regularizations are limited by the analysis of only a single scale. We propose a topological Principal Components Analysis (tPCA) method that combines the persistent Laplacian (PL) technique with $L_{2,1}$ norm regularization to address multiscale and multiclass heterogeneity issues in data. We further introduce a k-Nearest-Neighbor (kNN) persistent Laplacian technique to improve the robustness of our persistent Laplacian method. The proposed kNN-PL is a new algebraic topology technique which addresses the many limitations of the traditional persistent homology. Rather than inducing filtration by varying a distance threshold, we introduce kNN-tPCA, where filtrations are achieved by varying the number of neighbors in a kNN network at each step, and find that this framework has significant implications for hyper-parameter tuning. We validate the efficacy of our proposed tPCA and kNN-tPCA methods on 11 diverse benchmark scRNA-seq datasets, and showcase that our methods outperform other unsupervised PCA enhancements from the literature, as well as popular Uniform Manifold Approximation and Projection (UMAP), t-Distributed Stochastic Neighbor Embedding (tSNE), and Projection Non-Negative Matrix Factorization (NMF) by significant margins.

Comment: 28 pages, 11 figures

arXiv.org e-Print Archive

Methods for Joint Normalization and Comparison of Hi-C data

Author: Stansfield John C
Publication venue: VCU Scholars Compass
Publication date: 01/01/2019

The development of chromatin conformation capture technology has opened new avenues of study into the 3D structure and function of the genome. Chromatin structure is known to influence gene regulation, and differences in structure are now emerging as a mechanism of regulation between, e.g., cell differentiation and disease vs. normal states. Hi-C sequencing technology now provides a way to study the 3D interactions of the chromatin over the whole genome. However, like all sequencing technologies, Hi-C suffers from several forms of bias stemming from both the technology and the DNA sequence itself. Several normalization methods have been developed for normalizing individual Hi-C datasets, but little work has been done on developing joint normalization methods for comparing two or more Hi-C datasets. To make full use of Hi-C data, joint normalization and statistical comparison techniques are needed to carry out experiments to identify regions where chromatin structure differs between conditions. We develop methods for the joint normalization and comparison of two Hi-C datasets, which we then extended to more complex experimental designs. Our normalization method is novel in that it makes use of the distance-dependent nature of chromatin interactions. Our modification of the Minus vs. Average (MA) plot to the Minus vs. Distance (MD) plot allows for a nonparametric data-driven normalization technique using loess smoothing. Additionally, we present a simple statistical method using Z-scores for detecting differentially interacting regions between two datasets. Our initial method was published as the Bioconductor R package HiCcompare [http://bioconductor.org/packages/HiCcompare/](http://bioconductor.org/packages/HiCcompare/). We then further extended our normalization and comparison method for use in complex Hi-C experiments with more than two datasets and optional covariates. 
We extended the normalization method to jointly normalize any number of Hi-C datasets by using a cyclic loess procedure on the MD plot. The cyclic loess normalization technique can remove between-dataset biases efficiently and effectively even when several datasets are analyzed at one time. Our comparison method implements a generalized linear model-based approach for comparing complex Hi-C experiments, which may have more than two groups and additional covariates. The extended methods are also available as a Bioconductor R package [http://bioconductor.org/packages/multiHiCcompare/](http://bioconductor.org/packages/multiHiCcompare/). Finally, we demonstrate the use of HiCcompare and multiHiCcompare in several test cases on real data, in addition to comparing them to other similar methods (https://doi.org/10.1002/cpbi.76).

VCU Scholars Compass

Revealing Dynamic Mechanisms of Cell Fate Decisions From Single-Cell Transcriptomic Data.

Author: Nie Qing
Zhang Jiajun
Zhou Tianshou
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Cell fate decisions play a pivotal role in development, but technologies for dissecting them are limited. We developed a new multifunctional method, Topographer, to construct a "quantitative" Waddington's landscape of single-cell transcriptomic data. This method is able to identify complex cell-state transition trajectories and to estimate complex cell-type dynamics characterized by fate and transition probabilities. It also infers both marker gene networks and their dynamic changes, as well as dynamic characteristics of transcriptional bursting along the cell-state transition trajectories. Applying this method to single-cell RNA-seq data on the differentiation of primary human myoblasts, we not only identified three known cell types, but also estimated both their fate probabilities and the transition probabilities among them. We found that the percentage of genes expressed in a bursty manner is significantly higher at (or near) the branch point (~97%) than before or after the branch (below 80%), and that both gene-gene and cell-cell correlation degrees are apparently lower near the branch point than away from it. Topographer reveals cell fate mechanisms in a coherent way at three scales: cell lineage (macroscopic), gene network (mesoscopic), and gene expression (microscopic).

eScholarship - University of California

Differentiable Mapper For Topological Optimization Of Data Representation

Author: Carrière Mathieu
Michel Bertrand
Oulhaj Ziyad
Publication date: 20/02/2024

Unsupervised data representation and visualization using tools from topology is an active and growing field of Topological Data Analysis (TDA) and data science. Its most prominent line of work is based on the so-called Mapper graph, which is a combinatorial graph whose topological structures (connected components, branches, loops) are in correspondence with those of the data itself. While highly generic and applicable, its use has been hampered so far by the manual tuning of its many parameters. Among these, a crucial one is the so-called filter: a continuous function whose variations on the data set are the main ingredient for both building the Mapper representation and assessing the presence and sizes of its topological structures. However, while a few parameter tuning methods have already been investigated for the other Mapper parameters (i.e., resolution, gain, clustering), there is currently no method for tuning the filter itself. In this work, we build on a recently proposed optimization framework incorporating topology to provide the first filter optimization scheme for Mapper graphs. In order to achieve this, we propose a relaxed and more general version of the Mapper graph, whose convergence properties are investigated. Finally, we demonstrate the usefulness of our approach by optimizing Mapper graph representations on several datasets, and showcasing the superiority of the optimized representation over arbitrary ones.

arXiv.org e-Print Archive

Minor Loops in Major Folds: Enhancer-Promoter Looping, Chromatin Restructuring, and Their Association with Transcriptional Regulation and Disease.

Author: Ahituv Nadav
Matharu Navneet
Publication venue: eScholarship, University of California
Publication date: 01/12/2015

The organization and folding of chromatin within the nucleus can determine the outcome of gene expression. Recent technological advancements have enabled us to study chromatin interactions in a genome-wide manner at high resolution. These studies have increased our understanding of the hierarchy and dynamics of chromatin domains that facilitate cognate enhancer-promoter looping, defining the transcriptional program of different cell types. In this review, we focus on vertebrate chromatin long-range interactions as they relate to transcriptional regulation. In addition, we describe how the alteration of boundaries that mark discrete regions in the genome with high interaction frequencies within them, called topologically associating domains (TADs), could lead to various phenotypes, including human diseases, which we term "TADopathies."

CiteSeerX

Directory of Open Access Journals

PubMed Central

eScholarship - University of California