Analyzing Single Cell RNA Sequencing with Topological Nonnegative Matrix Factorization
Single-cell RNA sequencing (scRNA-seq) is a relatively new technology that has stimulated enormous interest in statistics, data science, and computational biology due to the high dimensionality, complexity, and large scale associated with scRNA-seq data. Nonnegative matrix factorization (NMF) offers a unique approach due to the meta-gene interpretation of its resulting low-dimensional components. However, NMF approaches lack multiscale analysis. This work introduces two persistent-Laplacian-regularized NMF methods, namely, topological NMF (TNMF) and robust topological NMF (rTNMF). Using a total of 12 datasets, we demonstrate that the proposed TNMF and rTNMF significantly outperform all other NMF-based methods. We have also used TNMF and rTNMF for visualization with the popular Uniform Manifold Approximation and Projection (UMAP) and t-distributed stochastic neighbor embedding (t-SNE) methods.
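The multiscale regularization idea can be illustrated with a minimal sketch. It assumes that the persistent Laplacian term can be approximated by a weighted sum of ordinary graph adjacencies built at several distance thresholds (a simple filtration over cells), and plugs that sum into the standard graph-regularized NMF multiplicative updates. The function names, thresholds, and weights are hypothetical; this is not the TNMF or rTNMF implementation, and the robust variant is not shown.

```python
import numpy as np


def multiscale_adjacency(X, eps_values, weights):
    """Weighted sum of eps-neighborhood adjacencies over the cells (columns of X).

    Each threshold eps defines one graph in a simple distance filtration; the
    weighted sum is an illustrative stand-in for a persistent Laplacian.
    """
    sq = (X ** 2).sum(axis=0)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X.T @ X, 0.0)
    A = np.zeros_like(dist2)
    for eps, w in zip(eps_values, weights):
        A += w * (dist2 <= eps ** 2)
    np.fill_diagonal(A, 0.0)  # no self-loops
    return A


def objective(X, W, H, A, lam=1.0):
    """||X - WH||_F^2 + lam * tr(H L H^T), with Laplacian L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    return np.linalg.norm(X - W @ H) ** 2 + lam * np.trace(H @ L @ H.T)


def multiscale_nmf(X, A, k, lam=1.0, n_iter=200, seed=0):
    """Graph-regularized NMF via multiplicative updates, using a multiscale adjacency A."""
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    D = np.diag(A.sum(axis=1))
    W = rng.random((n_genes, k)) + 1e-3  # gene loadings ("meta-genes")
    H = rng.random((k, n_cells)) + 1e-3  # low-dimensional cell representation
    tiny = 1e-10  # guards against division by zero
    for _ in range(n_iter):
        W *= (X @ H.T) / (W @ H @ H.T + tiny)
        H *= (W.T @ X + lam * H @ A) / (W.T @ W @ H + lam * H @ D + tiny)
    return W, H
```

For example, `multiscale_nmf(X, multiscale_adjacency(X, [1.5, 2.0], [1.0, 0.5]), k=3)` factorizes a nonnegative genes-by-cells matrix `X` into three components; the columns of `H` could then be passed to UMAP or t-SNE.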
Control of VEGF-A transcriptional programs by pausing and genomic compartmentalization.
Vascular endothelial growth factor A (VEGF-A) is a master regulator of angiogenesis, vascular development and function. In this study we investigated the transcriptional regulation of VEGF-A-responsive genes in primary human aortic endothelial cells (HAECs) and human umbilical vein endothelial cells (HUVECs) using genome-wide global run-on sequencing (GRO-Seq). We demonstrate that half of VEGF-A-regulated gene promoters are characterized by a transcriptionally competent paused RNA polymerase II (Pol II). We show that transition into productive elongation is a major mechanism of gene activation for virtually all VEGF-regulated genes, whereas only ∼40% of the genes are induced at the level of initiation. In addition, we report a comprehensive chromatin interaction map generated in HUVECs using tethered conformation capture (TCC) and characterize chromatin interactions in relation to transcriptional activity. We demonstrate that sites of active transcription are more likely to engage in chromatin looping and that cell type-specific transcriptional activity reflects the boundaries of chromatin interactions. Furthermore, we identify large chromatin compartments that tend to be coordinately transcribed upon VEGF-A stimulation. We provide evidence that these compartments are enriched for clusters of regulatory regions, such as super-enhancers, and for disease-associated single nucleotide polymorphisms (SNPs). Collectively, these findings provide new insights into the mechanisms behind VEGF-A-regulated transcriptional programs in endothelial cells.
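Promoter-proximal pausing in GRO-Seq data is commonly summarized by a pausing index: read density near the transcription start site (TSS) divided by read density in the gene body. The minimal sketch below computes it for one gene, plus the fold changes in promoter and gene-body density after stimulation. The window boundaries, class, and function names are hypothetical and are not the study's parameters. As a rough heuristic only, a rise in gene-body density with little change at the promoter is consistent with release into productive elongation, while a comparable rise in both is more consistent with increased initiation.

```python
from dataclasses import dataclass


@dataclass
class GeneCounts:
    """GRO-Seq read counts for one gene (window definitions are assumptions)."""
    promoter_reads: int  # reads in a window around the TSS, e.g. -50..+300 bp (assumed)
    promoter_len: int    # promoter window length in bp
    body_reads: int      # reads from the end of that window to the gene end
    body_len: int        # gene-body length in bp

    def promoter_density(self) -> float:
        return self.promoter_reads / self.promoter_len

    def body_density(self) -> float:
        return self.body_reads / self.body_len

    def pausing_index(self) -> float:
        """Promoter-proximal read density divided by gene-body read density."""
        body = self.body_density()
        return float("inf") if body == 0 else self.promoter_density() / body


def density_fold_changes(before: GeneCounts, after: GeneCounts) -> tuple[float, float]:
    """Fold changes (promoter, gene body) in read density; assumes nonzero baseline densities."""
    return (after.promoter_density() / before.promoter_density(),
            after.body_density() / before.body_density())
```

For a gene with 120 promoter reads over 350 bp and 300 gene-body reads over 20,000 bp, the pausing index is about 22.9.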
Modeling the Transcriptional Landscape of in vitro Neuronal Differentiation and ALS Disease
The spinal cord is a complex structure responsible for processing sensory inputs and motor outputs. As such, its cells are highly organized, both developmentally and spatially. Diseases affecting the spinal cord, such as Amyotrophic Lateral Sclerosis (ALS), disrupt normal cellular function and intercellular interactions, culminating in neurodegeneration. Deciphering disease mechanisms requires a fundamental understanding of both the normal development of cells within the spinal cord and the homeostatic environment that allows for proper function. Biological processes such as cellular differentiation, maturation, and disease progression proceed in an asynchronous and cell type-specific manner. Until recently, bulk measurements of mixed cell populations have been key to understanding these events. However, bulk measurements can obscure the molecular mechanisms involved in branched or coinciding processes, such as differential transcriptional responses between subpopulations of cells. Measurements in individual cells have largely been restricted to four-color immunofluorescence assays, which provide a solid but limited view of molecular-level changes. Recent developments in single-cell RNA sequencing (scRNA-Seq) have made it possible to accurately profile the RNA expression levels of thousands of genes simultaneously in an individual cell. With this increased experimental precision comes the ability to explore pathways that are differentially activated in subpopulations of cells and to determine the transcriptional programs that underlie complex biological processes. In this dissertation, I will first review the key features of scRNA-Seq and downstream analysis. I will then discuss two applications of scRNA-Seq: 1) the in vitro differentiation of mouse embryonic stem cells into motor neurons, and 2) the effect of expressing the ALS-associated gene SOD1G93A on cultured motor neurons in a cellular model of ALS.
K-Nearest-Neighbors Induced Topological PCA for scRNA Sequence Data Analysis
Single-cell RNA sequencing (scRNA-seq) is widely used to reveal heterogeneity in cells, which has given us insights into cell-cell communication, cell differentiation, and differential gene expression. However, analyzing scRNA-seq data is challenging due to sparsity and the large number of genes involved. Therefore, dimensionality reduction and feature selection are important for removing spurious signals and enhancing downstream analysis. Traditional PCA, a main workhorse in dimensionality reduction, cannot capture the geometric structure embedded in the data, and previous graph Laplacian regularizations are limited to the analysis of a single scale. We propose a topological Principal Components Analysis (tPCA) method that combines the persistent Laplacian (PL) technique with $L_{2,1}$ norm regularization to address multiscale and multiclass heterogeneity in data. We further introduce a k-Nearest-Neighbor (kNN) persistent Laplacian technique to improve the robustness of our persistent Laplacian method. The proposed kNN-PL is a new algebraic topology technique that addresses many limitations of traditional persistent homology. Rather than inducing a filtration by varying a distance threshold, we introduce kNN-tPCA, in which filtrations are achieved by varying the number of neighbors in a kNN network at each step, and we find that this framework has significant implications for hyperparameter tuning. We validate the efficacy of our proposed tPCA and kNN-tPCA methods on 11 diverse benchmark scRNA-seq datasets and show that our methods outperform other unsupervised PCA enhancements from the literature, as well as the popular Uniform Manifold Approximation and Projection (UMAP), t-Distributed Stochastic Neighbor Embedding (t-SNE), and Projection Non-Negative Matrix Factorization (NMF), by significant margins.
Comment: 28 pages, 11 figures
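The kNN filtration can be sketched in a few lines: build kNN graphs for increasing k, sum their weighted graph Laplacians, and use the result as the regularizer in a graph-Laplacian PCA. This is a minimal sketch under stated assumptions: it uses only ordinary (0th-order) graph Laplacians as a stand-in for the persistent Laplacian, omits the $L_{2,1}$ norm term entirely, and uses the closed-form solution of the squared-loss graph-Laplacian PCA problem. All names, k values, and weights are hypothetical.

```python
import numpy as np


def knn_adjacency(X, k):
    """Symmetric 0/1 adjacency of the kNN graph on the rows (cells) of X."""
    n = X.shape[0]
    sq = (X ** 2).sum(axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(dist2, np.inf)  # a cell is not its own neighbor
    nbrs = np.argsort(dist2, axis=1)[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(A, A.T)  # symmetrize


def knn_filtration_laplacian(X, k_values, weights):
    """Weighted sum of graph Laplacians as k grows (a kNN-style filtration)."""
    n = X.shape[0]
    L = np.zeros((n, n))
    for k, w in zip(k_values, weights):
        A = knn_adjacency(X, k)
        L += w * (np.diag(A.sum(axis=1)) - A)
    return L


def laplacian_pca(X, L, n_components, alpha=1.0):
    """Graph-Laplacian-regularized PCA (squared-loss closed form).

    With Xc the gene-centered cells-by-genes matrix, minimizes
    ||Xc^T - U Q^T||_F^2 + alpha * tr(Q^T L Q) subject to Q^T Q = I.
    The optimal Q holds the eigenvectors of alpha * L - Xc Xc^T with the
    smallest eigenvalues; U = Xc^T Q holds the gene loadings.
    """
    Xc = X - X.mean(axis=0)
    G = alpha * L - Xc @ Xc.T
    _, vecs = np.linalg.eigh(G)  # eigenvalues in ascending order
    Q = vecs[:, :n_components]   # cell embedding
    return Q, Xc.T @ Q
```

Setting `alpha=0` recovers ordinary PCA scores (up to sign); larger `alpha` trades reconstruction accuracy for embeddings that are smoother over the multiscale kNN graphs.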
Methods for Joint Normalization and Comparison of Hi-C data
The development of chromatin conformation capture technology has opened new avenues of study into the 3D structure and function of the genome. Chromatin structure is known to influence gene regulation, and structural differences are now emerging as a regulatory mechanism in settings such as cell differentiation and disease versus normal states. Hi-C sequencing technology now provides a way to study 3D chromatin interactions across the whole genome. However, like all sequencing technologies, Hi-C suffers from several forms of bias stemming from both the technology and the DNA sequence itself. Several methods have been developed for normalizing individual Hi-C datasets, but little work has been done on joint normalization methods for comparing two or more Hi-C datasets. To make full use of Hi-C data, joint normalization and statistical comparison techniques are needed to identify regions where chromatin structure differs between conditions.
We developed methods for the joint normalization and comparison of two Hi-C datasets, which we then extended to more complex experimental designs. Our normalization method is novel in that it makes use of the distance-dependent nature of chromatin interactions. Our modification of the Minus vs. Average (MA) plot into the Minus vs. Distance (MD) plot allows for a nonparametric, data-driven normalization technique using loess smoothing. Additionally, we present a simple statistical method using Z-scores for detecting differentially interacting regions between two datasets. Our initial method was published as the Bioconductor R package HiCcompare [http://bioconductor.org/packages/HiCcompare/](http://bioconductor.org/packages/HiCcompare/).
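The MD-plot normalization can be sketched as follows: for each interacting pair, compute M = log2(IF2/IF1) and its genomic distance D, fit a loess curve of M against D, and remove that trend from both datasets. HiCcompare itself is an R package; the Python below is a hypothetical re-creation of the idea, not its code. The symmetric split of the correction between datasets, the per-distance Z-scores, and all function names are assumptions, and the loess is a minimal local linear smoother with tricube weights.

```python
import math

import numpy as np


def loess(x, y, span=0.5):
    """Local linear regression with tricube weights, evaluated at each x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    q = max(int(np.ceil(span * x.size)), 2)  # points per local neighborhood
    fitted = np.empty_like(y)
    for i, x0 in enumerate(x):
        d = np.abs(x - x0)
        h = max(np.sort(d)[q - 1], 1e-12)  # bandwidth: distance to q-th nearest point
        w = np.clip(1.0 - (d / h) ** 3, 0.0, None) ** 3
        sw = np.sqrt(w)
        design = np.column_stack([np.ones_like(x), x - x0]) * sw[:, None]
        coef, *_ = np.linalg.lstsq(design, y * sw, rcond=None)
        fitted[i] = coef[0]  # intercept = fitted value at x0
    return fitted


def md_normalize(if1, if2, distance, span=0.5):
    """Remove the distance-dependent trend in M = log2(IF2 / IF1); IFs must be positive."""
    m = np.log2(if2 / if1)
    trend = loess(distance, m, span)
    # Split the correction evenly between the two datasets (assumed).
    if1_norm = if1 * 2.0 ** (trend / 2)
    if2_norm = if2 / 2.0 ** (trend / 2)
    return if1_norm, if2_norm, m - trend


def z_by_distance(m, distance):
    """Z-score the normalized M values separately at each distance."""
    z = np.zeros_like(m)
    for d in np.unique(distance):
        idx = distance == d
        sd = m[idx].std()
        if sd > 0:
            z[idx] = (m[idx] - m[idx].mean()) / sd
    return z


def two_sided_p(z):
    """Two-sided p-values under a standard normal null."""
    return np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
```

Because the smoother is locally linear, a bias that grows linearly with distance is removed exactly, leaving normalized M values of zero.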
We then extended our normalization and comparison methods for use in complex Hi-C experiments with more than two datasets and optional covariates. We extended the normalization method to jointly normalize any number of Hi-C datasets by using a cyclic loess procedure on the MD plot. The cyclic loess normalization technique removes between-dataset biases efficiently and effectively, even when several datasets are analyzed at once. Our comparison method implements a generalized linear model-based approach for comparing complex Hi-C experiments, which may have more than two groups and additional covariates. The extended methods are also available as a Bioconductor R package [http://bioconductor.org/packages/multiHiCcompare/](http://bioconductor.org/packages/multiHiCcompare/). Finally, we demonstrate the use of HiCcompare and multiHiCcompare in several test cases on real data and compare them with other similar methods (https://doi.org/10.1002/cpbi.76).
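Cyclic loess extends the two-dataset MD normalization to many datasets by repeatedly normalizing pairs. The sketch below assumes the classic pairwise scheme: for each pair, a loess curve of the log ratio against distance is fitted and both datasets are moved halfway toward agreement, cycling over all pairs for a few iterations. multiHiCcompare's actual procedure may differ in its details, the GLM-based comparison is not shown, and the function names are hypothetical. The loess helper is repeated so the block runs on its own.

```python
import numpy as np


def loess(x, y, span=0.5):
    """Local linear regression with tricube weights, evaluated at each x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    q = max(int(np.ceil(span * x.size)), 2)
    fitted = np.empty_like(y)
    for i, x0 in enumerate(x):
        d = np.abs(x - x0)
        h = max(np.sort(d)[q - 1], 1e-12)
        sw = np.sqrt(np.clip(1.0 - (d / h) ** 3, 0.0, None) ** 3)
        design = np.column_stack([np.ones_like(x), x - x0]) * sw[:, None]
        coef, *_ = np.linalg.lstsq(design, y * sw, rcond=None)
        fitted[i] = coef[0]
    return fitted


def cyclic_loess(ifs, distance, span=0.5, n_iter=5):
    """Jointly normalize several Hi-C datasets with pairwise loess on MD plots.

    ifs: array (n_datasets, n_interactions) of positive interaction frequencies,
    all indexed by the same interacting pairs whose distances are in `distance`.
    """
    log_ifs = np.log2(np.asarray(ifs, dtype=float))
    n_sets = log_ifs.shape[0]
    for _ in range(n_iter):
        for i in range(n_sets - 1):
            for j in range(i + 1, n_sets):
                trend = loess(distance, log_ifs[j] - log_ifs[i], span)
                log_ifs[i] += trend / 2  # move both datasets halfway
                log_ifs[j] -= trend / 2
    return 2.0 ** log_ifs
```

Each pairwise step adds and subtracts the same half-trend, so the per-interaction mean of the log frequencies across datasets is preserved while between-dataset, distance-dependent biases shrink with every cycle.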
Revealing Dynamic Mechanisms of Cell Fate Decisions From Single-Cell Transcriptomic Data.
Cell fate decisions play a pivotal role in development, but technologies for dissecting them are limited. We developed a new multifunctional method, Topographer, to construct a "quantitative" Waddington's landscape from single-cell transcriptomic data. This method can identify complex cell-state transition trajectories and estimate complex cell-type dynamics characterized by fate and transition probabilities. It also infers marker gene networks and their dynamic changes, as well as the dynamic characteristics of transcriptional bursting along the cell-state transition trajectories. Applying this method to single-cell RNA-seq data on the differentiation of primary human myoblasts, we not only identified three known cell types but also estimated their fate probabilities and the transition probabilities among them. We found that the percentage of genes expressed in a bursty manner is significantly higher at (or near) the branch point (~97%) than before or after the branch (below 80%), and that both gene-gene and cell-cell correlation degrees are noticeably lower near the branch point than away from it. Topographer reveals cell fate mechanisms coherently at three scales: cell lineage (macroscopic), gene network (mesoscopic), and gene expression (microscopic).
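A common way to turn cell-state density into Waddington-style landscape heights is the quasi-potential U = -ln p, where p is the density of cells along a state coordinate such as pseudotime: densely populated states become valleys and sparsely populated transition regions become ridges. The sketch below assumes a one-dimensional coordinate and a Gaussian kernel density estimate with a hand-picked bandwidth. It illustrates the general concept only and is not Topographer's algorithm, which also estimates fate and transition probabilities; the function names are hypothetical.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth):
    """Gaussian kernel density estimate of `samples`, evaluated on `grid`."""
    u = (grid[:, None] - samples[None, :]) / bandwidth
    dens = np.exp(-0.5 * u ** 2).sum(axis=1)
    return dens / (samples.size * bandwidth * np.sqrt(2.0 * np.pi))


def quasi_potential(samples, grid, bandwidth=0.3):
    """Landscape height U(x) = -ln p(x); valleys mark densely populated cell states."""
    p = gaussian_kde_1d(samples, grid, bandwidth)
    return -np.log(np.maximum(p, 1e-300))  # floor avoids log(0)
```

With cells concentrated around two states at -2 and +2, the resulting landscape has two valleys near those states separated by a ridge near 0.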
Differentiable Mapper For Topological Optimization Of Data Representation
Unsupervised data representation and visualization using tools from topology is an active and growing field of Topological Data Analysis (TDA) and data science. Its most prominent line of work is based on the so-called Mapper graph, a combinatorial graph whose topological structures (connected components, branches, loops) are in correspondence with those of the data itself. While highly generic and applicable, its use has so far been hampered by the manual tuning of its many parameters. Among these, a crucial one is the so-called filter: a continuous function whose variations on the data set are the main ingredient both for building the Mapper representation and for assessing the presence and sizes of its topological structures. However, while a few parameter tuning methods have already been investigated for the other Mapper parameters (i.e., resolution, gain, clustering), there is currently no method for tuning the filter itself. In this work, we build on a recently proposed optimization framework incorporating topology to provide the first filter optimization scheme for Mapper graphs. To achieve this, we propose a relaxed and more general version of the Mapper graph and investigate its convergence properties. Finally, we demonstrate the usefulness of our approach by optimizing Mapper graph representations on several datasets and showcasing the superiority of the optimized representations over arbitrary ones.
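To show where the filter, resolution, gain, and clustering parameters enter, the sketch below builds a standard (non-relaxed) Mapper graph: it covers the filter's range with overlapping intervals, clusters each interval's preimage, and links clusters that share points. The interval-cover construction and the single-linkage-by-threshold clustering are common choices assumed here for illustration; this is not the paper's relaxed, differentiable Mapper, and all names are hypothetical.

```python
from itertools import combinations

import numpy as np


def eps_clusters(points, idx, eps):
    """Single-linkage clusters: connected components of the eps-graph on points[idx]."""
    unvisited = set(int(i) for i in idx)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        comp, stack = {seed}, [seed]
        while stack:
            i = stack.pop()
            near = [j for j in unvisited if np.linalg.norm(points[i] - points[j]) <= eps]
            for j in near:
                unvisited.discard(j)
                comp.add(j)
                stack.append(j)
        clusters.append(frozenset(comp))
    return clusters


def mapper_graph(points, filter_values, resolution=10, gain=0.3, eps=0.5):
    """Standard Mapper: cover the filter range with overlapping intervals,
    cluster each preimage, and connect clusters that share points."""
    lo, hi = float(filter_values.min()), float(filter_values.max())
    length = (hi - lo) / (resolution - gain * (resolution - 1))  # interval length
    step = length * (1.0 - gain)  # consecutive intervals overlap by gain * length
    nodes = []
    for r in range(resolution):
        a = lo + r * step
        b = hi if r == resolution - 1 else a + length
        idx = np.flatnonzero((filter_values >= a) & (filter_values <= b))
        nodes.extend(eps_clusters(points, idx, eps))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]}
    return nodes, edges
```

For points sampled on a circle with the x-coordinate as the filter, `mapper_graph(pts, pts[:, 0], resolution=6, gain=0.3, eps=0.2)` should recover a single loop in which every node has exactly two neighbors. Changing the filter changes which structures appear, which is the sensitivity that motivates optimizing it.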
Minor Loops in Major Folds: Enhancer-Promoter Looping, Chromatin Restructuring, and Their Association with Transcriptional Regulation and Disease.
The organization and folding of chromatin within the nucleus can determine the outcome of gene expression. Recent technological advances have enabled us to study chromatin interactions genome-wide at high resolution. These studies have increased our understanding of the hierarchy and dynamics of the chromatin domains that facilitate cognate enhancer-promoter looping, defining the transcriptional program of different cell types. In this review, we focus on vertebrate long-range chromatin interactions as they relate to transcriptional regulation. In addition, we describe how altering the boundaries of topologically associating domains (TADs), discrete genomic regions with high interaction frequencies within them, could lead to various phenotypes, including human diseases, which we term "TADopathies."