4,134 research outputs found
Infinite factorization of multiple non-parametric views
Combined analysis of multiple data sources has increasing application interest, in particular for distinguishing shared and source-specific aspects. We extend this rationale of classical canonical correlation analysis into a flexible, generative and non-parametric clustering
setting, by introducing a novel non-parametric hierarchical
mixture model. The lower level of the model describes each source with a flexible non-parametric mixture, and the top level combines these to describe commonalities of the sources. The lower-level clusters arise from hierarchical Dirichlet Processes, inducing an infinite-dimensional contingency table between the views. The commonalities between the sources are modeled by an infinite block
model of the contingency table, interpretable as non-negative factorization of infinite matrices, or as a prior for infinite contingency tables. With Gaussian mixture components plugged in for continuous measurements, the model is applied to two views of genes, mRNA expression and abundance of the produced proteins, to expose groups of genes that are co-regulated in either or both of the views.
Cluster analysis of co-expression is a standard simple way of screening for co-regulation, and the two-view analysis extends the approach to distinguishing between pre- and post-translational regulation
Robust Bayesian Tensor Factorization with Zero-Inflated Poisson Model and Consensus Aggregation
Tensor factorizations (TF) are powerful tools for the efficient
representation and analysis of multidimensional data. However, classic TF
methods based on maximum likelihood estimation underperform when applied to
zero-inflated count data, such as single-cell RNA sequencing (scRNA-seq) data.
Additionally, the stochasticity inherent in TFs results in factors that vary
across repeated runs, making interpretation and reproducibility of the results
challenging. In this paper, we introduce Zero Inflated Poisson Tensor
Factorization (ZIPTF), a novel approach for the factorization of
high-dimensional count data with excess zeros. To address the challenge of
stochasticity, we introduce Consensus Zero Inflated Poisson Tensor
Factorization (C-ZIPTF), which combines ZIPTF with a consensus-based
meta-analysis. We evaluate our proposed ZIPTF and C-ZIPTF on synthetic
zero-inflated count data and synthetic and real scRNA-seq data. ZIPTF
consistently outperforms baseline matrix and tensor factorization methods in
terms of reconstruction accuracy for zero-inflated data. When the probability
of excess zeros is high, ZIPTF achieves up to better accuracy.
Additionally, C-ZIPTF significantly improves the consistency and accuracy of
the factorization. When tested on both synthetic and real scRNA-seq data, ZIPTF
and C-ZIPTF consistently recover known and biologically meaningful gene
expression programs
Deep Exponential Families
We describe \textit{deep exponential families} (DEFs), a class of latent
variable models that are inspired by the hidden structures used in deep neural
networks. DEFs capture a hierarchy of dependencies between latent variables,
and are easily generalized to many settings through exponential families. We
perform inference using recent "black box" variational inference techniques. We
then evaluate various DEFs on text and combine multiple DEFs into a model for
pairwise recommendation data. In an extensive study, we show that going beyond
one layer improves predictions for DEFs. We demonstrate that DEFs find
interesting exploratory structure in large data sets, and give better
predictive performance than state-of-the-art models
Deep generative modeling for single-cell transcriptomics.
Single-cell transcriptome measurements can reveal unexplored biological diversity, but they suffer from technical noise and bias that must be modeled to account for the resulting uncertainty in downstream analyses. Here we introduce single-cell variational inference (scVI), a ready-to-use scalable framework for the probabilistic representation and analysis of gene expression in single cells ( https://github.com/YosefLab/scVI ). scVI uses stochastic optimization and deep neural networks to aggregate information across similar cells and genes and to approximate the distributions that underlie observed expression values, while accounting for batch effects and limited sensitivity. We used scVI for a range of fundamental analysis tasks including batch correction, visualization, clustering, and differential expression, and achieved high accuracy for each task
- …