Tensor factorizations (TFs) are powerful tools for the efficient
representation and analysis of multidimensional data. However, classic TF
methods based on maximum likelihood estimation underperform when applied to
zero-inflated count data, such as single-cell RNA sequencing (scRNA-seq) data.
Additionally, the stochasticity inherent in TF algorithms yields factors that
vary across repeated runs, making the results difficult to interpret and
reproduce. In this paper, we introduce Zero Inflated Poisson Tensor
Factorization (ZIPTF), a novel approach for the factorization of
high-dimensional count data with excess zeros. To address the challenge of
stochasticity, we introduce Consensus Zero Inflated Poisson Tensor
Factorization (C-ZIPTF), which combines ZIPTF with a consensus-based
meta-analysis. We evaluate ZIPTF and C-ZIPTF on synthetic zero-inflated count
data as well as on synthetic and real scRNA-seq data. ZIPTF
consistently outperforms baseline matrix and tensor factorization methods in
terms of reconstruction accuracy for zero-inflated data. When the probability
of excess zeros is high, ZIPTF achieves up to 2.4× better accuracy.
Additionally, C-ZIPTF significantly improves the consistency and accuracy of
the factorization. When tested on both synthetic and real scRNA-seq data, ZIPTF
and C-ZIPTF consistently recover known and biologically meaningful gene
expression programs.
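The abstract does not spell out the ZIPTF model, so as a point of reference, here is a minimal sketch of the zero-inflated Poisson log-likelihood that such a factorization would score. It assumes a rank-R CP (PARAFAC) parameterization of the Poisson rate tensor and a single excess-zero probability π; neither detail is confirmed by the text above, and the helper names `cp_rate_tensor` and `zip_log_likelihood` are hypothetical.

```python
import math

import numpy as np


def cp_rate_tensor(A, B, C):
    """Hypothetical helper. Rank-R CP reconstruction:
    lam[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def zip_log_likelihood(X, lam, pi):
    """Hypothetical helper. Total log-likelihood of count tensor X under a
    zero-inflated Poisson (assumed model, not necessarily ZIPTF's exact one).

    Each entry is an excess zero with probability pi, otherwise it is drawn
    from Poisson(lam). lam must be strictly positive; pi is a scalar or an
    array broadcastable to X, with 0 <= pi < 1.
    """
    X = np.asarray(X, dtype=float)
    lam = np.asarray(lam, dtype=float)
    pi = np.broadcast_to(np.asarray(pi, dtype=float), X.shape)
    log_fact = np.vectorize(math.lgamma)(X + 1.0)     # log(x!)
    poisson_ll = X * np.log(lam) - lam - log_fact     # Poisson log-pmf
    zero_ll = np.log(pi + (1.0 - pi) * np.exp(-lam))  # log P(X = 0)
    count_ll = np.log1p(-pi) + poisson_ll             # log P(X = x), x > 0
    return float(np.sum(np.where(X == 0, zero_ll, count_ll)))


# Toy usage: a random rank-2 rate tensor and counts with injected zeros.
rng = np.random.default_rng(0)
A, B, C = (rng.uniform(0.5, 2.0, size=(n, 2)) for n in (4, 5, 6))
lam = cp_rate_tensor(A, B, C)
X = rng.poisson(lam) * (rng.random(lam.shape) > 0.6)  # roughly 60% extra zeros
print(zip_log_likelihood(X, lam, 0.6) > zip_log_likelihood(X, lam, 0.0))
```

Fitting would then adjust the factors A, B, C and π to increase this quantity (or a posterior built on it); the abstract does not say which inference scheme ZIPTF actually uses. Setting π = 0 recovers the ordinary Poisson likelihood behind classic maximum-likelihood TF, which is why zeros far in excess of what the rates predict are penalized so heavily there.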
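The consensus meta-analysis in C-ZIPTF is likewise not detailed in the abstract. One common way to build such a consensus, in the spirit of consensus NMF, is to pool the unit-normalized components from many runs, cluster them, and take each cluster's median as the stable factor. The sketch below assumes that scheme, uses plain k-means seeded with the first run's components, and introduces a hypothetical `consensus_factors` helper; it is an illustration of the idea, not C-ZIPTF's published procedure.

```python
import numpy as np


def consensus_factors(runs, n_iter=100):
    """Hypothetical helper: consensus factor matrix from repeated runs.

    runs: list of (n_features, R) arrays, one per run, whose columns are the
    components for one tensor mode (column order and scale may differ by run).
    Returns an (n_features, R) array whose columns are cluster medians.
    """
    # Normalise each column to unit L2 norm so that only direction matters.
    unit = [r / np.linalg.norm(r, axis=0, keepdims=True) for r in runs]
    pooled = np.concatenate(unit, axis=1).T   # (n_runs * R, n_features)
    n_comp = runs[0].shape[1]
    centers = unit[0].T.copy()                # seed k-means with run 0
    for _ in range(n_iter):
        # Assign every pooled component to its nearest centre (Euclidean).
        dist = ((pooled[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            pooled[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_comp)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Consensus component = element-wise median of each cluster's members.
    return np.stack([
        np.median(pooled[labels == k], axis=0) if np.any(labels == k) else centers[k]
        for k in range(n_comp)
    ], axis=1)


# Toy usage: five "runs" that return the same 3 factors permuted, rescaled, noisy.
rng = np.random.default_rng(1)
W = np.zeros((60, 3))
for k in range(3):
    W[20 * k:20 * (k + 1), k] = 1.0
W += rng.uniform(0.0, 0.1, size=W.shape)
runs = []
for _ in range(5):
    perm = rng.permutation(3)  # runs report components in arbitrary order
    scale = rng.uniform(0.5, 2.0, size=3)
    runs.append(W[:, perm] * scale + np.abs(rng.normal(0.0, 0.01, size=W.shape)))
consensus = consensus_factors(runs)
print(consensus.shape)  # (60, 3)
```

Normalizing before clustering matters because factorizations are only identified up to column permutation and scaling; taking the median rather than the mean makes each consensus factor less sensitive to an occasional outlying run.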