We perform differential expression analysis of high-throughput sequencing
count data under a Bayesian nonparametric framework, removing sophisticated
ad-hoc pre-processing steps commonly required in existing algorithms. We
propose to use the gamma (beta) negative binomial process, which takes into
account different sequencing depths using sample-specific negative binomial
probability (dispersion) parameters, to detect differentially expressed genes
by comparing the posterior distributions of gene-specific negative binomial
dispersion (probability) parameters. These model parameters are inferred by
borrowing statistical strength across both the genes and samples. Extensive
experiments on both simulated and real-world RNA sequencing count data show
that the proposed differential expression analysis algorithms clearly
outperform previously proposed ones in terms of the areas under both the
receiver operating characteristic and precision-recall curves.

Comment: To appear in Journal of the American Statistical Association