Statistical machine learning methods, especially nonparametric Bayesian
methods, are increasingly used to infer the clonal population structure
of tumors. Here we describe the treeCRP, an extension of the Chinese restaurant
process (CRP), a popular construction used in nonparametric mixture models, to
infer the phylogeny and genotype of major subclonal lineages represented in the
population of cancer cells. We also propose new split-merge updates tailored to
the subclonal reconstruction problem that improve the mixing time of Markov
chains. In comparisons with the tree-structured stick-breaking (TSSB) prior used in
PhyloSub, we demonstrate better mixing and shorter running times using the treeCRP
with our new split-merge procedures. We also show that given the same number of
samples, TSSB and treeCRP have similar ability to recover the subclonal
structure of a tumor.

Comment: Preprint of an article submitted for consideration in the Pacific
Symposium on Biocomputing © 2015; World Scientific Publishing Co.,
Singapore, 2015; http://psb.stanford.edu
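
For readers unfamiliar with the Chinese restaurant process mentioned in the abstract, the following is a minimal sketch of the standard CRP seating scheme only. It is not the treeCRP and does not include the split-merge updates described here. The function name `sample_crp` and the parameter names `n_customers`, `alpha`, and `seed` are illustrative assumptions, not taken from the paper.

```python
import random


def sample_crp(n_customers, alpha, seed=None):
    """Draw one seating arrangement from a standard Chinese restaurant process.

    Illustrative sketch only: this is the plain CRP, not the treeCRP.
    alpha is the concentration parameter and must be positive.
    """
    rng = random.Random(seed)
    assignments = []   # table index chosen by each customer, in arrival order
    table_sizes = []   # number of customers currently seated at each table

    for _ in range(n_customers):
        # An existing table k is chosen with weight equal to its size;
        # a new table is chosen with weight alpha.
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)      # open a new table
        else:
            table_sizes[k] += 1        # join an existing table
        assignments.append(k)

    return assignments, table_sizes


if __name__ == "__main__":
    seats, sizes = sample_crp(n_customers=20, alpha=1.0, seed=0)
    print("assignments:", seats)
    print("table sizes:", sizes)
```

In a mixture-model reading, customers would correspond to observations (for example, mutations) and tables to clusters; larger values of `alpha` tend to produce more tables.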