Inferring clonal evolution of tumors from single nucleotide somatic mutations
High-throughput sequencing allows the detection and quantification of
frequencies of somatic single nucleotide variants (SNV) in heterogeneous tumor
cell populations. In some cases, the evolutionary history and population
frequency of the subclonal lineages of tumor cells present in the sample can be
reconstructed from these SNV frequency measurements. However, automated methods
to do this reconstruction are not available and the conditions under which
reconstruction is possible have not been described.
We describe the conditions under which the evolutionary history can be
uniquely reconstructed from SNV frequencies from single or multiple samples
from the tumor population and we introduce a new statistical model, PhyloSub,
that infers the phylogeny and genotype of the major subclonal lineages
represented in the population of cancer cells. It uses a Bayesian nonparametric
prior over trees that groups SNVs into major subclonal lineages and
automatically estimates the number of lineages and their ancestry. We sample
from the joint posterior distribution over trees to identify evolutionary
histories and cell population frequencies that have the highest probability of
generating the observed SNV frequency data. When multiple phylogenies are
consistent with a given set of SNV frequencies, PhyloSub represents the
uncertainty in the tumor phylogeny using a partial order plot. Experiments on a
simulated dataset and two real datasets comprising tumor samples from acute
myeloid leukemia and chronic lymphocytic leukemia patients demonstrate that
PhyloSub can infer both linear (or chain) and branching lineages and its
inferences are in good agreement with ground truth, where it is available.
Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture
This paper presents a novel algorithm, based upon the dependent Dirichlet
process mixture model (DDPMM), for clustering batch-sequential data containing
an unknown number of evolving clusters. The algorithm is derived via a
low-variance asymptotic analysis of the Gibbs sampling algorithm for the DDPMM,
and provides a hard clustering with convergence guarantees similar to those of
the k-means algorithm. Empirical results from a synthetic test with moving
Gaussian clusters and a test with real ADS-B aircraft trajectory data
demonstrate that the algorithm requires orders of magnitude less computational
time than contemporary probabilistic and hard clustering algorithms, while
providing higher accuracy on the examined datasets.
Comment: This paper is from NIPS 2013. Please use the following BibTeX
citation: @inproceedings{Campbell13_NIPS, Author = {Trevor Campbell and Miao
Liu and Brian Kulis and Jonathan P. How and Lawrence Carin}, Title = {Dynamic
Clustering via Asymptotics of the Dependent Dirichlet Process}, Booktitle =
{Advances in Neural Information Processing Systems (NIPS)}, Year = {2013}}
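The abstract above describes a hard clustering obtained from a low-variance limit of a Gibbs sampler. As a rough illustration of that idea only, the sketch below implements the static analogue usually called DP-means, not the paper's dynamic algorithm: it behaves like k-means but opens a new cluster whenever a point lies farther than a threshold from every existing center. The function name `dp_means`, the threshold `lam`, and the fixed iteration count are assumptions made for this example.

```python
def dp_means(points, lam, n_iters=10):
    """Static DP-means sketch (assumed illustration, not the paper's method).

    points: list of equal-length numeric tuples.
    lam:    squared-distance threshold above which a new cluster is opened.
    """
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    dim = len(points[0])
    # Start with a single cluster centered at the global mean.
    centers = [tuple(sum(p[d] for p in points) / len(points) for d in range(dim))]
    labels = [0] * len(points)

    for _ in range(n_iters):
        # Assignment step: nearest center, or a new cluster if too far away.
        for i, p in enumerate(points):
            dists = [sqdist(p, c) for c in centers]
            best = min(range(len(centers)), key=dists.__getitem__)
            if dists[best] > lam:
                centers.append(tuple(p))
                best = len(centers) - 1
            labels[i] = best

        # Update step: recompute means and drop clusters left empty.
        new_centers, remap = [], {}
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                remap[k] = len(new_centers)
                new_centers.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
        labels = [remap[lab] for lab in labels]
        centers = new_centers

    return labels, centers
```

A larger `lam` yields fewer clusters, so the threshold plays the role that the number of clusters plays in k-means.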
Comparing Nonparametric Bayesian Tree Priors for Clonal Reconstruction of Tumors
Statistical machine learning methods, especially nonparametric Bayesian
methods, have become increasingly popular for inferring clonal population structure
of tumors. Here we describe the treeCRP, an extension of the Chinese restaurant
process (CRP), a popular construction used in nonparametric mixture models, to
infer the phylogeny and genotype of major subclonal lineages represented in the
population of cancer cells. We also propose new split-merge updates tailored to
the subclonal reconstruction problem that improve the mixing time of Markov
chains. In comparisons with the tree-structured stick-breaking (TSSB) prior used in
PhyloSub, we demonstrate superior mixing and running time using the treeCRP
with our new split-merge procedures. We also show that given the same number of
samples, TSSB and treeCRP have similar ability to recover the subclonal
structure of a tumor.
Comment: Preprint of an article submitted for consideration in the Pacific
Symposium on Biocomputing © 2015; World Scientific Publishing Co.,
Singapore, 2015; http://psb.stanford.edu
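For readers unfamiliar with the Chinese restaurant process mentioned in the abstract above, here is a minimal sketch of the standard (flat) CRP, not the treeCRP extension the paper introduces. Each new customer joins an existing table with probability proportional to its size, or opens a new table with probability proportional to a concentration parameter. The function name `sample_crp` and the parameter names are assumptions made for this example.

```python
import random


def sample_crp(n_customers, alpha, seed=None):
    """Draw table assignments from a standard Chinese restaurant process.

    alpha is the concentration parameter and must be positive.
    Returns (assignments, table_sizes).
    """
    rng = random.Random(seed)
    assignments = []  # table index chosen by each customer, in arrival order
    table_sizes = []  # current number of customers at each table

    for _ in range(n_customers):
        # Existing table k has weight table_sizes[k]; a new table has weight alpha.
        weights = table_sizes + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_sizes):
            table_sizes.append(1)  # open a new table
        else:
            table_sizes[table] += 1
        assignments.append(table)

    return assignments, table_sizes
```

Because the number of occupied tables is not fixed in advance, a prior of this kind lets a mixture model choose its own number of clusters, which is the property the abstract relies on.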
Probabilistic Clustering of Time-Evolving Distance Data
We present a novel probabilistic clustering model for objects that are
represented via pairwise distances and observed at different time points. The
proposed method utilizes the information given by adjacent time points to find
the underlying cluster structure and obtain a smooth cluster evolution. This
approach allows the number of objects and clusters to differ at every time
point, and the objects need not be identified across time points.
Further, the model does not require the number of clusters to be specified in
advance; it is instead determined automatically using a Dirichlet process
prior. We validate our model on synthetic data showing that the proposed method
is more accurate than state-of-the-art clustering methods. Finally, we use our
dynamic clustering model to analyze and illustrate the evolution of brain
cancer patients over time.
Adaptive Evolutionary Clustering
In many practical applications of clustering, the objects to be clustered
evolve over time, and a clustering result is desired at each time step. In such
applications, evolutionary clustering typically outperforms traditional static
clustering by producing clustering results that reflect long-term trends while
being robust to short-term variations. Several evolutionary clustering
algorithms have recently been proposed, often by adding a temporal smoothness
penalty to the cost function of a static clustering method. In this paper, we
introduce a different approach to evolutionary clustering by accurately
tracking the time-varying proximities between objects followed by static
clustering. We present an evolutionary clustering framework that adaptively
estimates the optimal smoothing parameter using shrinkage estimation, a
statistical approach that improves a naive estimate using additional
information. The proposed framework can be used to extend a variety of static
clustering algorithms, including hierarchical, k-means, and spectral
clustering, into evolutionary clustering algorithms. Experiments on synthetic
and real data sets indicate that the proposed framework outperforms static
clustering and existing evolutionary clustering algorithms in many scenarios.
Comment: To appear in Data Mining and Knowledge Discovery, MATLAB toolbox
available at http://tbayes.eecs.umich.edu/xukevin/affec
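The idea in the abstract above of tracking time-varying proximities before running a static clustering method can be sketched as an exponentially weighted average of proximity matrices. This is only an illustration under a simplifying assumption: the smoothing weight `alpha` is fixed by hand here, whereas the paper estimates it adaptively by shrinkage. The function name `smooth_proximities` is hypothetical and is not taken from the MATLAB toolbox.

```python
def smooth_proximities(matrices, alpha):
    """Smooth a sequence of proximity matrices over time.

    matrices: list of square matrices (lists of lists), one per time step.
    alpha:    fixed weight in [0, 1] on the previous smoothed estimate
              (an assumption; the paper chooses this weight adaptively).
    Returns one smoothed matrix per time step.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    smoothed = []
    previous = None
    for current in matrices:
        if previous is None:
            # No history at the first time step: use the observation as is.
            estimate = [row[:] for row in current]
        else:
            # Convex combination of past estimate and current observation.
            estimate = [
                [alpha * p + (1.0 - alpha) * c for p, c in zip(prev_row, cur_row)]
                for prev_row, cur_row in zip(previous, current)
            ]
        smoothed.append(estimate)
        previous = estimate
    return smoothed
```

Each smoothed matrix can then be passed to any static method that accepts proximities, such as hierarchical or spectral clustering.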