Bayesian Clustering by Dynamics
This paper introduces a Bayesian method for clustering dynamic processes. The method models dynamics as Markov chains and then applies an agglomerative clustering procedure to discover the most probable set of clusters capturing different dynamics. To increase efficiency, the method uses an entropy-based heuristic search strategy. A controlled experiment suggests that the method is very accurate when applied to artificial time series in a broad range of conditions and, when applied to clustering sensor data from mobile robots, it produces clusters that are meaningful in the domain of application.
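The core idea in this abstract (summarize each series as a Markov chain, then merge series when merging is more probable) can be illustrated with a minimal sketch. It is not the paper's implementation: the symmetric Dirichlet prior, the `alpha` parameter, and the function names `transition_counts`, `log_marginal_likelihood`, and `merge_gain` are assumptions made for illustration, and the entropy-based search heuristic is omitted.

```python
import math

def transition_counts(series, n_states):
    """Count first-order transitions a -> b in a sequence of integer states."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(series, series[1:]):
        counts[a][b] += 1
    return counts

def log_marginal_likelihood(counts, alpha=1.0):
    """Log marginal likelihood of the counts, with each row of the transition
    matrix given a symmetric Dirichlet(alpha) prior (an assumed prior)."""
    total = 0.0
    for row in counts:
        k, n = len(row), sum(row)
        total += math.lgamma(k * alpha) - math.lgamma(k * alpha + n)
        for c in row:
            total += math.lgamma(alpha + c) - math.lgamma(alpha)
    return total

def add_counts(a, b):
    """Pool two count matrices, as if both series came from one chain."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def merge_gain(c1, c2, alpha=1.0):
    """Log Bayes factor for 'one shared chain' versus 'two separate chains'.
    A positive value favours putting the two series in the same cluster."""
    merged = log_marginal_likelihood(add_counts(c1, c2), alpha)
    separate = (log_marginal_likelihood(c1, alpha)
                + log_marginal_likelihood(c2, alpha))
    return merged - separate

# Two alternating series and one "sticky" series over the states {0, 1}.
alternating = transition_counts([0, 1] * 20, 2)
sticky = transition_counts([0] * 20 + [1] * 20, 2)
print(merge_gain(alternating, alternating))  # positive: same dynamics
print(merge_gain(alternating, sticky))       # negative: different dynamics
```

An agglomerative procedure in this spirit would repeatedly merge the pair of clusters with the largest positive gain and stop once no merge increases the score.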
A temporal switch model for estimating transcriptional activity in gene expression
Motivation: The analysis and mechanistic modelling of time series gene expression data provided by techniques such as microarrays, NanoString, reverse transcription–polymerase chain reaction and advanced sequencing are invaluable for developing an understanding of the variation in key biological processes. We address this by proposing the estimation of a flexible dynamic model, which decouples temporal synthesis and degradation of mRNA and, hence, allows for transcriptional activity to switch between different states.
Results: The model is flexible enough to capture a variety of observed transcriptional dynamics, including oscillatory behaviour, in a way that is compatible with the demands imposed by the quality, time-resolution and quantity of the data. We show that the timing and number of switch events in transcriptional activity can be estimated alongside individual gene mRNA stability with the help of a Bayesian reversible jump Markov chain Monte Carlo algorithm. To demonstrate the methodology, we focus on modelling the wild-type behaviour of a selection of 200 circadian genes of the model plant Arabidopsis thaliana. The results support the idea that using a mechanistic model to identify transcriptional switch points is likely to contribute strongly to efforts in elucidating and understanding key biological processes, such as transcription and degradation.
Scaling Nonparametric Bayesian Inference via Subsample-Annealing
We describe an adaptation of the simulated annealing algorithm to
nonparametric clustering and related probabilistic models. This new algorithm
learns nonparametric latent structure over a growing and constantly churning
subsample of training data, where the portion of data subsampled can be
interpreted as the inverse temperature beta(t) in an annealing schedule. Gibbs
sampling at high temperature (i.e., with a very small subsample) can more
quickly explore sketches of the final latent state by (a) making longer jumps
around latent space (as in block Gibbs) and (b) lowering energy barriers (as in
simulated annealing). We prove subsample annealing speeds up mixing time N^2 ->
N in a simple clustering model and exp(N) -> N in another class of models,
where N is data size. Empirically subsample-annealing outperforms naive Gibbs
sampling in accuracy-per-wallclock time, and can scale to larger datasets and
deeper hierarchical models. We demonstrate improved inference on million-row
subsamples of US Census data and network log data and a 307-row hospital rating
dataset, using a Pitman-Yor generalization of the Cross Categorization model.
Comment: To appear in AISTATS 201
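The "growing and constantly churning subsample" described in this abstract can be sketched as a schedule over row indices. This is only an illustration under stated assumptions: the linear schedule for beta(t), the `churn` parameter, and the name `subsample_schedule` are hypothetical, and the Gibbs sweep itself is left as a comment because the abstract does not specify it.

```python
import random

def subsample_schedule(n_rows, n_steps, beta_start=0.01, churn=1, seed=0):
    """Yield (beta, subsample) pairs. beta is the fraction of data in use,
    read as an inverse temperature that rises linearly to 1 (assumed form)."""
    rng = random.Random(seed)
    inside, outside = [], list(range(n_rows))
    rng.shuffle(outside)
    for t in range(n_steps):
        beta = beta_start + (1.0 - beta_start) * t / max(n_steps - 1, 1)
        target = max(1, round(beta * n_rows))
        # Churn: swap a few rows in the subsample with rows outside it.
        for _ in range(min(churn, len(inside), len(outside))):
            i = rng.randrange(len(inside))
            j = rng.randrange(len(outside))
            inside[i], outside[j] = outside[j], inside[i]
        # Grow: add rows until the subsample reaches the target size.
        while len(inside) < target:
            inside.append(outside.pop())
        # A Gibbs sweep over the rows in `inside` would run here.
        yield beta, list(inside)

for beta, rows in subsample_schedule(n_rows=100, n_steps=5):
    print(f"beta={beta:.2f} subsample size={len(rows)}")
```

Early steps operate on very few rows (high temperature), and the final step covers the full dataset.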