Probabilistic Clustering of Time-Evolving Distance Data
We present a novel probabilistic clustering model for objects that are
represented via pairwise distances and observed at different time points. The
proposed method utilizes the information given by adjacent time points to find
the underlying cluster structure and obtain a smooth cluster evolution. This
approach allows the number of objects and clusters to differ at every time
point, and the identities of the objects do not need to be known.
Further, the model does not require the number of clusters to be specified in
advance -- it is instead determined automatically using a Dirichlet process
prior. We validate our model on synthetic data showing that the proposed method
is more accurate than state-of-the-art clustering methods. Finally, we use our
dynamic clustering model to analyze and illustrate the evolution of brain
cancer patients over time.
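To illustrate how a Dirichlet process prior lets the number of clusters be inferred rather than fixed, here is a minimal sketch of its Chinese restaurant process representation. It is not the model from the abstract above: it ignores the pairwise distances and the time evolution, and the function name `sample_crp` and the `concentration` parameter are illustrative assumptions.

```python
import random


def sample_crp(n_objects, concentration, seed=0):
    """Draw one partition from a Chinese restaurant process (hypothetical helper).

    Each object joins an existing cluster with probability proportional to
    that cluster's size, or opens a new cluster with probability
    proportional to `concentration` (must be positive).
    """
    rng = random.Random(seed)
    assignments = []  # cluster index chosen for each object
    counts = []       # current size of each cluster
    for _ in range(n_objects):
        # The last weight corresponds to opening a new cluster.
        weights = counts + [concentration]
        cluster = rng.choices(range(len(weights)), weights=weights)[0]
        if cluster == len(counts):
            counts.append(1)
        else:
            counts[cluster] += 1
        assignments.append(cluster)
    return assignments, counts


assignments, counts = sample_crp(n_objects=50, concentration=1.0)
print("number of clusters:", len(counts))
print("cluster sizes:", counts)
```

The number of clusters is an outcome of the draw, not an input; larger values of `concentration` tend to produce more clusters.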
Bayesian statistics and modelling
Bayesian statistics is an approach to data analysis based on Bayes’ theorem, where available knowledge about parameters in a statistical model is updated with the information in observed data. The background knowledge is expressed as a prior distribution and combined with observational data in the form of a likelihood function to determine the posterior distribution. The posterior can also be used for making predictions about future events. This Primer describes the stages involved in Bayesian analysis, from specifying the prior and data models to deriving inference, model checking and refinement. We discuss the importance of prior and posterior predictive checking, selecting a proper technique for sampling from a posterior distribution, variational inference and variable selection. Examples of successful applications of Bayesian analysis across various research fields are provided, including in social sciences, ecology, genetics, medicine and more. We propose strategies for reproducibility and reporting standards, outlining an updated WAMBS (when to Worry and how to Avoid the Misuse of Bayesian Statistics) checklist. Finally, we outline the impact of Bayesian analysis on artificial intelligence, a major goal in the next decade.
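The prior-to-posterior update described above can be sketched with the simplest conjugate case: a Beta prior on a success probability combined with a binomial likelihood. This is an illustrative example, not taken from the Primer; the prior parameters, the data, and the helper names `beta_binomial_update` and `beta_mean` are assumptions chosen for the sketch.

```python
from fractions import Fraction


def beta_binomial_update(alpha, beta, successes, failures):
    """Combine a Beta(alpha, beta) prior with binomial data.

    By conjugacy the posterior is Beta(alpha + successes, beta + failures).
    """
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, kept exact with Fraction."""
    return Fraction(alpha, alpha + beta)


# Assumed prior knowledge: Beta(2, 2), weakly centred on 0.5.
prior = (2, 2)
# Assumed observed data: 7 successes and 3 failures.
posterior = beta_binomial_update(*prior, successes=7, failures=3)

print("posterior parameters:", posterior)        # (9, 5)
print("posterior mean:", beta_mean(*posterior))  # 9/14
```

In this model the posterior mean is also the posterior predictive probability that the next trial succeeds, which is one way the posterior is used to make predictions about future events.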
Horseshoe-based Bayesian nonparametric estimation of effective population size trajectories
Phylodynamics is an area of population genetics that uses genetic sequence
data to estimate past population dynamics. Modern state-of-the-art Bayesian
nonparametric methods for recovering population size trajectories of unknown
form use either change-point models or Gaussian process priors. Change-point
models suffer from computational issues when the number of change-points is
unknown and needs to be estimated. Gaussian process-based methods lack local
adaptivity and cannot accurately recover trajectories that exhibit features
such as abrupt changes in trend or varying levels of smoothness. We propose a
novel, locally-adaptive approach to Bayesian nonparametric phylodynamic
inference that has the flexibility to accommodate a large class of functional
behaviors. Local adaptivity results from modeling the log-transformed effective
population size a priori as a horseshoe Markov random field, a recently
proposed statistical model that blends together the best properties of the
change-point and Gaussian process modeling paradigms. We use simulated data to
assess model performance, and find that our proposed method results in reduced
bias and increased precision when compared to contemporary methods. We also use
our models to reconstruct past changes in genetic diversity of human hepatitis
C virus in Egypt and to estimate population size changes of ancient and modern
steppe bison. These analyses show that our new method captures features of the
population size trajectories that were missed by the state-of-the-art methods.
Comment: 36 pages, including supplementary information