6,327 research outputs found
Fast Non-Parametric Bayesian Inference on Infinite Trees
Given i.i.d. data from an unknown distribution, we consider the problem of
predicting future items. An adaptive way to estimate the probability density is
to recursively subdivide the domain to an appropriate data-dependent
granularity. A Bayesian would assign a data-independent prior probability to
"subdivide", which leads to a prior over infinite(ly many) trees. We derive an
exact, fast, and simple inference algorithm for such a prior, for the data
evidence, the predictive distribution, the effective model dimension, and other
quantities.Comment: 8 twocolumn pages, 3 figure
Fast non-parametric Bayesian inference on infinite trees
Given i.i.d. data from an unknown distribution,
we consider the problem of predicting future items.
An adaptive way to estimate the probability density
is to recursively subdivide the domain to an appropriate
data-dependent granularity. A Bayesian would assign a
data-independent prior probability to "subdivide", which leads
to a prior over infinite(ly many) trees. We derive an exact, fast,
and simple inference algorithm for such a prior, for the data
evidence, the predictive distribution, the effective model
dimension, and other quantities
Syntactic Topic Models
The syntactic topic model (STM) is a Bayesian nonparametric model of language
that discovers latent distributions of words (topics) that are both
semantically and syntactically coherent. The STM models dependency parsed
corpora where sentences are grouped into documents. It assumes that each word
is drawn from a latent topic chosen by combining document-level features and
the local syntactic context. Each document has a distribution over latent
topics, as in topic models, which provides the semantic consistency. Each
element in the dependency parse tree also has a distribution over the topics of
its children, as in latent-state syntax models, which provides the syntactic
consistency. These distributions are convolved so that the topic of each word
is likely under both its document and syntactic context. We derive a fast
posterior inference algorithm based on variational methods. We report
qualitative and quantitative studies on both synthetic data and hand-parsed
documents. We show that the STM is a more predictive model of language than
current models based only on syntax or only on topics
Exact Non-Parametric Bayesian Inference on Infinite Trees
Given i.i.d. data from an unknown distribution, we consider the problem of
predicting future items. An adaptive way to estimate the probability density is
to recursively subdivide the domain to an appropriate data-dependent
granularity. A Bayesian would assign a data-independent prior probability to
"subdivide", which leads to a prior over infinite(ly many) trees. We derive an
exact, fast, and simple inference algorithm for such a prior, for the data
evidence, the predictive distribution, the effective model dimension, moments,
and other quantities. We prove asymptotic convergence and consistency results,
and illustrate the behavior of our model on some prototypical functions.Comment: 32 LaTeX pages, 9 figures, 5 theorems, 1 algorith
Populations in statistical genetic modelling and inference
What is a population? This review considers how a population may be defined
in terms of understanding the structure of the underlying genetics of the
individuals involved. The main approach is to consider statistically
identifiable groups of randomly mating individuals, which is well defined in
theory for any type of (sexual) organism. We discuss generative models using
drift, admixture and spatial structure, and the ancestral recombination graph.
These are contrasted with statistical models for inference, principle component
analysis and other `non-parametric' methods. The relationships between these
approaches are explored with both simulated and real-data examples. The
state-of-the-art practical software tools are discussed and contrasted. We
conclude that populations are a useful theoretical construct that can be well
defined in theory and often approximately exist in practice
- …