Introduction to finite mixtures
Mixture models have been around for over 150 years, as an intuitively simple
and practical tool for enriching the collection of probability distributions
available for modelling data. In this chapter we describe the basic ideas of
the subject, present several alternative representations and perspectives on
these models, and discuss some of the elements of inference about the unknowns
in the models. Our focus is on the simplest set-up, of finite mixture models,
but we discuss also how various simplifying assumptions can be relaxed to
generate the rich landscape of modelling and inference ideas traversed in the
rest of this book.
Comment: 14 pages, 7 figures. A chapter prepared for the forthcoming Handbook
of Mixture Analysis. V2 corrects a small but important typographical error,
and makes other minor edits; V3 makes further minor corrections and updates
following review; V4 corrects algorithmic details in sec 4.1 and 4.2, and
removes typo
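
As a minimal sketch of the basic idea in the abstract above, a finite mixture density is a weighted sum of component densities, and a draw can be generated by first picking a component label and then sampling from that component. The normal components, the function names and the parameter values below are illustrative assumptions, not taken from the chapter.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, means, sds):
    """Finite mixture density: sum over k of weights[k] * N(x | means[k], sds[k])."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))


def sample_mixture(n, weights, means, sds, rng):
    """Draw n values: pick a component label, then sample from that component."""
    labels = rng.choices(range(len(weights)), weights=weights, k=n)
    return [rng.gauss(means[k], sds[k]) for k in labels]


# Hypothetical parameters for a two-component normal mixture.
weights, means, sds = [0.3, 0.7], [-2.0, 1.0], [0.5, 1.0]
rng = random.Random(0)  # fixed seed so the sketch is reproducible
draws = sample_mixture(1000, weights, means, sds, rng)
```

The weights must be non-negative and sum to one for `mixture_pdf` to be a proper density; the sketch does not check this.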
Colouring and breaking sticks: random distributions and heterogeneous clustering
We begin by reviewing some probabilistic results about the Dirichlet Process
and its close relatives, focussing on their implications for statistical
modelling and analysis. We then introduce a class of simple mixture models in
which clusters are of different `colours', with statistical characteristics
that are constant within colours, but different between colours. Thus cluster
identities are exchangeable only within colours. The basic form of our model is
a variant on the familiar Dirichlet process, and we find that much of the
standard modelling and computational machinery associated with the Dirichlet
process may be readily adapted to our generalisation. The methodology is
illustrated with an application to the partially-parametric clustering of gene
expression profiles.
Comment: 26 pages, 3 figures. Chapter 13 of "Probability and Mathematical
Genetics: Papers in Honour of Sir John Kingman" (Editors N.H. Bingham and
C.M. Goldie), Cambridge University Press, 201
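
For readers unfamiliar with the "breaking sticks" in the title above, the following is a rough sketch of the standard truncated stick-breaking construction of Dirichlet process weights. It illustrates the familiar Dirichlet process only, not the coloured variant the abstract describes; the function name, the truncation level and the value of `alpha` are assumptions made for illustration.

```python
import random


def stick_breaking_weights(alpha, num_sticks, rng):
    """Truncated stick-breaking weights.

    Each v_k is drawn from Beta(1, alpha), and the k-th weight is
    v_k times the length of stick left after the first k - 1 breaks.
    Returns the weights and the unbroken remainder of the stick.
    """
    weights = []
    remaining = 1.0  # length of stick not yet broken off
    for _ in range(num_sticks):
        v = rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights, remaining


rng = random.Random(1)  # fixed seed so the sketch is reproducible
weights, leftover = stick_breaking_weights(alpha=2.0, num_sticks=50, rng=rng)
```

The weights plus the leftover length sum to one, so the leftover indicates how much mass the truncation ignores.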
A structural Markov property for decomposable graph laws that allows control of clique intersections
We present a new kind of structural Markov property for probabilistic laws on
decomposable graphs, which allows the explicit control of interactions between
cliques, so is capable of encoding some interesting structure. We prove the
equivalence of this property to an exponential family assumption, and discuss
identifiability, modelling, inferential and computational implications.
Comment: 10 pages, 3 figures; updated from V1 following journal review, new
more explicit title and added section on inferenc
Sampling decomposable graphs using a Markov chain on junction trees
Full Bayesian computational inference for model determination in undirected
graphical models is currently restricted to decomposable graphs, except for
problems of very small scale. In this paper we develop new, more efficient
methodology for such inference, by making two contributions to the
computational geometry of decomposable graphs. The first of these provides
sufficient conditions under which it is possible to completely connect two
disconnected complete subsets of vertices, or perform the reverse procedure,
yet maintain decomposability of the graph. The second is a new Markov chain
Monte Carlo sampler for arbitrary positive distributions on decomposable
graphs, taking a junction tree representing the graph as its state variable.
The resulting methodology is illustrated with numerical experiments on three
specific models.
Comment: 22 pages, 7 figures, 1 table. V2 as V1 except that Fig 1 was
corrected. V3 has significant edits, dropping some figures and including
additional examples and a discussion of the non-decomposable case. V4 is
further edited following review, and includes additional reference
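
The abstract above concerns moves that maintain decomposability of a graph. As a hedged illustration of what that property means computationally, the sketch below tests whether an undirected graph is decomposable (chordal) by repeatedly removing a simplicial vertex, that is, a vertex whose neighbours form a complete subset. This is a simple textbook-style check, not the junction-tree sampler of the paper; the function name and the adjacency-dictionary representation are assumptions.

```python
from itertools import combinations


def is_decomposable(adjacency):
    """Return True if the undirected graph is decomposable (chordal).

    adjacency maps each vertex to the set of its neighbours.
    The graph is chordal exactly when simplicial vertices can be
    removed one at a time until no vertices remain.
    """
    graph = {v: set(nbrs) for v, nbrs in adjacency.items()}  # work on a copy
    while graph:
        # A vertex is simplicial if every pair of its neighbours is adjacent.
        simplicial = next(
            (v for v, nbrs in graph.items()
             if all(b in graph[a] for a, b in combinations(nbrs, 2))),
            None,
        )
        if simplicial is None:
            return False  # no simplicial vertex left, so a chordless cycle exists
        for u in graph.pop(simplicial):
            graph[u].discard(simplicial)
    return True


# A 4-cycle has no chord; adding the edge 1-3 makes it decomposable.
four_cycle = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
with_chord = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
```

This check is quadratic or worse in the graph size and is meant only to make the definition concrete; it is not suited to the large-scale inference the abstract discusses.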
Julian Ernst Besag, 26 March 1945 -- 6 August 2010, a biographical memoir
Julian Besag was an outstanding statistical scientist, distinguished for his
pioneering work on the statistical theory and analysis of spatial processes,
especially conditional lattice systems. His work has been seminal in
statistical developments over the last several decades ranging from image
analysis to Markov chain Monte Carlo methods. He clarified the role of
auto-logistic and auto-normal models as instances of Markov random fields and
paved the way for their use in diverse applications. Later work included
investigations into the efficacy of nearest neighbour models to accommodate
spatial dependence in the analysis of data from agricultural field trials,
image restoration from noisy data, and texture generation using lattice models.
Comment: 26 pages, 14 figures; minor revisions, omission of full bibliograph
Sensitivity of inferences in forensic genetics to assumptions about founding genes
Many forensic genetics problems can be handled using structured systems of
discrete variables, for which Bayesian networks offer an appealing practical
modeling framework, and allow inferences to be computed by probability
propagation methods. However, when standard assumptions are violated--for
example, when allele frequencies are unknown, there is identity by descent or
the population is heterogeneous--dependence is generated among founding genes,
which makes exact calculation of conditional probabilities by propagation
methods less straightforward. Here we illustrate different methodologies for
assessing sensitivity to assumptions about founders in forensic genetics
problems. These include constrained steepest descent, linear fractional
programming and representing dependence by structure. We illustrate these
methods on several forensic genetics examples involving criminal
identification, simple and complex disputed paternity and DNA mixtures.
Comment: Published at http://dx.doi.org/10.1214/09-AOAS235 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org