Dynamic density estimation with diffusive Dirichlet mixtures
We introduce a new class of nonparametric prior distributions on the space of
continuously varying densities, induced by Dirichlet process mixtures which
diffuse in time. These select time-indexed random functions without jumps,
whose sections are continuous or discrete distributions depending on the choice
of kernel. The construction exploits the widely used stick-breaking
representation of the Dirichlet process and induces the time dependence by
replacing the stick-breaking components with one-dimensional Wright-Fisher
diffusions. These features combine appealing properties of the model, inherited
from the Wright-Fisher diffusions and the Dirichlet mixture structure, with
great flexibility and tractability for posterior computation. The construction
can be easily extended to multi-parameter GEM marginal states, which include,
for example, the Pitman--Yor process. A full inferential strategy is detailed
and illustrated on simulated and real data.

Comment: Published at http://dx.doi.org/10.3150/14-BEJ681 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).
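The stick-breaking representation the abstract builds on can be sketched as follows. This is a minimal static sketch of truncated stick-breaking for a Dirichlet process; the paper's time dependence, obtained by replacing the Beta draws with Wright-Fisher diffusions, is not implemented here.

```python
import numpy as np

def stick_breaking(alpha, n_atoms, rng):
    """Truncated stick-breaking weights for a Dirichlet process.

    Each weight is v_k * prod_{j<k} (1 - v_j) with v_k ~ Beta(1, alpha).
    The paper replaces these static Beta draws with one-dimensional
    Wright-Fisher diffusions to make the weights evolve in time.
    """
    v = rng.beta(1.0, alpha, size=n_atoms)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * remaining

rng = np.random.default_rng(0)
w = stick_breaking(alpha=2.0, n_atoms=50, rng=rng)
# w is a probability vector up to truncation error: all entries positive,
# summing to slightly less than 1.
```

Pairing these weights with kernel evaluations at random atoms then gives the mixture density at each time point.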
Recent advances in directional statistics
Mainstream statistical methodology is generally applicable to data observed
in Euclidean space. There are, however, numerous contexts of considerable
scientific interest in which the natural supports for the data under
consideration are Riemannian manifolds like the unit circle, torus, sphere and
their extensions. Typically, such data can be represented using one or more
directions, and directional statistics is the branch of statistics that deals
with their analysis. In this paper we provide a review of the many recent
developments in the field since the publication of Mardia and Jupp (1999),
still the most comprehensive text on directional statistics. Many of those
developments have been stimulated by interesting applications in fields as
diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics,
image analysis, text mining, environmetrics, and machine learning. We begin by
considering developments for the exploratory analysis of directional data
before progressing to distributional models, general approaches to inference,
hypothesis testing, regression, nonparametric curve estimation, methods for
dimension reduction, classification and clustering, and the modelling of time
series, spatial and spatio-temporal data. An overview of currently available
software for analysing directional data is also provided, and potential future
developments discussed.

Comment: 61 pages.
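A basic example of why directional data need their own methodology: the arithmetic mean of angles near 0 and near 2π is badly wrong, so exploratory summaries average the unit vectors instead. The function below is an illustrative sketch of these standard circular summaries, not code from the review.

```python
import numpy as np

def circular_mean_and_resultant(theta):
    """Mean direction and mean resultant length for angles in radians.

    Averages the unit vectors (cos t, sin t) rather than the raw angles,
    which is the standard exploratory summary for circular data.
    """
    C, S = np.mean(np.cos(theta)), np.mean(np.sin(theta))
    mean_dir = np.arctan2(S, C)   # mean direction in (-pi, pi]
    R = np.hypot(C, S)            # mean resultant length in [0, 1]
    return mean_dir, R

# Angles clustered around 0, with one value just below 2*pi.
angles = np.array([0.1, 6.2, 0.05, 0.3])
mu, R = circular_mean_and_resultant(angles)
# mu is near 0, as it should be; the naive np.mean(angles) is not.
```

The resultant length R plays the role of a concentration measure: values near 1 indicate tightly clustered directions.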
Mixed Marginal Copula Modeling
This article extends the literature on copulas with discrete or continuous
marginals to the case where some of the marginals are a mixture of discrete and
continuous components. We do so by carefully defining the likelihood as the
density of the observations with respect to a mixed measure. The treatment is
quite general, although we focus on mixtures of Gaussian and Archimedean
copulas. The inference is Bayesian with the estimation carried out by Markov
chain Monte Carlo. We illustrate the methodology and algorithms by applying
them to estimate a multivariate income dynamics model.

Comment: 46 pages, 8 tables and 4 figures.
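The kind of marginal the abstract describes can be illustrated with a toy Gaussian copula draw in which one coordinate is continuous and the other mixes a point mass with a continuous component. The marginals, parameter values, and function names here are hypothetical illustrations, not the paper's model or its Bayesian estimation procedure.

```python
import numpy as np
from math import erf, sqrt

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def sample_mixed_copula(n, rho, p0, rng):
    """Toy Gaussian copula with one continuous and one mixed marginal.

    X1 is Exponential(1). X2 puts a point mass p0 at zero and is
    Exponential(1) beyond it, so its distribution mixes a discrete and
    a continuous component, as in the paper's setting.
    """
    z1 = rng.standard_normal(n)
    z2 = rho * z1 + sqrt(1.0 - rho**2) * rng.standard_normal(n)
    u1 = np.array([phi(z) for z in z1])     # uniform via probability transform
    u2 = np.array([phi(z) for z in z2])
    x1 = -np.log(1.0 - u1)                  # Exp(1) quantile transform
    cont = -np.log(1.0 - np.clip((u2 - p0) / (1.0 - p0), 0.0, 1.0 - 1e-12))
    x2 = np.where(u2 < p0, 0.0, cont)       # atom at 0 with probability p0
    return x1, x2

rng = np.random.default_rng(0)
x1, x2 = sample_mixed_copula(5000, rho=0.5, p0=0.3, rng=rng)
```

Defining the likelihood for such data requires a density with respect to a mixed (counting plus Lebesgue) measure, which is the technical point the article addresses.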
Identifiability of parameters in latent structure models with many observed variables
While hidden class models of various types arise in many statistical
applications, it is often difficult to establish the identifiability of their
parameters. Focusing on models in which there is some structure of independence
of some of the observed variables conditioned on hidden ones, we demonstrate a
general approach for establishing identifiability utilizing algebraic
arguments. A theorem of J. Kruskal for a simple latent-class model with finite
state space lies at the core of our results, though we apply it to a diverse
set of models. These include mixtures of both finite and nonparametric product
distributions, hidden Markov models and random graph mixture models, and lead
to a number of new results and improvements to old ones. In the parametric
setting, this approach indicates that for such models, the classical definition
of identifiability is typically too strong. Instead generic identifiability
holds, which implies that the set of nonidentifiable parameters has measure
zero, so that parameter inference is still meaningful. In particular, this
sheds light on the properties of finite mixtures of Bernoulli products, which
have been used for decades despite being known to have nonidentifiable
parameters. In the nonparametric setting, we again obtain identifiability only
when certain restrictions are placed on the distributions that are mixed, but
we explicitly describe the conditions.

Comment: Published at http://dx.doi.org/10.1214/09-AOS689 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
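The simplest way to see why the classical definition of identifiability fails for mixture models like these is label swapping: permuting the components gives a different parameter vector with exactly the same observed distribution. The snippet below verifies this for a small, hypothetical mixture of Bernoulli products (generic identifiability, as used in the abstract, is understood up to such permutations).

```python
import numpy as np
from itertools import product

def bernoulli_mixture_pmf(x, pi, theta):
    """p(x) = sum_k pi_k * prod_j theta_kj^x_j * (1 - theta_kj)^(1 - x_j)."""
    x = np.asarray(x)
    comp = np.prod(theta**x * (1.0 - theta)**(1 - x), axis=1)
    return float(pi @ comp)

# Hypothetical parameters: 2 latent classes, 3 binary observed variables.
pi = np.array([0.4, 0.6])
theta = np.array([[0.2, 0.7, 0.9],
                  [0.8, 0.3, 0.1]])

# Reversing the component order is a distinct parameter vector that
# induces exactly the same distribution on the observations.
for x in product([0, 1], repeat=3):
    p_orig = bernoulli_mixture_pmf(x, pi, theta)
    p_swap = bernoulli_mixture_pmf(x, pi[::-1], theta[::-1])
    assert abs(p_orig - p_swap) < 1e-12
```

The paper's contribution concerns the harder question of when nonidentifiability goes beyond this trivial kind, i.e. when the truly nonidentifiable parameter sets have measure zero.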
Kernel Belief Propagation
We propose a nonparametric generalization of belief propagation, Kernel
Belief Propagation (KBP), for pairwise Markov random fields. Messages are
represented as functions in a reproducing kernel Hilbert space (RKHS), and
message updates are simple linear operations in the RKHS. KBP makes none of the
assumptions commonly required in classical BP algorithms: the variables need
not arise from a finite domain or a Gaussian distribution, nor must their
relations take any particular parametric form. Rather, the relations between
variables are represented implicitly, and are learned nonparametrically from
training data. KBP has the advantage that it may be used on any domain where
kernels are defined (Rd, strings, groups), even where explicit parametric
models are not known, or closed form expressions for the BP updates do not
exist. The computational cost of message updates in KBP is polynomial in the
training data size. We also propose a constant time approximate message update
procedure by representing messages using a small number of basis functions. In
experiments, we apply KBP to image denoising, depth prediction from still
images, and protein configuration prediction: KBP is faster than competing
classical and nonparametric approaches (by orders of magnitude, in some cases),
while providing significantly more accurate results.
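The structure of a kernel message update can be sketched schematically: each message is an RKHS function represented by a weight vector over training samples, incoming messages are evaluated via a Gram matrix, combined pointwise, and pushed through a linear operator. This is a toy illustration of that linear-algebraic shape only; the operator `T_ts` below is a placeholder, whereas KBP learns it nonparametrically from training data.

```python
import numpy as np

def rbf_gram(X, Y, gamma=1.0):
    """RBF kernel Gram matrix between sample sets X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :])**2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_message_update(K_t, incoming_betas, T_ts):
    """Schematic kernel BP update for the message from node t to node s.

    Incoming messages (weight vectors beta) are evaluated at t's training
    samples via K_t @ beta, multiplied pointwise, and mapped through the
    linear operator T_ts. The update is thus a chain of matrix products
    whose cost is polynomial in the number of training samples.
    """
    evals = np.ones(K_t.shape[0])
    for beta in incoming_betas:
        evals *= K_t @ beta          # evaluate each incoming message
    return T_ts @ evals              # new outgoing weight vector

rng = np.random.default_rng(1)
X_t = rng.standard_normal((20, 2))   # hypothetical training samples at node t
K_t = rbf_gram(X_t, X_t)
betas = [rng.standard_normal(20) for _ in range(2)]
T_ts = rng.standard_normal((20, 20)) / 20.0   # placeholder operator
beta_new = kernel_message_update(K_t, betas, T_ts)
```

The constant-time approximation mentioned in the abstract corresponds to replacing these full weight vectors with a small number of basis functions.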
Multivariate type G Matérn stochastic partial differential equation random fields
For many applications with multivariate data, random field models capturing
departures from Gaussianity within realisations are appropriate. For this
reason, we formulate a new class of multivariate non-Gaussian models based on
systems of stochastic partial differential equations with additive type G noise
whose marginal covariance functions are of Matérn type. We consider four
increasingly flexible constructions of the noise, where the first two are
similar to existing copula-based models. In contrast to these, the latter two
constructions can model non-Gaussian spatial data without replicates.
Computationally efficient methods for likelihood-based parameter estimation and
probabilistic prediction are proposed, and the flexibility of the suggested
models is illustrated by numerical examples and two statistical applications.
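For reference, the Matérn covariance family targeted by such SPDE constructions has simple closed forms at half-integer smoothness. The sketch below evaluates those cases only; it is the covariance family itself, under a standard parameterisation, not the paper's estimation or prediction code.

```python
import numpy as np

def matern(h, sigma2=1.0, rho=1.0, nu=1.5):
    """Matérn covariance at distance h for nu in {0.5, 1.5, 2.5}.

    Standard half-integer closed forms: nu = 0.5 is the exponential
    covariance; larger nu gives smoother sample paths.
    """
    h = np.abs(np.asarray(h, dtype=float))
    if nu == 0.5:
        return sigma2 * np.exp(-h / rho)
    if nu == 1.5:
        a = np.sqrt(3.0) * h / rho
        return sigma2 * (1.0 + a) * np.exp(-a)
    if nu == 2.5:
        a = np.sqrt(5.0) * h / rho
        return sigma2 * (1.0 + a + a**2 / 3.0) * np.exp(-a)
    raise ValueError("only half-integer nu = 0.5, 1.5, 2.5 implemented")

# Covariance decays with distance and equals sigma2 at h = 0.
c0 = matern(0.0, nu=1.5)
c1 = matern(1.0, nu=1.5)
```

The type G noise in the paper changes the marginal distributions away from Gaussianity while the SPDE keeps these Matérn marginal covariances.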