A Bayesian Nonparametric Markovian Model for Nonstationary Time Series
Stationary time series models built from parametric distributions are, in
general, limited in scope due to the assumptions imposed on the residual
distribution and autoregression relationship. We present a modeling approach
for univariate time series data, which makes no assumptions of stationarity,
and can accommodate complex dynamics and capture nonstandard distributions. The
model for the transition density arises from the conditional distribution
implied by a Bayesian nonparametric mixture of bivariate normals. This implies
a flexible autoregressive form for the conditional transition density, defining
a time-homogeneous, nonstationary, Markovian model for real-valued data indexed
in discrete-time. To obtain a more computationally tractable algorithm for
posterior inference, we utilize a square-root-free Cholesky decomposition of
the mixture kernel covariance matrix. Results from simulated data suggest the
model is able to recover challenging transition and predictive densities. We
also illustrate the model on time intervals between eruptions of the Old
Faithful geyser. Extensions to accommodate higher order structure and to
develop a state-space model are also discussed.
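The construction described above has a convenient closed form: the transition density implied by a mixture of bivariate normals is itself a mixture of conditional normals, with each component weighted by how well its first margin explains the lagged value. A minimal sketch in Python, assuming fixed mixture parameters (the function name and toy setup are illustrative only; the paper additionally places a Bayesian nonparametric prior on the mixture and uses a square-root-free Cholesky parameterization for computation):

```python
import numpy as np
from scipy.stats import norm

def transition_density(y, x, weights, means, covs):
    """p(y_t = y | y_{t-1} = x) implied by a bivariate normal mixture.

    weights: (K,) mixture weights; means: (K, 2) component means for
    (y_{t-1}, y_t); covs: (K, 2, 2) component covariance matrices.
    Illustrative helper, not code from the paper.
    """
    num = 0.0
    den = 0.0
    for w, mu, S in zip(weights, means, covs):
        # Marginal density of the lagged value under component k.
        px = norm.pdf(x, mu[0], np.sqrt(S[0, 0]))
        # Conditional normal for y_t given y_{t-1} = x (standard bivariate
        # normal conditioning formulas).
        m = mu[1] + S[1, 0] / S[0, 0] * (x - mu[0])
        v = S[1, 1] - S[1, 0] ** 2 / S[0, 0]
        num += w * px * norm.pdf(y, m, np.sqrt(v))
        den += w * px
    return num / den
```

Because the component weights depend on the lagged value, the conditional mean, variance, and shape all vary with x, which is what gives the model its flexible, potentially multimodal autoregressive form.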
EMMIX-uskew: An R Package for Fitting Mixtures of Multivariate Skew t-distributions via the EM Algorithm
This paper describes an algorithm for fitting finite mixtures of unrestricted
Multivariate Skew t (FM-uMST) distributions. The package EMMIX-uskew implements
a closed-form expectation-maximization (EM) algorithm for computing the maximum
likelihood (ML) estimates of the parameters for the (unrestricted) FM-uMST model
in R. EMMIX-uskew also supports visualization of fitted contours in two and
three dimensions, and random sample generation from a specified FM-uMST
distribution.
Finite mixtures of skew t-distributions have proven to be useful in modelling
heterogeneous data with asymmetric and heavy tail behaviour, for example,
datasets from flow cytometry. In recent years, various versions of mixtures
with multivariate skew t (MST) distributions have been proposed. However, these
models adopted some restricted characterizations of the component MST
distributions so that the E-step of the EM algorithm can be evaluated in closed
form. This paper focuses on mixtures with unrestricted MST components, and
describes an iterative algorithm for the computation of the ML estimates of its
model parameters.
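The E-step/M-step alternation that EMMIX-uskew performs for skew t components can be illustrated on a much simpler analogue, a two-component univariate Gaussian mixture, where both steps are elementary. This is not the package's algorithm (the unrestricted skew t E-step involves moments of truncated multivariate t distributions); the function and starting values below are made up for illustration:

```python
import numpy as np
from scipy.stats import norm

def em_gmm_1d(x, n_iter=100):
    """EM for a two-component univariate Gaussian mixture.

    Illustrative analogue of the E/M alternation used by EMMIX-uskew,
    with Gaussian components in place of skew t components.
    """
    # Crude initialization from data quantiles and the overall spread.
    pi = 0.5
    mu = np.quantile(x, [0.25, 0.75]).astype(float)
    sd = np.array([x.std(), x.std()])
    for _ in range(n_iter):
        # E-step: posterior responsibility of component 1 for each point.
        p1 = pi * norm.pdf(x, mu[0], sd[0])
        p2 = (1 - pi) * norm.pdf(x, mu[1], sd[1])
        r = p1 / (p1 + p2)
        # M-step: responsibility-weighted ML updates of all parameters.
        pi = r.mean()
        mu = np.array([np.average(x, weights=r),
                       np.average(x, weights=1 - r)])
        sd = np.sqrt(np.array([np.average((x - mu[0]) ** 2, weights=r),
                               np.average((x - mu[1]) ** 2, weights=1 - r)]))
    return pi, mu, sd
```

The restricted MST models mentioned above exist precisely because, for richer component densities, the E-step expectations are no longer this simple; the contribution of the paper is a tractable E-step for the unrestricted case.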
The usefulness of the proposed algorithm is demonstrated in three
applications to real data sets. The first example illustrates the use of the
main function fmmst in the package by fitting an MST distribution to a bivariate
unimodal flow cytometric sample. The second example fits a mixture of MST
distributions to the Australian Institute of Sport (AIS) data, and demonstrates
that EMMIX-uskew can provide better clustering results than mixtures with
restricted MST components. In the third example, EMMIX-uskew is applied to
classify cells in a trivariate flow cytometric dataset. Comparisons with other
available methods suggest that EMMIX-uskew achieves a lower
misclassification rate with respect to the labels given by benchmark gating
analysis.
Estimating Local Function Complexity via Mixture of Gaussian Processes
Real world data often exhibit inhomogeneity, e.g., the noise level, the
sampling distribution or the complexity of the target function may change over
the input space. In this paper, we try to isolate local function complexity in
a practical, robust way. This is achieved by first estimating the locally
optimal kernel bandwidth as a functional relationship. Specifically, we propose
Spatially Adaptive Bandwidth Estimation in Regression (SABER), which employs
the mixture of experts consisting of multinomial kernel logistic regression as
a gate and Gaussian process regression models as experts. Using the locally
optimal kernel bandwidths, we deduce an estimate to the local function
complexity by drawing parallels to the theory of locally linear smoothing. We
demonstrate the usefulness of local function complexity for model
interpretation and active learning in quantum chemistry experiments and fluid
dynamics simulations.
Comment: 19 pages, 16 figures
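The mixture-of-experts structure described in the abstract above (a gate assigning each input to bandwidth-specific experts) can be sketched generically. In this sketch the gate is a plain linear-logit softmax and the experts are arbitrary callables; SABER's actual gate is multinomial kernel logistic regression and its experts are Gaussian process regressors, so everything here is a simplified stand-in:

```python
import numpy as np

def moe_predict(x, gate_w, gate_b, experts):
    """Mixture-of-experts prediction on a 1-D input array.

    A softmax gate produces per-input probabilities over the experts,
    and the prediction is the gate-weighted average of expert outputs.
    Simplified stand-in for SABER's kernel-logistic gate + GP experts.
    """
    # Gate logits, one row per expert (linear in x for this sketch).
    logits = np.array([w * x + b for w, b in zip(gate_w, gate_b)])
    # Numerically stable softmax across experts (axis 0).
    g = np.exp(logits - logits.max(axis=0))
    g /= g.sum(axis=0)
    # Each expert predicts on the full input; combine by gate weight.
    preds = np.array([f(x) for f in experts])
    return (g * preds).sum(axis=0)
```

The key point for bandwidth estimation is that the gate partitions the input space softly, so the effective expert (and hence the effective bandwidth) varies smoothly over the input space.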
Flexible modelling in statistics: past, present and future
In times where more and more data become available and where the data exhibit
rather complex structures (significant departure from symmetry, heavy or light
tails), flexible modelling has become an essential task for statisticians as
well as researchers and practitioners from domains such as economics, finance
or environmental sciences. This is reflected by the wealth of existing
proposals for flexible distributions; well-known examples are Azzalini's
skew-normal, Tukey's g-and-h, mixture and two-piece distributions, to name
but a few. My aim in the present paper is to provide an introduction to this
research field, intended to be useful both for novices and professionals of the
domain. After a description of the research stream itself, I will narrate the
gripping history of flexible modelling, starring emblematic heroes from the
past such as Edgeworth and Pearson, then depict three of the most used flexible
families of distributions, and finally provide an outlook on future flexible
modelling research by posing challenging open questions.
Comment: 27 pages, 4 figures
Joint Modeling and Registration of Cell Populations in Cohorts of High-Dimensional Flow Cytometric Data
In systems biomedicine, an experimenter encounters different potential
sources of variation in data such as individual samples, multiple experimental
conditions, and multi-variable network-level responses. In multiparametric
cytometry, which is often used for analyzing patient samples, such issues are
critical. While computational methods can identify cell populations in
individual samples, without the ability to automatically match them across
samples, it is difficult to compare and characterize the populations in typical
experiments, such as those responding to various stimulations or distinctive of
particular patients or time-points, especially when there are many samples.
Joint Clustering and Matching (JCM) is a multi-level framework for simultaneous
modeling and registration of populations across a cohort. JCM models every
population with a robust multivariate probability distribution. Simultaneously,
JCM fits a random-effects model to construct an overall batch template,
which is used for registering populations across samples and for classifying
new samples. By tackling systems-level variation, JCM supports practical
biomedical applications involving large cohorts.