Dynamic Mixture of Finite Mixtures of Factor Analysers with Automatic Inference on the Number of Clusters and Factors
Mixtures of factor analysers (MFA) models represent a popular tool for
finding structure in data, particularly high-dimensional data. While in most
applications the number of clusters, and especially the number of latent
factors within clusters, is fixed in advance, in the recent literature
models with automatic inference on both the number of clusters and latent
factors have been introduced. The automatic inference is usually done by
assigning a nonparametric prior and allowing the number of clusters and factors
to potentially go to infinity. The MCMC estimation is performed via an adaptive
algorithm, in which the parameters associated with the redundant factors are
discarded as the chain moves. While this approach has clear advantages, it also
bears some significant drawbacks. Running a separate factor-analytical model
for each cluster involves matrices of changing dimensions, which can make the
model and programming somewhat cumbersome. In addition, discarding the
parameters associated with the redundant factors could lead to a bias in
estimating cluster covariance matrices. Finally, identification remains
problematic for infinite factor models. The current work contributes to the MFA
literature by providing automatic inference on the number of clusters
and the number of cluster-specific factors while keeping both cluster and
factor dimensions finite. This allows us to avoid many of the aforementioned
drawbacks of the infinite models. For the automatic inference on the cluster
structure, we employ the dynamic mixture of finite mixtures (MFM) model.
Automatic inference on cluster-specific factors is performed by assigning an
exchangeable shrinkage process (ESP) prior to the columns of the factor loading
matrices. The performance of the model is demonstrated on several benchmark
data sets as well as real data applications.
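A minimal sketch of why column-wise shrinkage fits a fixed, finite number of factors. It assumes the standard factor-analyser cluster covariance Σ_g = Λ_g Λ_g^T + Ψ_g with diagonal Ψ_g. This form is standard for MFA models but is not spelled out in the abstract. Because Λ_g Λ_g^T is a sum of one rank-one term λ_j λ_j^T per loading column, a column shrunk to zero adds nothing, and the matrix dimensions never change. The function name `factor_covariance` is hypothetical. This is not the ESP prior or the authors' sampler.

```python
def factor_covariance(loadings, uniquenesses):
    """Return Lambda Lambda^T + diag(psi) for a p x k loading matrix given as a list of rows."""
    p = len(loadings)
    k = len(loadings[0]) if p else 0
    cov = [[0.0] * p for _ in range(p)]
    for a in range(p):
        for b in range(p):
            # Each loading column j contributes the rank-one term lambda_j lambda_j^T.
            cov[a][b] = sum(loadings[a][j] * loadings[b][j] for j in range(k))
        # The idiosyncratic variances (diagonal Psi) only affect the diagonal.
        cov[a][a] += uniquenesses[a]
    return cov


psi = [0.1, 0.2, 0.3]
one_factor = [[1.0], [2.0], [0.5]]
# The same loadings plus a redundant column shrunk exactly to zero.
padded = [[1.0, 0.0], [2.0, 0.0], [0.5, 0.0]]
print(factor_covariance(one_factor, psi) == factor_covariance(padded, psi))  # True
```

A column that is shrunk close to zero but kept in the model still contributes a small λ_j λ_j^T term. Truncating that column removes the term entirely.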
Variational semi-blind sparse deconvolution with orthogonal kernel bases and its application to MRFM
We present a variational Bayesian method of joint image reconstruction and point spread function (PSF) estimation when the PSF of the imaging device is only partially known. To solve this semi-blind deconvolution problem, prior distributions are specified for the PSF and the 3D image. Joint image reconstruction and PSF estimation is then performed within a Bayesian framework, using a variational algorithm to estimate the posterior distribution. The image prior distribution imposes an explicit atomic measure that corresponds to image sparsity. Importantly, the proposed Bayesian deconvolution algorithm does not require hand tuning. Simulation results clearly demonstrate that the semi-blind deconvolution algorithm compares favorably with a previous Markov chain Monte Carlo (MCMC) version of myopic sparse reconstruction. It significantly outperforms mismatched non-blind algorithms that rely on the assumption of perfect knowledge of the PSF. The algorithm is illustrated on real data from magnetic resonance force microscopy (MRFM).
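The semi-blind setting can be illustrated with a 1D, noise-free forward model. This sketch assumes that the partially known PSF is a nominal PSF plus a linear combination of orthogonal basis kernels, h = h0 + Σ_k c_k v_k. The title's "orthogonal kernel bases" suggests this form, but the abstract gives no formula. The observation is the sparse image convolved with h. The helper names, the 1D setting, and the specific kernels are hypothetical. The variational estimation of the image and of the coefficients c_k is not shown.

```python
import math


def psf_from_basis(h0, bases, coeffs):
    """Nominal PSF h0 perturbed by a linear combination of basis kernels."""
    return [h0[t] + sum(c * v[t] for c, v in zip(coeffs, bases)) for t in range(len(h0))]


def convolve_same(x, h):
    """Zero-padded 1D convolution of x with an odd-length, centred kernel h."""
    n, half = len(x), len(h) // 2
    y = []
    for i in range(n):
        acc = 0.0
        for k, hk in enumerate(h):
            idx = i + half - k
            if 0 <= idx < n:
                acc += hk * x[idx]
        y.append(acc)
    return y


h0 = [0.25, 0.5, 0.25]
# Two orthonormal perturbation kernels; their inner product is zero.
v1 = [1 / math.sqrt(2), 0.0, -1 / math.sqrt(2)]
v2 = [1 / math.sqrt(6), -2 / math.sqrt(6), 1 / math.sqrt(6)]
x = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0]  # sparse "image": two spikes
h = psf_from_basis(h0, [v1, v2], [0.05, -0.02])
y = convolve_same(x, h)
```

Both perturbation kernels sum to zero, so this choice keeps the PSF's total mass fixed. That is a design choice of the sketch, not a claim about the paper.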
Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics
Background: Single-cell transcriptomics allows researchers to investigate complex communities of heterogeneous cells. It can be applied to stem cells and their descendants in order to chart the progression from multipotent progenitors to fully differentiated cells. While a variety of statistical and computational methods have been proposed for inferring cell lineages, the problem of accurately characterizing multiple branching lineages remains difficult to solve.
Results: We introduce Slingshot, a novel method for inferring cell lineages and pseudotimes from single-cell gene expression data. In previously published datasets, Slingshot correctly identifies the biological signal for one to three branching trajectories. Additionally, our simulation study shows that Slingshot infers more accurate pseudotimes than other leading methods.
Conclusions: Slingshot is a uniquely robust and flexible tool which combines the highly stable techniques necessary for noisy single-cell data with the ability to identify multiple trajectories. Accurate lineage inference is a critical step in the identification of dynamic temporal gene expression.
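The abstract does not describe the algorithm itself. To our understanding, the published Slingshot method first connects cluster centres with a minimum spanning tree (MST) to identify lineages. It then fits simultaneous principal curves to assign pseudotimes. The sketch below covers only a simplified version of the first step. It uses plain Euclidean distances between hypothetical 2D centroids and a user-chosen root cluster. The function names are illustrative, and pseudotime fitting is omitted.

```python
import math


def mst_edges(centroids):
    """Prim's algorithm on cluster centroids with Euclidean distances."""
    in_tree = {0}
    edges = []
    while len(in_tree) < len(centroids):
        best = None
        for i in in_tree:
            for j in range(len(centroids)):
                if j in in_tree:
                    continue
                d = math.dist(centroids[i], centroids[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        edges.append((i, j))
        in_tree.add(j)
    return edges


def lineages(edges, root):
    """Each path from the root cluster to a leaf of the tree is one lineage."""
    adj = {}
    for i, j in edges:
        adj.setdefault(i, []).append(j)
        adj.setdefault(j, []).append(i)
    paths = []

    def walk(node, parent, path):
        children = [c for c in adj.get(node, []) if c != parent]
        if not children:
            paths.append(path)
        for c in children:
            walk(c, node, path + [c])

    walk(root, None, [root])
    return paths


# Hypothetical centroids: a progenitor cluster (0) branching into clusters 2 and 3.
centroids = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0), (2.0, -1.0)]
print(sorted(lineages(mst_edges(centroids), root=0)))  # [[0, 1, 2], [0, 1, 3]]
```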
Bayesian Computation of the Intrinsic Structure of Factor Analytic Models
The study of factor analytic models often has to address two important issues: (a) the determination of the “optimum” number of factors and (b) the derivation of a unique simple structure whose interpretation is easy and straightforward. The classical approach deals with these two tasks separately, and sometimes resorts to ad-hoc methods. This paper proposes a Bayesian approach to these two important issues, and adapts ideas from stochastic geometry and Bayesian finite mixture modelling to construct an ergodic Markov chain having the posterior distribution of the complete collection of parameters (including the number of factors) as its equilibrium distribution. The proposed method uses an Automatic Relevance Determination (ARD) prior as the device of achieving the desired simple structure. A Gibbs sampler updating scheme is then combined with the simulation of a continuous-time birth-and-death point process to produce a sampling scheme that efficiently explores the posterior distribution of interest. The MCMC sample path obtained from the simulated posterior then provides a flexible ingredient for most of the inferential tasks of interest. Illustrations on both artificial and real tasks are provided, while major difficulties and challenges are discussed, along with ideas for future improvements.
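The ARD idea can be shown in a few lines. Each loading column j gets its own precision α_j, so columns with a large α_j are pushed towards zero. The number of columns with a non-negligible norm then acts as an effective number of factors. This is a minimal sketch, not the paper's Gibbs and birth-and-death sampler. The precisions are fixed by hand rather than inferred, and the threshold `tol` and the function names are hypothetical.

```python
import random


def sample_ard_loadings(p, alphas, rng):
    """Draw a p x k loading matrix whose column j has N(0, 1 / alpha_j) entries."""
    return [[rng.gauss(0.0, a ** -0.5) for a in alphas] for _ in range(p)]


def active_factors(loadings, tol):
    """Indices of loading columns whose Euclidean norm exceeds tol."""
    k = len(loadings[0])
    norms = [sum(row[j] ** 2 for row in loadings) ** 0.5 for j in range(k)]
    return [j for j in range(k) if norms[j] > tol]


rng = random.Random(1)
# Two ordinary columns and one with a very large precision (strongly shrunk).
loadings = sample_ard_loadings(p=10, alphas=[1.0, 1.0, 1e8], rng=rng)
print(active_factors(loadings, tol=0.1))
```

With α_3 = 1e8, the third column's entries have standard deviation 1e-4. Its norm is therefore far below `tol`, and only the first two columns count as active.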
Deep mixture of linear mixed models for complex longitudinal data
Mixtures of linear mixed models are widely used for modelling longitudinal
data for which observation times differ between subjects. In typical
applications, temporal trends are described using a basis expansion, with basis
coefficients treated as random effects varying by subject. Additional random
effects can describe variation between mixture components, or other known
sources of variation in complex experimental designs. A key advantage of these
models is that they provide a natural mechanism for clustering, which can be
helpful for interpretation in many applications. Current versions of mixtures
of linear mixed models are not specifically designed for the case where there
are many observations per subject and a complex temporal trend, which requires
a large number of basis functions to capture. In this case, the
subject-specific basis coefficients are a high-dimensional random effects
vector, for which the covariance matrix is hard to specify and estimate,
especially if it varies between mixture components. To address this issue, we
consider the use of recently developed deep mixture of factor analyzers models
as the prior for the random effects. The resulting deep mixture of linear mixed
models is well-suited to high-dimensional settings, and we describe an
efficient variational inference approach to posterior computation. The efficacy
of the method is demonstrated on both real and simulated data.
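The following is a minimal sketch of the basis-expansion idea. It assumes a simple polynomial basis, since the abstract does not say which basis is used. It also assumes that a subject's coefficients are a component mean plus a subject-specific random effect. Each subject's basis is evaluated at that subject's own observation times, so differing observation times need no special handling. The names are hypothetical. The deep mixture of factor analysers prior on the random effects and the variational inference are not shown.

```python
def basis_row(t, degree):
    """Polynomial basis 1, t, ..., t**degree evaluated at time t."""
    return [t ** d for d in range(degree + 1)]


def subject_trend(times, mean_coef, random_effect):
    """Mean trajectory of one subject at its own observation times."""
    coef = [m + r for m, r in zip(mean_coef, random_effect)]
    degree = len(coef) - 1
    return [sum(c * b for c, b in zip(coef, basis_row(t, degree))) for t in times]


mean_coef = [1.0, 2.0]  # component-level intercept and slope
# Two subjects, observed at different times, each with its own random effect.
print(subject_trend([0.0, 0.5, 1.0], mean_coef, [0.5, -1.0]))  # [1.5, 2.0, 2.5]
print(subject_trend([0.2, 0.9], mean_coef, [0.0, 0.0]))
```

With many basis functions, the random-effect vector becomes high-dimensional. The abstract identifies its covariance matrix as the part that is hard to specify and estimate.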
Modelling multivariate disease rates with a latent structure mixture model
Copyright © 2013 SAGE / Statistical Modeling Society
There has been considerable recent interest in multivariate modelling of the geographical distribution of morbidity or mortality rates for potentially related diseases. The motivations for this include investigation of similarities or dissimilarities in the risk distribution for the different diseases, as well as “borrowing strength” across disease rates to shrink the uncertainty in geographical risk assessment for any particular disease. A number of approaches to such multivariate modelling have been suggested, and this paper proposes an extension to these which may provide a richer range of dependency structures than those encompassed so far. We develop a model which incorporates a discrete mixture of latent structures and argue that this provides potential to represent an enhanced range of correlation structures between diseases at the same time as implicitly allowing for less restrictive spatial correlation structures between geographical units. We compare and contrast our approach to other commonly used multivariate disease models and demonstrate comparative results using data taken from cancer registries on four carcinomas in some 300 geographical units in England, Scotland and Wales.
Mixtures of Shifted Asymmetric Laplace Distributions
A mixture of shifted asymmetric Laplace distributions is introduced and used
for clustering and classification. A variant of the EM algorithm is developed
for parameter estimation by exploiting the relationship with the generalized
inverse Gaussian distribution. This approach is mathematically elegant and
relatively computationally straightforward. Our novel mixture modelling
approach is demonstrated on both simulated and real data to illustrate
clustering and classification applications. In these analyses, our mixture of
shifted asymmetric Laplace distributions performs favourably when compared to
the popular Gaussian approach. This work, which marks an important step in the
non-Gaussian model-based clustering and classification direction, concludes
with a discussion as well as suggestions for future work.
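The link to the generalized inverse Gaussian (GIG) distribution comes from writing a shifted asymmetric Laplace variable as a normal variance-mean mixture. Assume the representation X = μ + Wα + √W V, with W ~ Exp(1) and V ~ N(0, Σ). This is our statement of the standard construction, not a quotation from the abstract. Under it, the conditional distribution of W given X is GIG, which is what makes an EM-type E-step tractable. The univariate sampler below only illustrates this representation. Its name and parameters are hypothetical, and it does not implement the authors' EM algorithm.

```python
import math
import random


def sample_sal(mu, alpha, sigma, n, seed=0):
    """Draw n univariate shifted asymmetric Laplace values via X = mu + W*alpha + sqrt(W)*sigma*Z."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        w = rng.expovariate(1.0)  # latent mixing weight W ~ Exp(1)
        z = rng.gauss(0.0, 1.0)   # standard normal Z
        out.append(mu + w * alpha + math.sqrt(w) * sigma * z)
    return out


draws = sample_sal(mu=1.0, alpha=2.0, sigma=1.0, n=200_000)
```

Since E[W] = Var(W) = 1, the sample mean should be close to μ + α and the sample variance close to α² + σ². The skewness comes from the Wα term, which is what distinguishes this model from a symmetric Gaussian component.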