Gaussian Process Structural Equation Models with Latent Variables
In a variety of disciplines such as social sciences, psychology, medicine and
economics, the recorded data are considered to be noisy measurements of latent
variables connected by some causal structure. This corresponds to a family of
graphical models known as the structural equation model with latent variables.
While linear non-Gaussian variants have been well-studied, inference in
nonparametric structural equation models is still underdeveloped. We introduce
a sparse Gaussian process parameterization that defines a non-linear structure
connecting latent variables, unlike common formulations of Gaussian process
latent variable models. The sparse parameterization is given a full Bayesian
treatment without compromising Markov chain Monte Carlo efficiency. We compare
the stability of the sampling procedure and the predictive ability of the model
against the current practice.
Comment: 12 pages, 6 figures
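The sparse parameterization described above can be illustrated with a minimal NumPy sketch: the nonlinear structural function connecting latent variables is represented through a small set of inducing points, and function values at arbitrary latent inputs follow from the standard sparse-GP conditional. The kernel choice, inducing-point count, and variable names here are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between row vectors of A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * sq / lengthscale**2)

def sparse_gp_conditional_mean(X, Z, u, jitter=1e-6):
    """Mean of f(X) given inducing values u at inducing inputs Z --
    the conditional used in sparse GP parameterizations."""
    Kzz = rbf_kernel(Z, Z) + jitter * np.eye(len(Z))
    Kxz = rbf_kernel(X, Z)
    return Kxz @ np.linalg.solve(Kzz, u)

rng = np.random.default_rng(0)
Z = np.linspace(-2, 2, 5)[:, None]      # inducing inputs (assumed, small set)
u = np.sin(Z).ravel()                   # inducing values, e.g. one MCMC draw
X = rng.uniform(-2, 2, size=(50, 1))    # latent parent-variable values
f = sparse_gp_conditional_mean(X, Z, u) # nonlinear structural function f(X)
```

In a full Bayesian treatment, `u` (and the kernel hyperparameters) would be sampled by MCMC rather than fixed as here; the sparse conditional keeps each such update cheap because only the small matrix `Kzz` is factorized.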
Sparsity-Promoting Bayesian Dynamic Linear Models
Sparsity-promoting priors have become increasingly popular over recent years
due to an increased number of regression and classification applications
involving a large number of predictors. In time series applications where
observations are collected over time, it is often unrealistic to assume that
the underlying sparsity pattern is fixed. We propose here an original class of
flexible Bayesian linear models for dynamic sparsity modelling. The proposed
class of models expands upon the existing Bayesian literature on sparse
regression using generalized multivariate hyperbolic distributions. The
properties of the models are explored through both analytic results and
simulation studies. We demonstrate the model on a financial application,
showing that it accurately represents the patterns seen in stock and
derivative data and detects major events when filtering an artificial
portfolio of assets.
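The idea of a time-varying sparsity pattern can be sketched with a simple scale-mixture simulation: each regression coefficient is conditionally normal with a latent scale whose logarithm follows an AR(1) process, so a coefficient sits near zero while its scale is small and "switches on" when the scale drifts up. This is an illustrative stand-in, not the paper's generalized multivariate hyperbolic construction; all parameter values are assumptions.

```python
import numpy as np

def simulate_dynamic_sparse_coeffs(T, p, rho=0.95, sigma=0.5, seed=0):
    """Simulate p coefficients over T periods as a normal scale mixture
    whose log-scales follow an AR(1); small scales give near-zero
    (sparse) coefficients, large scales give active ones."""
    rng = np.random.default_rng(seed)
    log_scale = np.zeros((T, p))
    for t in range(1, T):
        log_scale[t] = rho * log_scale[t - 1] + sigma * rng.standard_normal(p)
    scales = np.exp(log_scale - 2.0)  # shift so most mass sits near zero
    return scales * rng.standard_normal((T, p))

beta = simulate_dynamic_sparse_coeffs(T=200, p=5)
```

Fitting such a model by Bayesian filtering would recover which coefficients are active at each time, which is the behavior exploited when scanning a portfolio for major events.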
Using the Expectation Maximization Algorithm with Heterogeneous Mixture Components for the Analysis of Spectrometry Data
Coupling a multi-capillary column (MCC) with an ion mobility (IM)
spectrometer (IMS) opened a multitude of new application areas for gas
analysis, especially in a medical context, as volatile organic compounds (VOCs)
in exhaled breath can hint at a person's state of health. To obtain a potential
diagnosis from a raw MCC/IMS measurement, several computational steps are
necessary, which so far have required manual interaction, e.g., human
evaluation of discovered peaks. We have recently proposed an automated pipeline
for this task that does not require human intervention during the analysis.
Nevertheless, there is a need for improved methods for each computational step.
In comparison to gas chromatography / mass spectrometry (GC/MS) data, MCC/IMS
data is easier and less expensive to obtain, but peaks are more diffuse and
there is a higher noise level. MCC/IMS measurements can be described as samples
of mixture models (i.e., of convex combinations) of two-dimensional probability
distributions. We therefore use the expectation-maximization (EM) algorithm to
deconvolute these mixtures, developing methods that improve data processing in
three computational steps: denoising, baseline correction and peak
clustering. A common theme of these methods is that mixture components within
one model are not homogeneous (e.g., all Gaussian), but of different types.
Evaluation shows that the novel methods outperform the existing ones. We
provide Python software implementing all three methods and make our evaluation
data available at http://www.rahmannlab.de/research/ims
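The heterogeneous-mixture idea can be illustrated in one dimension: EM for a two-component mixture in which one component is a Gaussian "peak" and the other a uniform "background", i.e., components of different types within one model. This is a toy sketch under assumed names and a 1-D setting, not the paper's 2-D MCC/IMS models.

```python
import numpy as np

def em_gaussian_plus_uniform(x, lo, hi, n_iter=200):
    """EM for a mixture of one Gaussian peak and a uniform background
    on [lo, hi]; w is the Gaussian's mixture weight."""
    w, mu, var = 0.5, x.mean(), x.var()
    u_dens = 1.0 / (hi - lo)
    for _ in range(n_iter):
        g = w * np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = g / (g + (1 - w) * u_dens)       # E-step: responsibilities
        w = r.mean()                         # M-step: weight, mean, variance
        mu = (r * x).sum() / r.sum()
        var = (r * (x - mu) ** 2).sum() / r.sum()
    return w, mu, var

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(3.0, 0.2, 500), rng.uniform(0.0, 10.0, 500)])
w, mu, var = em_gaussian_plus_uniform(x, 0.0, 10.0)
```

The same E-step/M-step pattern carries over to the heterogeneous 2-D case: only the per-component density and its M-step updates change with the component type, while the responsibility computation stays identical.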
Prototype selection for parameter estimation in complex models
Parameter estimation in astrophysics often requires the use of complex
physical models. In this paper we study the problem of estimating the
parameters that describe star formation history (SFH) in galaxies. Here,
high-dimensional spectral data from galaxies are appropriately modeled as
linear combinations of physical components, called simple stellar populations
(SSPs), plus some nonlinear distortions. Theoretical data for each SSP is
produced for a fixed parameter vector via computer modeling. Though the
parameters that define each SSP are continuous, optimizing the signal model
over a large set of SSPs on a fine parameter grid is computationally infeasible
and inefficient. The goal of this study is to estimate the set of parameters
that describes the SFH of each galaxy. These target parameters, such as the
average ages and chemical compositions of the galaxy's stellar populations, are
derived from the SSP parameters and the component weights in the signal model.
Here, we introduce a principled approach of choosing a small basis of SSP
prototypes for SFH parameter estimation. The basic idea is to quantize the
vector space and effective support of the model components. In addition to
greater computational efficiency, we achieve better estimates of the SFH target
parameters. In simulations, our proposed quantization method obtains a
substantial improvement in estimating the target parameters over the common
method of employing a parameter grid. Sparse coding techniques are not
appropriate for this problem without proper constraints, while constrained
sparse coding methods perform poorly for parameter estimation because their
objective is signal reconstruction, not estimation of the target parameters.
Comment: Published at http://dx.doi.org/10.1214/11-AOAS500 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
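The core step, quantizing a fine grid of model components into a small basis of prototypes, can be sketched with plain k-means over the component vectors. The synthetic "spectra" below are assumed Gaussian bumps standing in for SSP templates; the paper's actual procedure and constraints are more involved.

```python
import numpy as np

def kmeans_prototypes(components, k, n_iter=50, seed=0):
    """Quantize a large set of model components (rows) into k prototype
    vectors via k-means -- a simplified stand-in for choosing a small
    basis of SSP prototypes."""
    rng = np.random.default_rng(seed)
    centers = components[rng.choice(len(components), k, replace=False)]
    for _ in range(n_iter):
        d = ((components[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d.argmin(1)                 # nearest prototype per component
        for j in range(k):
            if (labels == j).any():
                centers[j] = components[labels == j].mean(0)
    return centers

# Fine parameter grid of synthetic component spectra (assumed shapes)
grid = np.linspace(0.0, 1.0, 200)
components = np.stack([np.exp(-0.5 * ((grid - c) / 0.05) ** 2)
                       for c in np.linspace(0.1, 0.9, 100)])
prototypes = kmeans_prototypes(components, k=8)
```

Fitting observed spectra against the 8 prototypes instead of all 100 grid components is what yields the computational savings; the target parameters are then derived from the prototype parameters and fitted weights.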