    Gaussian Process Structural Equation Models with Latent Variables

    In a variety of disciplines such as social sciences, psychology, medicine and economics, the recorded data are considered to be noisy measurements of latent variables connected by some causal structure. This corresponds to a family of graphical models known as the structural equation model with latent variables. While linear non-Gaussian variants have been well studied, inference in nonparametric structural equation models is still underdeveloped. We introduce a sparse Gaussian process parameterization that defines a non-linear structure connecting latent variables, unlike common formulations of Gaussian process latent variable models. The sparse parameterization is given a full Bayesian treatment without compromising Markov chain Monte Carlo efficiency. We compare the stability of the sampling procedure and the predictive ability of the model against the current practice. Comment: 12 pages, 6 figures

    Sparsity-Promoting Bayesian Dynamic Linear Models

    Sparsity-promoting priors have become increasingly popular in recent years due to an increased number of regression and classification applications involving a large number of predictors. In time series applications where observations are collected over time, it is often unrealistic to assume that the underlying sparsity pattern is fixed. We propose here an original class of flexible Bayesian linear models for dynamic sparsity modelling. The proposed class of models expands upon the existing Bayesian literature on sparse regression using generalized multivariate hyperbolic distributions. The properties of the models are explored through both analytic results and simulation studies. We demonstrate the model on a financial application where it is shown that it accurately represents the patterns seen in the analysis of stock and derivative data, and is able to detect major events by filtering an artificial portfolio of assets.

    Using the Expectation Maximization Algorithm with Heterogeneous Mixture Components for the Analysis of Spectrometry Data

    Coupling a multi-capillary column (MCC) with an ion mobility (IM) spectrometer (IMS) opened a multitude of new application areas for gas analysis, especially in a medical context, as volatile organic compounds (VOCs) in exhaled breath can hint at a person's state of health. To obtain a potential diagnosis from a raw MCC/IMS measurement, several computational steps are necessary, which so far have required manual interaction, e.g., human evaluation of discovered peaks. We have recently proposed an automated pipeline for this task that does not require human intervention during the analysis. Nevertheless, there is a need for improved methods for each computational step. In comparison to gas chromatography / mass spectrometry (GC/MS) data, MCC/IMS data is easier and less expensive to obtain, but peaks are more diffuse and there is a higher noise level. MCC/IMS measurements can be described as samples of mixture models (i.e., of convex combinations) of two-dimensional probability distributions. So we use the expectation-maximization (EM) algorithm to deconvolute mixtures in order to develop methods that improve data processing in three computational steps: denoising, baseline correction and peak clustering. A common theme of these methods is that mixture components within one model are not homogeneous (e.g., all Gaussian), but of different types. Evaluation shows that the novel methods outperform the existing ones. We provide Python software implementing all three methods and make our evaluation data available at http://www.rahmannlab.de/research/ims
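To make the idea of heterogeneous mixture components concrete, here is a minimal one-dimensional sketch of EM for a mixture of one Gaussian peak and one uniform background component. It is not the authors' software: the function names, the synthetic data, the initial values, and the reduction to one dimension are all assumptions for illustration, whereas the abstract describes two-dimensional distributions.

```python
import math
import random


def em_gauss_uniform(xs, lo, hi, iters=100):
    """EM for a two-component mixture: a Gaussian peak plus a uniform background on [lo, hi]."""
    # Hypothetical initial guesses: half of the points belong to the peak.
    w = 0.5
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    u = 1.0 / (hi - lo)  # constant density of the uniform component
    for _ in range(iters):
        # E-step: responsibility of the Gaussian component for each point.
        resp = []
        for x in xs:
            g = w * math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)
            resp.append(g / (g + (1 - w) * u))
        # M-step: re-estimate weight, mean and variance from the responsibilities.
        n_g = sum(resp)
        w = n_g / len(xs)
        mu = sum(r * x for r, x in zip(resp, xs)) / n_g
        var = max(sum(r * (x - mu) ** 2 for r, x in zip(resp, xs)) / n_g, 1e-9)
    return w, mu, var


def make_data(seed=0):
    """Synthetic data: 300 points from a peak at 5.0 plus 300 uniform noise points."""
    rng = random.Random(seed)
    peak = [rng.gauss(5.0, 0.3) for _ in range(300)]
    noise = [rng.uniform(0.0, 10.0) for _ in range(300)]
    return peak + noise


w, mu, var = em_gauss_uniform(make_data(), 0.0, 10.0)
```

Points with a low Gaussian responsibility can then be treated as background, which is one plausible way to read the denoising step mentioned in the abstract.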

    Prototype selection for parameter estimation in complex models

    Parameter estimation in astrophysics often requires the use of complex physical models. In this paper we study the problem of estimating the parameters that describe star formation history (SFH) in galaxies. Here, high-dimensional spectral data from galaxies are appropriately modeled as linear combinations of physical components, called simple stellar populations (SSPs), plus some nonlinear distortions. Theoretical data for each SSP is produced for a fixed parameter vector via computer modeling. Though the parameters that define each SSP are continuous, optimizing the signal model over a large set of SSPs on a fine parameter grid is computationally infeasible and inefficient. The goal of this study is to estimate the set of parameters that describes the SFH of each galaxy. These target parameters, such as the average ages and chemical compositions of the galaxy's stellar populations, are derived from the SSP parameters and the component weights in the signal model. Here, we introduce a principled approach of choosing a small basis of SSP prototypes for SFH parameter estimation. The basic idea is to quantize the vector space and effective support of the model components. In addition to greater computational efficiency, we achieve better estimates of the SFH target parameters. In simulations, our proposed quantization method obtains a substantial improvement in estimating the target parameters over the common method of employing a parameter grid. Sparse coding techniques are not appropriate for this problem without proper constraints, while constrained sparse coding methods perform poorly for parameter estimation because their objective is signal reconstruction, not estimation of the target parameters. Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/11-AOAS500 by the Institute of Mathematical Statistics (http://www.imstat.org)
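As a rough illustration of the signal model described in this abstract, the sketch below builds a spectrum as a weighted sum of prototype spectra and derives a mean age as the weight-averaged SSP age. The function names, the toy numbers, and the use of a simple weighted mean are assumptions for illustration. The abstract does not specify how the target parameters are computed, and the nonlinear distortions are omitted here.

```python
def mix_spectrum(weights, prototypes):
    """Linear combination of prototype spectra (one list of fluxes per prototype)."""
    n_bins = len(prototypes[0])
    return [sum(w * p[i] for w, p in zip(weights, prototypes)) for i in range(n_bins)]


def weighted_mean_age(weights, ages):
    """Weight-averaged age; assumes nonnegative weights with a positive sum."""
    total = sum(weights)
    return sum(w * a for w, a in zip(weights, ages)) / total


# Toy values, not taken from the paper.
prototypes = [[1.0, 0.5, 0.2], [0.2, 0.6, 1.0]]  # two hypothetical SSP spectra
ages = [1.0, 10.0]                               # hypothetical SSP ages
weights = [0.75, 0.25]                           # component weights in the signal model

spectrum = mix_spectrum(weights, prototypes)
mean_age = weighted_mean_age(weights, ages)
```

With a small prototype basis, both functions stay cheap to evaluate, which fits the abstract's point about computational efficiency.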