56,065 research outputs found
Uncertainty-Aware Principal Component Analysis
We present a technique to perform dimensionality reduction on data that is
subject to uncertainty. Our method is a generalization of traditional principal
component analysis (PCA) to multivariate probability distributions. In
comparison to non-linear methods, linear dimensionality reduction techniques
have the advantage that the characteristics of such probability distributions
remain intact after projection. We derive a representation of the PCA sample
covariance matrix that respects potential uncertainty in each of the inputs,
building the mathematical foundation of our new method: uncertainty-aware PCA.
In addition to the accuracy and performance gained by our approach over
sampling-based strategies, our formulation allows us to perform sensitivity
analysis with regard to the uncertainty in the data. For this, we propose
factor traces as a novel visualization that enables to better understand the
influence of uncertainty on the chosen principal components. We provide
multiple examples of our technique using real-world datasets. As a special
case, we show how to propagate multivariate normal distributions through PCA in
closed form. Furthermore, we discuss extensions and limitations of our
approach
High-Dimensional Regression with Gaussian Mixtures and Partially-Latent Response Variables
In this work we address the problem of approximating high-dimensional data
with a low-dimensional representation. We make the following contributions. We
propose an inverse regression method which exchanges the roles of input and
response, such that the low-dimensional variable becomes the regressor, and
which is tractable. We introduce a mixture of locally-linear probabilistic
mapping model that starts with estimating the parameters of inverse regression,
and follows with inferring closed-form solutions for the forward parameters of
the high-dimensional regression problem of interest. Moreover, we introduce a
partially-latent paradigm, such that the vector-valued response variable is
composed of both observed and latent entries, thus being able to deal with data
contaminated by experimental artifacts that cannot be explained with noise
models. The proposed probabilistic formulation could be viewed as a
latent-variable augmentation of regression. We devise expectation-maximization
(EM) procedures based on a data augmentation strategy which facilitates the
maximum-likelihood search over the model parameters. We propose two
augmentation schemes and we describe in detail the associated EM inference
procedures that may well be viewed as generalizations of a number of EM
regression, dimension reduction, and factor analysis algorithms. The proposed
framework is validated with both synthetic and real data. We provide
experimental evidence that our method outperforms several existing regression
techniques
Emulation of multivariate simulators using thin-plate splines with application to atmospheric dispersion
It is often desirable to build a statistical emulator of a complex computer simulator in order to perform analysis which would otherwise be computationally infeasible. We propose methodology to model multivariate output from a computer simulator taking into account output structure in the responses. The utility of this approach is demonstrated by applying it to a chemical and biological hazard prediction model. Predicting the hazard area which results from an accidental or deliberate chemical or biological release is imperative in civil and military planning and also in emergency response. The hazard area resulting from such a release is highly structured in space and we therefore propose the use of a thin-plate spline to capture the spatial structure and fit a Gaussian process emulator to the coefficients of the resultant basis functions. We compare and contrast four different techniques for emulating multivariate output: dimension-reduction using (i) a fully Bayesian approach with a principal component basis, (ii) a fully Bayesian approach with a thin-plate spline basis, assuming that the basis coefficients are independent, and (iii) a “plug-in” Bayesian approach with a thin-plate spline basis and a separable covariance structure; and (iv) a functional data modeling approach using a tensor-product (separable) Gaussian process. We develop methodology for the two thin-plate spline emulators and demonstrate that these emulators significantly outperform the principal component emulator. Further, the separable thin-plate spline emulator, which accounts for the dependence between basis coefficients, provides substantially more realistic quantification of uncertainty, and is also computationally more tractable, allowing fast emulation. For high resolution output data, it also offers substantial predictive and computational ad- vantages over the tensor-product Gaussian process emulator
- …