Optimally Weighted PCA for High-Dimensional Heteroscedastic Data
Modern applications increasingly involve high-dimensional and heterogeneous
data, e.g., datasets formed by combining numerous measurements from myriad
sources. Principal Component Analysis (PCA) is a classical method for reducing
dimensionality by projecting such data onto a low-dimensional subspace
capturing most of their variation, but PCA does not robustly recover underlying
subspaces in the presence of heteroscedastic noise. Specifically, PCA suffers
from treating all data samples as if they are equally informative. This paper
analyzes a weighted variant of PCA that accounts for heteroscedasticity by
giving samples with larger noise variance less influence. The analysis provides
expressions for the asymptotic recovery of underlying low-dimensional
components from samples with heteroscedastic noise in the high-dimensional
regime, i.e., for sample dimension on the order of the number of samples.
Surprisingly, it turns out that whitening the noise by using inverse noise
variance weights is suboptimal. We derive optimal weights, characterize the
performance of weighted PCA, and consider the problem of optimally collecting
samples under budget constraints.
Comment: 52 pages, 13 figures
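The weighted-PCA estimate described in the abstract can be sketched in a few lines of numpy: form a weighted sample covariance and take its top eigenvectors. The inverse-noise-variance weights used below are purely illustrative (the paper's central result is that this whitening choice is in fact suboptimal; the derived optimal weights are not reproduced here):

```python
import numpy as np

def weighted_pca(X, weights, k):
    """Estimate a k-dimensional subspace from the columns of X (d x n),
    giving each sample the supplied nonnegative weight."""
    w = np.asarray(weights, dtype=float)
    # Weighted sample covariance: sum_i w_i x_i x_i^T / sum_i w_i
    C = (X * w) @ X.T / w.sum()
    # Top-k eigenvectors of the weighted covariance span the estimate
    evals, evecs = np.linalg.eigh(C)
    return evecs[:, -k:][:, ::-1]  # columns ordered by decreasing eigenvalue

# Toy example: rank-1 signal observed under two noise levels
rng = np.random.default_rng(0)
d, n, k = 50, 400, 1
u = np.zeros(d); u[0] = 1.0                        # true component
sigma = np.where(np.arange(n) < n // 2, 0.1, 1.0)  # heteroscedastic noise
X = np.outer(u, rng.standard_normal(n)) + sigma * rng.standard_normal((d, n))
U_hat = weighted_pca(X, 1.0 / sigma**2, k)         # illustrative weights
print(abs(U_hat[:, 0] @ u))  # close to 1 when the subspace is recovered
```

Unweighted PCA corresponds to `weights = np.ones(n)`; the high-noise samples then pull the estimate off the true subspace, which is the failure mode the paper analyzes.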
A Bayesian Heteroscedastic GLM with Application to fMRI Data with Motion Spikes
We propose a voxel-wise general linear model with autoregressive noise and
heteroscedastic noise innovations (GLMH) for analyzing functional magnetic
resonance imaging (fMRI) data. The model is analyzed from a Bayesian
perspective and has the benefit of automatically down-weighting time points
close to motion spikes in a data-driven manner. We develop a highly efficient
Markov Chain Monte Carlo (MCMC) algorithm that allows for Bayesian variable
selection among the regressors to model both the mean (i.e., the design matrix)
and variance. This makes it possible to include a broad range of explanatory
variables in both the mean and variance (e.g., time trends, activation stimuli,
head motion parameters and their temporal derivatives), and to compute the
posterior probability of inclusion from the MCMC output. Variable selection is
also applied to the lags in the autoregressive noise process, making it
possible to infer the lag order from the data simultaneously with all other
model parameters. We use both simulated data and real fMRI data from OpenfMRI
to illustrate the importance of proper modeling of heteroscedasticity in fMRI
data analysis. Our results show that the GLMH tends to detect more brain
activity, compared to its homoscedastic counterpart, by allowing the variance
to change over time depending on the degree of head motion.
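The down-weighting idea can be illustrated with a simple two-stage heteroscedastic regression sketch (a crude frequentist stand-in for the paper's Bayesian MCMC scheme, not the GLMH itself): fit ordinary least squares, regress log squared residuals on variance covariates such as a motion regressor, then refit by weighted least squares so that spike time points get small weights. All variable names here are illustrative:

```python
import numpy as np

def heteroscedastic_glm(X, Z, y):
    """Two-stage feasible GLS: X is the mean design, Z the variance design.
    Returns mean coefficients after variance-based down-weighting, plus weights."""
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_ols
    # Model log variance as linear in Z (stand-in for the Bayesian
    # variance regression with variable selection described above)
    gamma, *_ = np.linalg.lstsq(Z, np.log(resid**2 + 1e-12), rcond=None)
    w = np.exp(-Z @ gamma)              # weights ~ 1 / estimated variance
    Xw = X * w[:, None]
    beta_wls = np.linalg.solve(X.T @ Xw, Xw.T @ y)  # weighted least squares
    return beta_wls, w

# Toy fMRI-like series: boxcar stimulus plus motion-driven noise bursts
rng = np.random.default_rng(1)
T = 300
stim = (np.arange(T) % 20 < 10).astype(float)   # boxcar "activation"
motion = np.zeros(T); motion[::50] = 1.0        # motion spikes
sigma = 0.5 + 3.0 * motion                      # noisy at spike times
y = 2.0 * stim + sigma * rng.standard_normal(T)
X = np.column_stack([np.ones(T), stim])
Z = np.column_stack([np.ones(T), motion])
beta, w = heteroscedastic_glm(X, Z, y)
print(beta[1])   # activation estimate; spike time points receive small weights
```

The data-driven down-weighting emerges because large residuals near spikes inflate the estimated variance at those time points, shrinking their influence on the activation estimate.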
Uncertainty in multitask learning: joint representations for probabilistic MR-only radiotherapy planning
Multi-task neural network architectures provide a mechanism for jointly
integrating information from distinct sources. This is ideal in the context of
MR-only radiotherapy planning, as a single network can jointly regress a
synthetic CT (synCT) scan and segment organs-at-risk (OAR) from MRI. We
propose a probabilistic
multi-task network that estimates: 1) intrinsic uncertainty through a
heteroscedastic noise model for spatially-adaptive task loss weighting and 2)
parameter uncertainty through approximate Bayesian inference. This allows
sampling of multiple segmentations and synCTs that share their network
representation. We test our model on prostate cancer scans and show that it
produces more accurate and consistent synCTs with better estimation of the
error variance, state-of-the-art results in OAR segmentation, and a
methodology for quality assurance in radiotherapy treatment planning.
Comment: Early accept at MICCAI 2018, 8 pages, 4 figures
Large-scale Heteroscedastic Regression via Gaussian Process
Heteroscedastic regression considering the varying noises among observations
has many applications in fields such as machine learning and statistics. Here
we focus on the heteroscedastic Gaussian process (HGP) regression which
integrates the latent function and the noise function together in a unified
non-parametric Bayesian framework. Despite its remarkable performance, HGP
suffers from cubic time complexity, which severely limits its applicability
to big data. To improve scalability, we first develop a variational sparse
inference algorithm, named VSHGP, to handle large-scale datasets. Furthermore,
two variants are developed to improve the scalability and capability of VSHGP.
The first is stochastic VSHGP (SVSHGP), which derives a factorized evidence
lower bound, thus enabling efficient stochastic variational inference. The
second is distributed VSHGP (DVSHGP) which (i) follows the Bayesian committee
machine formalism to distribute computations over multiple local VSHGP experts
with many inducing points; and (ii) adopts hybrid parameters for experts to
guard against over-fitting and capture local variation. The superiority of DVSHGP
and SVSHGP as compared to existing scalable heteroscedastic/homoscedastic GPs
is then extensively verified on various datasets.
Comment: 14 pages, 15 figures
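The exact heteroscedastic GP that VSHGP approximates can be sketched directly: the per-point noise variances simply replace the constant noise term on the diagonal of the kernel matrix. This is the O(n^3) baseline whose cost motivates the sparse variants above; the noise variances are taken as given here rather than inferred by a latent noise GP:

```python
import numpy as np

def hgp_predict(X, y, noise_var, X_star, lengthscale=1.0, signal_var=1.0):
    """Exact GP posterior mean with per-point noise variances on the
    diagonal -- the cubic-cost baseline that VSHGP approximates at scale."""
    def rbf(A, B):
        d2 = (A[:, None] - B[None, :]) ** 2
        return signal_var * np.exp(-0.5 * d2 / lengthscale**2)
    K = rbf(X, X) + np.diag(noise_var)   # heteroscedastic noise on diagonal
    K_star = rbf(X_star, X)
    return K_star @ np.linalg.solve(K, y)

# Noisy samples of sin(x) with input-dependent noise
rng = np.random.default_rng(3)
X = np.linspace(0, 2 * np.pi, 60)
noise_var = 0.01 + 0.5 * (X > np.pi)    # much noisier second half
y = np.sin(X) + np.sqrt(noise_var) * rng.standard_normal(60)
mu = hgp_predict(X, y, noise_var, np.array([np.pi / 2]))
print(mu[0])   # near sin(pi/2) = 1, since that region is low-noise
```

The `np.linalg.solve` on the full n-by-n kernel is the cubic bottleneck; VSHGP replaces it with computations over a small set of inducing points.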
HeMPPCAT: Mixtures of Probabilistic Principal Component Analysers for Data with Heteroscedastic Noise
Mixtures of probabilistic principal component analysis (MPPCA) is a
well-known mixture model extension of principal component analysis (PCA).
Similar to PCA, MPPCA assumes the data samples in each mixture component have
homoscedastic noise. However, datasets with heterogeneous noise across samples
are becoming increasingly common, as larger datasets are generated by
collecting samples from several sources with varying noise profiles. The
performance of MPPCA is suboptimal for data with heteroscedastic noise across
samples. This paper proposes a heteroscedastic mixture of probabilistic PCA
technique (HeMPPCAT) that uses a generalized expectation-maximization (GEM)
algorithm to jointly estimate the unknown underlying factors, means, and noise
variances under a heteroscedastic noise setting. Simulation results illustrate
the improved factor estimates and clustering accuracies of HeMPPCAT compared to
MPPCA.
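For context, the homoscedastic building block that MPPCA mixes and HeMPPCAT generalizes has a well-known closed-form maximum-likelihood solution (Tipping & Bishop's PPCA), sketched below. HeMPPCAT's GEM updates for the heteroscedastic mixture case are not reproduced here:

```python
import numpy as np

def ppca_ml(X, k):
    """Closed-form ML estimate for probabilistic PCA on columns of X (d x n):
    factor loadings W, isotropic noise variance sigma^2, and the mean."""
    d, n = X.shape
    mu = X.mean(axis=1, keepdims=True)
    C = (X - mu) @ (X - mu).T / n
    evals, evecs = np.linalg.eigh(C)
    evals, evecs = evals[::-1], evecs[:, ::-1]   # descending order
    sigma2 = evals[k:].mean()                    # ML noise variance estimate
    W = evecs[:, :k] * np.sqrt(np.maximum(evals[:k] - sigma2, 0.0))
    return W, sigma2, mu

# Toy data: 2 latent factors plus isotropic noise of variance 0.01
rng = np.random.default_rng(4)
d, n, k = 10, 500, 2
W_true = rng.standard_normal((d, k))
X = W_true @ rng.standard_normal((k, n)) + 0.1 * rng.standard_normal((d, n))
W, sigma2, mu = ppca_ml(X, k)
print(sigma2)   # near the true noise variance 0.01
```

HeMPPCAT's departure from this model is precisely that a single `sigma2` per component is replaced by sample-dependent noise variances, which the closed form above cannot accommodate, hence the GEM algorithm.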