Summarizing Posterior Distributions in Signal Decomposition Problems when the Number of Components is Unknown
This paper addresses the problem of summarizing the posterior distributions that typically arise, in a Bayesian framework, when dealing with signal decomposition problems with an unknown number of components. Such posterior distributions are defined over a union of subspaces of differing dimensionality and can be sampled from using modern Monte Carlo techniques, for instance the increasingly popular RJ-MCMC method. No generic approach is available, however, to summarize the resulting variable-dimensional samples and extract component-specific parameters from them. We propose a novel approach to this problem, which consists in approximating the complex posterior of interest by a "simple"--but still variable-dimensional--parametric distribution. The distance between the two distributions is measured using the Kullback-Leibler divergence, and a stochastic EM-type algorithm, driven by the RJ-MCMC sampler, is proposed to estimate the parameters. The proposed algorithm is illustrated on the fundamental signal processing example of joint detection and estimation of sinusoids in white Gaussian noise.
Relabeling and Summarizing Posterior Distributions in Signal Decomposition Problems when the Number of Components is Unknown
This paper addresses the problems of relabeling and summarizing posterior distributions that typically arise, in a Bayesian framework, when dealing with signal decomposition problems with an unknown number of components. Such posterior distributions are defined over a union of subspaces of differing dimensionality and can be sampled from using modern Monte Carlo techniques, for instance the increasingly popular RJ-MCMC method. No generic approach is available, however, to summarize the resulting variable-dimensional samples and extract component-specific parameters from them. We propose a novel approach to this problem, named Variable-dimensional Approximate Posterior for Relabeling and Summarizing (VAPoRS), which consists in approximating the posterior distribution of interest by a "simple"---but still variable-dimensional---parametric distribution. The distance between the two distributions is measured using the Kullback-Leibler divergence, and a stochastic EM-type algorithm, driven by the RJ-MCMC sampler, is proposed to estimate the parameters. Two signal decomposition problems are considered to show the capability of VAPoRS both for relabeling and for summarizing variable-dimensional posterior distributions: the classical problem of detecting and estimating sinusoids in white Gaussian noise on the one hand, and a particle counting problem motivated by the Pierre Auger project in astrophysics on the other hand.
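As an illustration of the relabeling half of the problem, the following sketch shows the simplest possible heuristic: for fixed-dimension samples, permute each MCMC draw's components so they best match the running component means. This is not the VAPoRS algorithm from the abstract (which fits a variable-dimensional parametric approximation by a stochastic EM scheme); the function name and the squared-distance criterion are illustrative assumptions only.

```python
import itertools
import numpy as np

def relabel_samples(samples):
    """Greedy relabeling sketch: permute each fixed-dimension MCMC sample's
    components to best match the running component means.
    `samples` is an (n, k) array holding one parameter per component."""
    samples = np.asarray(samples, dtype=float)
    n, k = samples.shape
    out = np.empty_like(samples)
    out[0] = np.sort(samples[0])           # arbitrary initial ordering
    means = out[0].copy()
    for t in range(1, n):
        # pick the permutation minimizing squared distance to the means
        best = min(itertools.permutations(range(k)),
                   key=lambda p: np.sum((samples[t, list(p)] - means) ** 2))
        out[t] = samples[t, list(best)]
        means = out[: t + 1].mean(axis=0)  # update running means
    return out
```

The exhaustive search over permutations is exponential in k, which is why practical relabeling schemes (including those discussed in this line of work) rely on cheaper criteria.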
Wavelet Estimators in Nonparametric Regression: A Comparative Simulation Study
Wavelet analysis has been found to be a powerful tool for the nonparametric estimation of spatially-variable objects. We discuss in detail wavelet methods in nonparametric regression, where the data are modelled as observations of a signal contaminated with additive Gaussian noise, and provide an extensive review of the vast literature of wavelet shrinkage and wavelet thresholding estimators developed to denoise such data. These estimators arise from a wide range of classical and empirical Bayes methods treating either individual or blocks of wavelet coefficients. We compare various estimators in an extensive simulation study on a variety of sample sizes, test functions, signal-to-noise ratios and wavelet filters. Because there is no single criterion that can adequately summarise the behaviour of an estimator, we use various criteria to measure performance in finite sample situations. Insight into the performance of these estimators is obtained from graphical outputs and numerical tables. In order to provide some hints of how these estimators should be used to analyse real data sets, a detailed practical step-by-step illustration of a wavelet denoising analysis on electricity consumption data is provided. Matlab codes are provided so that all figures and tables in this paper can be reproduced.
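A minimal sketch of the kind of estimator the review covers: a one-level Haar transform with soft thresholding at the universal threshold (VisuShrink-style). The single decomposition level and the function name are simplifications for illustration, not one of the paper's benchmarked estimators.

```python
import numpy as np

def haar_denoise(y, sigma):
    """One-level Haar wavelet denoising with soft thresholding at the
    universal threshold sigma * sqrt(2 log n). `y` must have even length."""
    y = np.asarray(y, dtype=float)
    n = y.size
    approx = (y[0::2] + y[1::2]) / np.sqrt(2.0)   # smooth coefficients
    detail = (y[0::2] - y[1::2]) / np.sqrt(2.0)   # detail coefficients
    lam = sigma * np.sqrt(2.0 * np.log(n))        # universal threshold
    detail = np.sign(detail) * np.maximum(np.abs(detail) - lam, 0.0)
    out = np.empty(n)
    out[0::2] = (approx + detail) / np.sqrt(2.0)  # inverse Haar step
    out[1::2] = (approx - detail) / np.sqrt(2.0)
    return out
```

Real wavelet shrinkage uses a multi-level transform and often block or empirical Bayes rules in place of the hard-coded universal threshold.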
Robust Linear Spectral Unmixing using Anomaly Detection
This paper presents a Bayesian algorithm for linear spectral unmixing of
hyperspectral images that accounts for anomalies present in the data. The model
proposed assumes that the pixel reflectances are linear mixtures of unknown
endmembers, corrupted by an additional nonlinear term modelling anomalies and
additive Gaussian noise. A Markov random field is used for anomaly detection
based on the spatial and spectral structures of the anomalies. This allows
outliers to be identified in particular regions and wavelengths of the data
cube. A Bayesian algorithm is proposed to estimate the parameters involved in
the model yielding a joint linear unmixing and anomaly detection algorithm.
Simulations conducted with synthetic and real hyperspectral images demonstrate
the accuracy of the proposed unmixing and outlier detection strategy for the
analysis of hyperspectral images.
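The linear mixing model at the core of the abstract can be written down in a few lines. The sketch below uses a hypothetical endmember matrix and a plain least-squares abundance estimate; it deliberately omits the positivity and sum-to-one constraints, the Markov random field anomaly term, and the Bayesian machinery of the paper.

```python
import numpy as np

# Hypothetical endmember matrix M (bands x endmembers) and abundances a:
# each pixel reflectance is modeled as y = M a + noise.
rng = np.random.default_rng(0)
M = rng.uniform(0.0, 1.0, size=(50, 3))      # 50 bands, 3 endmembers
a_true = np.array([0.6, 0.3, 0.1])           # abundances summing to one
y = M @ a_true + rng.normal(0.0, 0.01, 50)   # noisy pixel spectrum

# Unconstrained least-squares abundance estimate: the simplest baseline,
# with no anomaly/outlier modeling as in the Bayesian algorithm above.
a_hat, *_ = np.linalg.lstsq(M, y, rcond=None)
```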
Bayesian inference for inverse problems
Traditionally, the MaxEnt workshops start with a tutorial day. This paper
summarizes my talk at the 2001 workshop at Johns Hopkins University. The main
idea of the talk is to show how Bayesian inference can naturally give us all
the tools we need to solve real inverse problems: from simple inversion, where
we assume to know exactly the forward model and all the input model
parameters, up to more realistic advanced problems of myopic or blind
inversion, where we may be uncertain about the forward model and may have
noisy data. Starting with an introduction to inverse problems through a few
examples and an explanation of their ill-posed nature, I briefly present the
main classical deterministic methods, such as data matching and classical
regularization methods, to show their limitations. I then present the main
classical probabilistic methods based on likelihood, information theory and
maximum entropy, and the Bayesian inference framework for such problems. I
show that the Bayesian framework not only generalizes all these methods, but
also gives us natural tools, for example, for inferring the uncertainty of the
computed solutions, for the estimation of the hyperparameters, or for handling
myopic or blind inversion problems. Finally, through a deconvolution example,
I present a few state-of-the-art methods based on Bayesian inference,
particularly designed for some mass spectrometry data processing problems.
Comment: Presented at MaxEnt01. To appear in Bayesian Inference and Maximum
Entropy Methods, B. Fry (Ed.), AIP Proceedings. 20 pages, 13 PostScript figures.
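The simplest concrete instance of the Bayesian view of inversion sketched in this abstract is the linear-Gaussian case, where the posterior mean coincides with the classical Tikhonov-regularized solution. The dimensions and noise level below are illustrative assumptions.

```python
import numpy as np

# Bayesian linear inversion sketch: y = A x + noise, Gaussian prior on x.
# With noise variance s2 and prior variance t2, the posterior mean (MAP
# estimate) is the Tikhonov solution with regularization lambda = s2 / t2.
rng = np.random.default_rng(1)
A = rng.normal(size=(30, 10))                 # forward model (assumed known)
x_true = rng.normal(size=10)
s2, t2 = 0.01, 1.0
y = A @ x_true + rng.normal(0.0, np.sqrt(s2), 30)

lam = s2 / t2
x_map = np.linalg.solve(A.T @ A + lam * np.eye(10), A.T @ y)
```

This makes explicit the point of the talk: the regularization parameter of the deterministic method falls out of the Bayesian noise-to-prior variance ratio, and the same posterior also yields uncertainty estimates via its covariance.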
The Kernel Interaction Trick: Fast Bayesian Discovery of Pairwise Interactions in High Dimensions
Discovering interaction effects on a response of interest is a fundamental
problem faced in biology, medicine, economics, and many other scientific
disciplines. In theory, Bayesian methods for discovering pairwise interactions
enjoy many benefits such as coherent uncertainty quantification, the ability to
incorporate background knowledge, and desirable shrinkage properties. In
practice, however, Bayesian methods are often computationally intractable for
even moderate-dimensional problems. Our key insight is that many hierarchical
models of practical interest admit a particular Gaussian process (GP)
representation; the GP allows us to capture the posterior with a vector of O(p)
kernel hyper-parameters rather than O(p^2) interactions and main effects. With
the implicit representation, we can run Markov chain Monte Carlo (MCMC) over
model hyper-parameters in time and memory linear in p per iteration. We focus
on sparsity-inducing models and show on datasets with a variety of covariate
behaviors that our method: (1) reduces runtime by orders of magnitude over
naive applications of MCMC, (2) provides lower Type I and Type II error
relative to state-of-the-art LASSO-based approaches, and (3) offers improved
computational scaling in high dimensions relative to existing Bayesian and
LASSO-based approaches.
Comment: Accepted at ICML 2019. 20 pages, 4 figures, 3 tables.
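The O(p) versus O(p^2) claim rests on a standard kernel-trick identity: a degree-2 polynomial kernel evaluates the inner product of all pairwise-interaction features without materializing them. The sketch below demonstrates that identity in its plainest form; the paper's actual GP representation carries per-dimension hyper-parameters rather than this bare kernel.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 6
x, z = rng.normal(size=p), rng.normal(size=p)

# Explicit O(p^2) interaction features: all products x_i * x_j.
phi = lambda v: np.outer(v, v).ravel()
explicit = phi(x) @ phi(z)

# Kernel evaluation in O(p): (x . z)^2 equals the same inner product,
# so the p^2 interaction features never need to be materialized.
kernel = (x @ z) ** 2
```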
A Hierarchical Spatio-Temporal Statistical Model Motivated by Glaciology
In this paper, we extend and analyze a Bayesian hierarchical spatio-temporal
model for physical systems. A novelty is to model the discrepancy between the
output of a computer simulator for a physical process and the actual process
values with a multivariate random walk. For computational efficiency, linear
algebra for bandwidth-limited matrices is utilized, and first-order emulator
inference allows for the fast emulation of a numerical partial differential
equation (PDE) solver. A test scenario from a physical system motivated by
glaciology is used to examine the speed and accuracy of the computational
methods used, in addition to the viability of modeling assumptions. We conclude
by discussing how the model and associated methodology can be applied in other
physical contexts besides glaciology.
Comment: Revision accepted for publication in the Journal of Agricultural, Biological, and Environmental Statistics.
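The computational efficiency the abstract attributes to bandwidth-limited matrices comes from solvers that exploit the band structure. The simplest instance is the O(n) Thomas algorithm for tridiagonal systems, sketched below; this is an illustration of the general idea, not the paper's specific linear algebra.

```python
import numpy as np

def thomas_solve(a, b, c, d):
    """O(n) solver for a tridiagonal system A x = d, where b holds the
    diagonal, a the sub-diagonal (a[0] unused, pass 0), and c the
    super-diagonal (c[-1] unused, pass 0)."""
    n = len(b)
    cp, dp = np.empty(n), np.empty(n)
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                      # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

A dense solve on the same system costs O(n^3), which is the gap that makes banded formulations attractive for large spatio-temporal models.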
Adaptive MCMC with online relabeling
When targeting a distribution that is artificially invariant under some
permutations, Markov chain Monte Carlo (MCMC) algorithms face the
label-switching problem, rendering marginal inference particularly cumbersome.
Such a situation arises, for example, in the Bayesian analysis of finite
mixture models. Adaptive MCMC algorithms such as adaptive Metropolis (AM),
which self-calibrates its proposal distribution using an online estimate of the
covariance matrix of the target, are no exception. To address the
label-switching issue, relabeling algorithms associate a permutation to each
MCMC sample, trying to obtain reasonable marginals. In the case of adaptive
Metropolis (Bernoulli 7 (2001) 223-242), an online relabeling strategy is
required. This paper is devoted to the AMOR algorithm, a provably consistent
variant of AM that can cope with the label-switching problem. The idea is to
nest relabeling steps within the MCMC algorithm based on the estimation of a
single covariance matrix that is used both for adapting the covariance of the
proposal distribution in the Metropolis algorithm step and for online
relabeling. We compare the behavior of AMOR to similar relabeling methods. In
the case of compactly supported target distributions, we prove a strong law of
large numbers for AMOR and its ergodicity. These are the first results on the
consistency of an online relabeling algorithm to our knowledge. The proof
underlines latent relations between relabeling and vector quantization.
Comment: Published at http://dx.doi.org/10.3150/13-BEJ578 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).
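The adaptation step that AMOR builds on is the online estimation of the target's mean and covariance. The sketch below shows the standard recursive (Welford-style) updates of the kind adaptive Metropolis uses to calibrate its proposal; it omits the relabeling step that is AMOR's actual contribution.

```python
import numpy as np

def running_moments(samples):
    """Recursive mean/covariance estimates (Welford-style updates), as
    used by adaptive Metropolis to calibrate its proposal covariance.
    Sketch only: no relabeling permutation is applied."""
    samples = np.asarray(samples, dtype=float)
    d = samples.shape[1]
    mu = samples[0].copy()
    cov = np.zeros((d, d))
    for n, x in enumerate(samples[1:], start=2):
        delta = x - mu                                   # uses old mean
        mu += delta / n                                  # updated mean
        cov += (np.outer(delta, x - mu) - cov) / (n - 1) # updated covariance
    return mu, cov
```

In AMOR, each incoming sample would first be permuted (relabeled) using this same covariance before the moments are updated, which is what couples adaptation and relabeling.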
Interpretable statistics for complex modelling: quantile and topological learning
As the complexity of our data has increased exponentially over the last decades, so has our
need for interpretable features. This thesis revolves around two paradigms for approaching
this quest for insights.
In the first part we focus on parametric models, where the problem of interpretability
can be seen as one of “parametrization selection”. We introduce a quantile-centric
parametrization and show the advantages of our proposal in the context of regression,
where it allows us to bridge the gap between classical generalized linear (mixed)
models and increasingly popular quantile methods.
The second part of the thesis, concerned with topological learning, tackles the
problem from a non-parametric perspective. As topology can be thought of as a way
of characterizing data in terms of their connectivity structure, it allows us to represent
complex and possibly high-dimensional data through a few features, such as the number of
connected components, loops and voids. We illustrate how the emerging branch of
statistics devoted to recovering topological structures in data, Topological Data
Analysis, can be exploited for both exploratory and inferential purposes, with special
emphasis on kernels that preserve the topological information in the data.
Finally, we show with an application how these two approaches can borrow strength
from one another in the identification and description of brain activity through fMRI
data from the ABIDE project.
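The most elementary of the topological features the abstract mentions, the number of connected components, can be computed with a union-find pass over a proximity graph. This sketch illustrates the degree-0 information that persistent homology tracks as the connection radius grows; it is an assumption-laden toy, not the thesis's kernel-based machinery.

```python
import numpy as np

def n_components(points, eps):
    """Count connected components of a point cloud when points closer
    than `eps` are linked: the simplest topological feature, and the
    quantity degree-0 persistent homology tracks as `eps` varies."""
    pts = np.asarray(points, dtype=float)
    parent = list(range(len(pts)))

    def find(i):                       # union-find with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            if np.linalg.norm(pts[i] - pts[j]) < eps:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(pts))})
```

Tracking how this count changes over all values of `eps` (birth and death of components) yields a persistence diagram, the object the topological kernels operate on.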
Advances in Waveform and Photon Counting Lidar Processing for Forest Vegetation Applications
Full waveform (FW) and photon counting LiDAR (PCL) data have garnered greater attention due to increasing data availability, the wealth of information they contain, and promising prospects for large-scale vegetation mapping. However, many factors, such as complex processing steps and scarce non-proprietary tools, preclude extensive and practical use of these data for vegetation characterization. Therefore, the overall goal of this study is to develop algorithms to process FW and PCL data and to explore their potential in real-world applications.
Study I explored classical waveform decomposition methods such as Gaussian decomposition, Richardson–Lucy (RL) deconvolution and a newly introduced optimized Gold deconvolution to process FW LiDAR data. Results demonstrated the advantages of the deconvolution and decomposition methods: all three approaches generated satisfactory results, while the best-performing method varied depending on the evaluation criteria used.
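Gaussian decomposition, the first of the methods Study I compares, amounts to fitting a sum of Gaussian pulses to the recorded waveform. The toy below fits a single pulse by a coarse grid search over location and width with a closed-form amplitude; the waveform, noise level, and search grid are all illustrative assumptions rather than the study's actual data or optimizer.

```python
import numpy as np

# Toy Gaussian decomposition of a lidar-like waveform: one return pulse,
# amplitude solved in closed form, (mu, sigma) found by coarse grid search.
t = np.arange(0.0, 100.0)
true_A, true_mu, true_s = 40.0, 35.0, 4.0
rng = np.random.default_rng(3)
w = (true_A * np.exp(-0.5 * ((t - true_mu) / true_s) ** 2)
     + rng.normal(0.0, 0.5, t.size))

best = None
for mu in np.arange(20.0, 50.0, 0.5):
    for s in np.arange(2.0, 8.0, 0.25):
        g = np.exp(-0.5 * ((t - mu) / s) ** 2)
        A = (g @ w) / (g @ g)           # least-squares amplitude
        rss = np.sum((w - A * g) ** 2)  # residual sum of squares
        if best is None or rss < best[0]:
            best = (rss, A, mu, s)
_, A_hat, mu_hat, s_hat = best
```

Practical decompositions fit several overlapping pulses at once with a nonlinear least-squares solver, but the objective being minimized is the same residual.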
Building upon Study I, Study II applied Bayesian non-linear modeling concepts to waveform decomposition and quantified the propagation of error and uncertainty along the processing steps. The performance evaluation and uncertainty analysis at the parameter, derived point cloud, and surface model levels showed that the Bayesian decomposition could enhance the credibility of decomposition results in a probabilistic sense, capturing the true error of estimates and tracing uncertainty propagation along the processing steps.
In Study III, we exploited FW LiDAR data to classify tree species by integrating machine learning methods (Random Forests (RF) and Conditional Inference Forests (CF)) with a Bayesian inference method. Classification accuracy results highlighted that the Bayesian method was a superior alternative to the machine learning methods and gave users more confidence in interpreting and applying classification results to real-world tasks such as forest inventory.
Study IV focused on developing a framework to derive terrain elevation and vegetation canopy height from test-bed sensor data and to pre-validate the capacity of the upcoming Ice, Cloud and Land Elevation Satellite-2 (ICESat-2) mission. The methodology developed in this study illustrates plausible ways of processing data that are structurally similar to expected ICESat-2 data, and holds the potential to serve as a benchmark for further method adjustment once genuine ICESat-2 data are available.