3,062 research outputs found

    Summarizing Posterior Distributions in Signal Decomposition Problems when the Number of Components is Unknown

    This paper addresses the problem of summarizing the posterior distributions that typically arise, in a Bayesian framework, when dealing with signal decomposition problems with an unknown number of components. Such posterior distributions are defined over a union of subspaces of differing dimensionality and can be sampled from using modern Monte Carlo techniques, for instance the increasingly popular RJ-MCMC method. No generic approach is available, however, to summarize the resulting variable-dimensional samples and extract from them component-specific parameters. We propose a novel approach to this problem, which consists in approximating the complex posterior of interest by a "simple"--but still variable-dimensional--parametric distribution. The distance between the two distributions is measured using the Kullback-Leibler divergence, and a Stochastic EM-type algorithm, driven by the RJ-MCMC sampler, is proposed to estimate the parameters. The proposed algorithm is illustrated on the fundamental signal processing example of joint detection and estimation of sinusoids in white Gaussian noise.

    Relabeling and Summarizing Posterior Distributions in Signal Decomposition Problems when the Number of Components is Unknown

    This paper addresses the problems of relabeling and summarizing posterior distributions that typically arise, in a Bayesian framework, when dealing with signal decomposition problems with an unknown number of components. Such posterior distributions are defined over a union of subspaces of differing dimensionality and can be sampled from using modern Monte Carlo techniques, for instance the increasingly popular RJ-MCMC method. No generic approach is available, however, to summarize the resulting variable-dimensional samples and extract from them component-specific parameters. We propose a novel approach to this problem, named Variable-dimensional Approximate Posterior for Relabeling and Summarizing (VAPoRS), which consists in approximating the posterior distribution of interest by a "simple"---but still variable-dimensional---parametric distribution. The distance between the two distributions is measured using the Kullback-Leibler divergence, and a Stochastic EM-type algorithm, driven by the RJ-MCMC sampler, is proposed to estimate the parameters. Two signal decomposition problems are considered to show the capability of VAPoRS both for relabeling and for summarizing variable-dimensional posterior distributions: the classical problem of detecting and estimating sinusoids in white Gaussian noise on the one hand, and a particle counting problem motivated by the Pierre Auger project in astrophysics on the other hand.
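
    To make the last point concrete, the toy sketch below mimics the stochastic-EM flavor of the approach: variable-dimensional samples (unordered sets of one-dimensional component parameters) are greedily matched to a small bank of Gaussian "slots", whose means, spreads and presence probabilities are then re-estimated. The data, the two-slot setup and the greedy matching step are all invented for illustration; this is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for RJ-MCMC output: each sample is an unordered set of
# component parameters (here, frequencies); the set size varies.
samples = []
for _ in range(500):
    k = rng.integers(1, 3)                           # 1 or 2 components
    centers = rng.choice([0.2, 0.5], size=k, replace=False)
    samples.append(centers + 0.01 * rng.standard_normal(k))

K = 2                                 # slots in the parametric approximation
mu = np.array([0.1, 0.6])             # slot means (initial guess)
sigma = np.full(K, 0.05)              # slot standard deviations
present = np.zeros(K)                 # slot presence probabilities

for sweep in range(20):
    s1, s2, n = np.zeros(K), np.zeros(K), np.zeros(K)
    for comps in samples:
        free = list(range(K))
        for x in comps:               # greedy stand-in for the stochastic E-step
            j = min(free, key=lambda j: abs(x - mu[j]) / sigma[j])
            free.remove(j)
            s1[j] += x; s2[j] += x ** 2; n[j] += 1
    ok = n > 0                        # M-step: refit each slot from its matches
    mu[ok] = s1[ok] / n[ok]
    sigma[ok] = np.sqrt(np.maximum(s2[ok] / n[ok] - mu[ok] ** 2, 1e-8))
    present = n / len(samples)

print("slot means:", mu, "presence probabilities:", present)
```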

    Wavelet Estimators in Nonparametric Regression: A Comparative Simulation Study

    Wavelet analysis has been found to be a powerful tool for the nonparametric estimation of spatially variable objects. We discuss in detail wavelet methods in nonparametric regression, where the data are modelled as observations of a signal contaminated with additive Gaussian noise, and provide an extensive review of the vast literature on wavelet shrinkage and wavelet thresholding estimators developed to denoise such data. These estimators arise from a wide range of classical and empirical Bayes methods treating either individual or blocks of wavelet coefficients. We compare various estimators in an extensive simulation study on a variety of sample sizes, test functions, signal-to-noise ratios and wavelet filters. Because there is no single criterion that can adequately summarise the behaviour of an estimator, we use various criteria to measure performance in finite sample situations. Insight into the performance of these estimators is obtained from graphical outputs and numerical tables. In order to provide some hints of how these estimators should be used to analyse real data sets, a detailed practical step-by-step illustration of a wavelet denoising analysis of electricity consumption data is provided. Matlab codes are provided so that all figures and tables in this paper can be reproduced.
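
    As a concrete illustration of one of the simplest estimators in this family, the sketch below applies the classical VisuShrink recipe: soft thresholding with the universal threshold, the noise level estimated from the finest-scale coefficients by the median absolute deviation. It uses the PyWavelets library; the test signal, noise level, wavelet and decomposition depth are chosen only for the example.

```python
import numpy as np
import pywt  # PyWavelets

rng = np.random.default_rng(1)
n = 1024
t = np.linspace(0, 1, n)
signal = np.sin(4 * np.pi * t) + np.sign(t - 0.5)      # smooth part + jump
noisy = signal + 0.3 * rng.standard_normal(n)

coeffs = pywt.wavedec(noisy, "sym8", level=5)
# Estimate sigma from the finest-scale details, then apply the
# universal threshold sigma * sqrt(2 log n) to all detail levels.
sigma = np.median(np.abs(coeffs[-1])) / 0.6745
thr = sigma * np.sqrt(2 * np.log(n))
coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
denoised = pywt.waverec(coeffs, "sym8")

print("RMSE noisy:   ", np.sqrt(np.mean((noisy - signal) ** 2)))
print("RMSE denoised:", np.sqrt(np.mean((denoised - signal) ** 2)))
```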

    Robust Linear Spectral Unmixing using Anomaly Detection

    This paper presents a Bayesian algorithm for linear spectral unmixing of hyperspectral images that accounts for anomalies present in the data. The proposed model assumes that the pixel reflectances are linear mixtures of unknown endmembers, corrupted by an additional nonlinear term modelling anomalies and additive Gaussian noise. A Markov random field is used for anomaly detection based on the spatial and spectral structures of the anomalies. This allows outliers to be identified in particular regions and wavelengths of the data cube. A Bayesian algorithm is proposed to estimate the parameters involved in the model, yielding a joint linear unmixing and anomaly detection algorithm. Simulations conducted with synthetic and real hyperspectral images demonstrate the accuracy of the proposed unmixing and outlier detection strategy for the analysis of hyperspectral images.
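
    A stripped-down illustration of the underlying intuition: under the linear mixing model, pixels whose reconstruction error is anomalously large are natural outlier candidates. The sketch below performs per-pixel nonnegative least-squares unmixing and a MAD-based residual test; it omits the paper's Markov random field and full Bayesian machinery, and all data are synthetic.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(2)
L, R, N = 50, 3, 200                        # bands, endmembers, pixels
M = np.abs(rng.standard_normal((L, R)))     # toy endmember signatures
A = rng.dirichlet(np.ones(R), size=N).T     # abundances summing to one
Y = M @ A + 0.01 * rng.standard_normal((L, N))
Y[:, :10] += 0.5 * np.abs(rng.standard_normal((L, 10)))   # inject anomalies

# Nonnegative least squares per pixel; nnls returns (abundances, residual norm).
resid = np.array([nnls(M, Y[:, i])[1] for i in range(N)])
mad = np.median(np.abs(resid - np.median(resid)))
print("flagged pixels:", np.flatnonzero(resid > np.median(resid) + 5 * mad))
```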

    Bayesian inference for inverse problems

    Traditionally, the MaxEnt workshops start with a tutorial day. This paper summarizes my talk at the 2001 workshop at Johns Hopkins University. The main idea of this talk is to show how Bayesian inference can naturally give us all the necessary tools to solve real inverse problems: starting with simple inversion, where we assume to know the forward model and all the input model parameters exactly, up to more realistic advanced problems of myopic or blind inversion, where we may be uncertain about the forward model and may have noisy data. Starting with an introduction to inverse problems through a few examples and an explanation of their ill-posed nature, I briefly present the main classical deterministic methods, such as data matching and classical regularization methods, to show their limitations. I then present the main classical probabilistic methods based on likelihood, information theory and maximum entropy, and the Bayesian inference framework for such problems. I show that the Bayesian framework not only generalizes all these methods, but also gives us natural tools, for example, for inferring the uncertainty of the computed solutions, for estimating the hyperparameters, or for handling myopic or blind inversion problems. Finally, through a deconvolution example, I present a few state-of-the-art methods based on Bayesian inference particularly designed for some of the mass spectrometry data processing problems. Comment: Presented at MaxEnt01. To appear in Bayesian Inference and Maximum Entropy Methods, B. Fry (Ed.), AIP Proceedings. 20 pages, 13 PostScript figures.
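
    For the simplest setting mentioned above, a known linear forward model with Gaussian noise and a Gaussian prior, the Bayesian MAP estimate has a closed form that coincides with Tikhonov-regularized least squares. A minimal deconvolution sketch, in which the blur kernel, regularization weight and data are all invented for illustration:

```python
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(3)
n = 100
h = np.exp(-0.5 * (np.arange(n) / 2.0) ** 2)
h /= h.sum()                                   # toy Gaussian blur kernel
H = toeplitz(h, np.zeros(n))                   # forward (convolution) matrix
x_true = np.zeros(n)
x_true[[20, 50, 55]] = [1.0, 0.7, 0.5]         # sparse spike train
y = H @ x_true + 0.01 * rng.standard_normal(n)

# With x ~ N(0, tau^2 I) and noise ~ N(0, sigma^2 I), the posterior is
# Gaussian; its mode (and mean) is ridge regression with lam = sigma^2 / tau^2.
lam = 1e-3
x_map = np.linalg.solve(H.T @ H + lam * np.eye(n), H.T @ y)
# The same posterior also quantifies uncertainty: cov = sigma^2 (H^T H + lam I)^{-1}.
```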

    The Kernel Interaction Trick: Fast Bayesian Discovery of Pairwise Interactions in High Dimensions

    Discovering interaction effects on a response of interest is a fundamental problem faced in biology, medicine, economics, and many other scientific disciplines. In theory, Bayesian methods for discovering pairwise interactions enjoy many benefits such as coherent uncertainty quantification, the ability to incorporate background knowledge, and desirable shrinkage properties. In practice, however, Bayesian methods are often computationally intractable for even moderate-dimensional problems. Our key insight is that many hierarchical models of practical interest admit a particular Gaussian process (GP) representation; the GP allows us to capture the posterior with a vector of O(p) kernel hyperparameters rather than O(p^2) interactions and main effects. With this implicit representation, we can run Markov chain Monte Carlo (MCMC) over the model hyperparameters in time and memory linear in p per iteration. We focus on sparsity-inducing models and show on datasets with a variety of covariate behaviors that our method: (1) reduces runtime by orders of magnitude over naive applications of MCMC, (2) provides lower Type I and Type II error relative to state-of-the-art LASSO-based approaches, and (3) offers improved computational scaling in high dimensions relative to existing Bayesian and LASSO-based approaches. Comment: Accepted at ICML 2019. 20 pages, 4 figures, 3 tables.
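
    The O(p)-versus-O(p^2) accounting can be illustrated with a plain degree-2 polynomial kernel, which is not the paper's exact kernel but exhibits the same trick: p per-feature scales implicitly span the constant, all main effects and all pairwise interactions.

```python
import numpy as np

def interaction_kernel(X1, X2, kappa):
    """Degree-2 polynomial kernel with per-feature scales kappa.

    Expanding (1 + sum_i kappa_i x_i z_i)^2 shows it spans the constant,
    the p main effects and the p(p-1)/2 pairwise interactions, yet it has
    only p hyperparameters and costs O(n1 * n2 * p) to evaluate.
    """
    return (1.0 + (X1 * kappa) @ X2.T) ** 2

rng = np.random.default_rng(4)
n, p = 50, 200
X = rng.standard_normal((n, p))
kappa = rng.uniform(0.0, 1.0, size=p)     # hypothetical per-feature scales
K = interaction_kernel(X, X, kappa)
print(K.shape)   # (50, 50), despite ~p^2/2 implicit interaction features
```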

    A Hierarchical Spatio-Temporal Statistical Model Motivated by Glaciology

    In this paper, we extend and analyze a Bayesian hierarchical spatio-temporal model for physical systems. A novelty is to model the discrepancy between the output of a computer simulator for a physical process and the actual process values with a multivariate random walk. For computational efficiency, linear algebra for bandwidth-limited matrices is utilized, and first-order emulator inference allows for the fast emulation of a numerical partial differential equation (PDE) solver. A test scenario from a physical system motivated by glaciology is used to examine the speed and accuracy of the computational methods, in addition to the viability of the modeling assumptions. We conclude by discussing how the model and associated methodology can be applied in other physical contexts besides glaciology. Comment: Revision accepted for publication by the Journal of Agricultural, Biological, and Environmental Statistics.
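
    A toy simulation of the random-walk discrepancy idea, assuming a squared-exponential spatial covariance for the innovations; the covariance choice, the dimensions and the stand-in simulator are invented, and the paper's banded linear algebra and emulation are omitted.

```python
import numpy as np

rng = np.random.default_rng(5)
T, m = 50, 30                                  # time steps, spatial locations
s = np.linspace(0.0, 1.0, m)
# Spatially correlated innovations via a squared-exponential covariance.
C = np.exp(-0.5 * (s[:, None] - s[None, :]) ** 2 / 0.1 ** 2) + 1e-8 * np.eye(m)
Lc = np.linalg.cholesky(C)

# Multivariate random-walk discrepancy: delta_t = delta_{t-1} + eta_t.
delta = np.zeros((T, m))
for t in range(1, T):
    delta[t] = delta[t - 1] + 0.05 * Lc @ rng.standard_normal(m)

simulator = np.tile(np.sin(2 * np.pi * s), (T, 1))   # stand-in simulator output
process = simulator + delta                           # modeled physical process
```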

    Adaptive MCMC with online relabeling

    When targeting a distribution that is artificially invariant under some permutations, Markov chain Monte Carlo (MCMC) algorithms face the label-switching problem, rendering marginal inference particularly cumbersome. Such a situation arises, for example, in the Bayesian analysis of finite mixture models. Adaptive MCMC algorithms such as adaptive Metropolis (AM), which self-calibrates its proposal distribution using an online estimate of the covariance matrix of the target, are no exception. To address the label-switching issue, relabeling algorithms associate a permutation to each MCMC sample, trying to obtain reasonable marginals. In the case of adaptive Metropolis (Bernoulli 7 (2001) 223-242), an online relabeling strategy is required. This paper is devoted to the AMOR algorithm, a provably consistent variant of AM that can cope with the label-switching problem. The idea is to nest relabeling steps within the MCMC algorithm, based on the estimation of a single covariance matrix that is used both for adapting the covariance of the proposal distribution in the Metropolis step and for online relabeling. We compare the behavior of AMOR to that of similar relabeling methods. In the case of compactly supported target distributions, we prove a strong law of large numbers for AMOR and its ergodicity. These are, to our knowledge, the first results on the consistency of an online relabeling algorithm. The proof underlines latent relations between relabeling and vector quantization. Comment: Published at http://dx.doi.org/10.3150/13-BEJ578 in the Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).
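
    The relabeling step itself is easy to sketch: among the K! block permutations of a sample, keep the one closest to the running mean in the Mahalanobis metric induced by the running covariance estimate, the same matrix that adapts the proposal. A toy version with exhaustive search over permutations, workable only for small K:

```python
import itertools
import numpy as np

def relabel(z, mu, prec, K, d):
    """Return the block permutation of z nearest to mu in Mahalanobis distance.

    z stacks K exchangeable blocks of dimension d; mu and prec stand in for
    the running mean and precision of the adaptive sampler.
    """
    blocks = z.reshape(K, d)
    best, best_cost = z, np.inf
    for perm in itertools.permutations(range(K)):
        cand = blocks[list(perm)].reshape(-1)
        r = cand - mu
        cost = r @ prec @ r
        if cost < best_cost:
            best, best_cost = cand, cost
    return best

# Toy usage: two one-dimensional components whose labels have switched.
mu, prec = np.array([-1.0, 1.0]), np.eye(2)
print(relabel(np.array([1.1, -0.9]), mu, prec, K=2, d=1))   # -> [-0.9  1.1]
```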

    Interpretable statistics for complex modelling: quantile and topological learning

    As the complexity of our data has increased exponentially over the last decades, so has our need for interpretable features. This thesis revolves around two paradigms for approaching this quest for insight. In the first part we focus on parametric models, where the problem of interpretability can be seen as one of “parametrization selection”. We introduce a quantile-centric parametrization and show the advantages of our proposal in the context of regression, where it allows us to bridge the gap between classical generalized linear (mixed) models and increasingly popular quantile methods. The second part of the thesis, concerned with topological learning, tackles the problem from a non-parametric perspective. As topology can be thought of as a way of characterizing data in terms of their connectivity structure, it allows us to represent complex and possibly high-dimensional data through a few features, such as the number of connected components, loops and voids. We illustrate how the emerging branch of statistics devoted to recovering topological structures in data, Topological Data Analysis, can be exploited for both exploratory and inferential purposes, with a special emphasis on kernels that preserve the topological information in the data. Finally, we show with an application how these two approaches can borrow strength from one another in the identification and description of brain activity through fMRI data from the ABIDE project.
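
    The quantile-centric viewpoint rests on the pinball (check) loss, whose minimizer over constants is the tau-th quantile; the toy check below verifies this standard fact numerically.

```python
import numpy as np

def pinball_loss(y, q, tau):
    """Check loss; its minimizer over constant q is the tau-th quantile of y."""
    u = y - q
    return np.mean(np.maximum(tau * u, (tau - 1.0) * u))

rng = np.random.default_rng(7)
y = rng.exponential(scale=1.0, size=10_000)
grid = np.linspace(0.0, 5.0, 2001)
tau = 0.9
q_hat = grid[np.argmin([pinball_loss(y, g, tau) for g in grid])]
print(q_hat, np.quantile(y, tau))   # the two values should nearly agree
```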

    Advances in Waveform and Photon Counting Lidar Processing for Forest Vegetation Applications

    Full waveform (FW) and photon counting LiDAR (PCL) data have garnered greater attention due to increasing data availability, the wealth of information they contain, and promising prospects for large-scale vegetation mapping. However, many factors, such as complex processing steps and scarce non-proprietary tools, preclude extensive and practical use of these data for vegetation characterization. Therefore, the overall goal of this study is to develop algorithms to process FW and PCL data and to explore their potential in real-world applications. Study I explored classical waveform decomposition methods such as Gaussian decomposition, Richardson–Lucy (RL) deconvolution and a newly introduced optimized Gold deconvolution to process FW LiDAR data. Results demonstrated the advantages of the deconvolution and decomposition methods: all three approaches generated satisfactory results, while the best performer varied with the evaluation criteria used. Building upon Study I, Study II applied Bayesian non-linear modeling concepts to waveform decomposition and quantified the propagation of error and uncertainty along the processing steps. The performance evaluation and uncertainty analysis at the parameter, derived point cloud and surface model levels showed that the Bayesian decomposition could enhance the credibility of decomposition results in a probabilistic sense, capturing the true error of estimates and tracing the uncertainty propagation along the processing steps. In Study III, we exploited FW LiDAR data to classify tree species by integrating machine learning methods (random forests (RF) and conditional inference forests (CF)) with a Bayesian inference method. Classification accuracy results highlighted that the Bayesian method was a superior alternative to the machine learning methods and gave users more confidence in interpreting and applying classification results to real-world tasks such as forest inventory. Study IV focused on developing a framework to derive terrain elevation and vegetation canopy height from test-bed sensor data and to pre-validate the capacity of the upcoming Ice, Cloud and Land Elevation Satellite-2 (ICESat-2) mission. The methodology developed in this study illustrates plausible ways of processing data that are structurally similar to expected ICESat-2 data and holds the potential to serve as a benchmark for further method adjustment once genuine ICESat-2 data are available.
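
    Study I's Gaussian decomposition step can be sketched as nonlinear least squares on a sum-of-Gaussians pulse model; the two-pulse waveform and every parameter below are synthetic stand-ins for a real FW return.

```python
import numpy as np
from scipy.optimize import curve_fit

def two_gaussians(t, a1, m1, s1, a2, m2, s2):
    """Sum of two Gaussian pulses, a minimal FW-lidar return model."""
    return (a1 * np.exp(-0.5 * ((t - m1) / s1) ** 2)
            + a2 * np.exp(-0.5 * ((t - m2) / s2) ** 2))

rng = np.random.default_rng(8)
t = np.linspace(0, 100, 500)                  # time bins (toy units)
truth = (120, 35.0, 3.0, 60, 55.0, 4.0)       # canopy and ground returns
wave = two_gaussians(t, *truth) + 2.0 * rng.standard_normal(t.size)

# Nonlinear least squares with rough initial guesses for the two pulses.
p0 = (100, 30.0, 2.0, 50, 60.0, 3.0)
popt, _ = curve_fit(two_gaussians, t, wave, p0=p0)
print(np.round(popt, 1))  # recovered amplitudes, positions and widths
```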