172 research outputs found

    PAC-Bayesian Bounds for Randomized Empirical Risk Minimizers

    The aim of this paper is to generalize the PAC-Bayesian theorems proved by Catoni in the classification setting to more general problems of statistical inference. We show how to control the deviations of the risk of randomized estimators. Particular attention is paid to randomized estimators drawn from a small neighborhood of classical estimators, whose study leads to a control of the risk of the latter. These results allow us to bound the risk of very general estimation procedures, as well as to perform model selection.
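
    The sketch below illustrates the kind of object studied here: a classical estimator (ordinary least squares on toy data) and a randomized estimator drawn from a small Gaussian neighborhood around it, whose empirical risks can be compared by Monte Carlo. The data model, the width of the neighborhood, and the choice of least squares are illustrative assumptions, not the paper's construction.

        import numpy as np

        rng = np.random.default_rng(0)

        # Toy data: linear model y = <theta*, x> + noise (hypothetical setup, not from the paper).
        n, d = 200, 5
        theta_star = rng.normal(size=d)
        X = rng.normal(size=(n, d))
        y = X @ theta_star + 0.1 * rng.normal(size=n)

        def empirical_risk(theta):
            """Mean squared error on the sample."""
            return np.mean((X @ theta - y) ** 2)

        # Classical estimator: ordinary least squares (the "ERM" in this toy problem).
        theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

        # Randomized estimator: draw from a small Gaussian neighborhood of theta_hat;
        # the width rho controls how concentrated the randomization is.
        rho = 0.05
        draws = theta_hat + rho * rng.normal(size=(1000, d))
        randomized_risks = np.array([empirical_risk(t) for t in draws])

        print("risk of ERM:            ", empirical_risk(theta_hat))
        print("mean risk of randomized:", randomized_risks.mean())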

    Sparsity and Incoherence in Compressive Sampling

    We consider the problem of reconstructing a sparse signal $x^0 \in \mathbb{R}^n$ from a limited number of linear measurements. Given $m$ randomly selected samples of $Ux^0$, where $U$ is an orthonormal matrix, we show that $\ell_1$ minimization recovers $x^0$ exactly when the number of measurements exceeds $m \geq \mathrm{Const} \cdot \mu^2(U) \cdot S \cdot \log n$, where $S$ is the number of nonzero components in $x^0$, and $\mu$ is the largest entry in $U$ properly normalized: $\mu(U) = \sqrt{n} \cdot \max_{k,j} |U_{k,j}|$. The smaller $\mu$, the fewer samples needed. The result holds for "most" sparse signals $x^0$ supported on a fixed (but arbitrary) set $T$. Given $T$, if the sign of $x^0$ for each nonzero entry on $T$ and the observed values of $Ux^0$ are drawn at random, the signal is recovered with overwhelming probability. Moreover, there is a sense in which this is nearly optimal, since any method succeeding with the same probability would require just about this many samples.
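
    As an illustration of the quantities in the bound, the sketch below computes the coherence $\mu(U)$ and the resulting sample count $\mathrm{Const} \cdot \mu^2(U) \cdot S \cdot \log n$ for the discrete Fourier basis; the constant is set to 1 purely for illustration, since no numerical value is fixed in the statement above.

        import numpy as np

        def coherence(U):
            """mu(U) = sqrt(n) * max_{k,j} |U_{k,j}| for an n x n orthonormal matrix U."""
            n = U.shape[0]
            return np.sqrt(n) * np.abs(U).max()

        def sample_bound(U, S, const=1.0):
            """Heuristic reading of m >= Const * mu(U)^2 * S * log(n); the constant is illustrative."""
            n = U.shape[0]
            return const * coherence(U) ** 2 * S * np.log(n)

        # Example: the DFT basis is maximally incoherent (mu = 1), so few samples suffice.
        n = 512
        F = np.fft.fft(np.eye(n)) / np.sqrt(n)
        print("mu(DFT) =", coherence(F))               # ~1.0
        print("samples needed (S=10):", sample_bound(F, S=10))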

    A population Monte Carlo scheme with transformed weights and its application to stochastic kinetic models

    This paper addresses the problem of Monte Carlo approximation of posterior probability distributions. In particular, we have considered a recently proposed technique known as population Monte Carlo (PMC), which is based on an iterative importance sampling approach. An important drawback of this methodology is the degeneracy of the importance weights when the dimension of either the observations or the variables of interest is high. To alleviate this difficulty, we propose a novel method that performs a nonlinear transformation on the importance weights. This operation reduces the weight variation, hence avoiding their degeneracy and increasing the efficiency of the importance sampling scheme, especially when drawing from proposal functions that are poorly adapted to the true posterior. For the sake of illustration, we have applied the proposed algorithm to the estimation of the parameters of a Gaussian mixture model. This is a very simple problem that enables us to clearly show and discuss the main features of the proposed technique. As a practical application, we have also considered the popular (and challenging) problem of estimating the rate parameters of stochastic kinetic models (SKMs). SKMs are highly multivariate systems that model molecular interactions in biological and chemical problems. We introduce a particularization of the proposed algorithm to SKMs and present numerical results.
    Comment: 35 pages, 8 figures
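
    A minimal sketch of the idea of transforming importance weights, using weight clipping as one simple nonlinear transformation; the paper's actual transformation, target, and proposal may differ, and the distributions below are chosen only to produce visibly degenerate weights.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(1)

        # Target: standard normal; proposal: a poorly matched, shifted Student-t (illustrative choice).
        target = stats.norm(loc=0.0, scale=1.0)
        proposal = stats.t(df=3, loc=3.0, scale=2.0)

        M = 5000
        x = proposal.rvs(size=M, random_state=rng)
        log_w = target.logpdf(x) - proposal.logpdf(x)
        w = np.exp(log_w - log_w.max())

        # Nonlinear transformation of the weights: clip the largest ones to reduce variation.
        # (One simple choice of transformation; not necessarily the one proposed in the paper.)
        threshold = np.sort(w)[-int(np.sqrt(M))]      # cap roughly the top sqrt(M) weights
        w_clipped = np.minimum(w, threshold)

        def ess(weights):
            """Effective sample size of a set of (unnormalized) importance weights."""
            wn = weights / weights.sum()
            return 1.0 / np.sum(wn ** 2)

        print("ESS plain:      ", ess(w))
        print("ESS transformed:", ess(w_clipped))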

    Restricted Isometries for Partial Random Circulant Matrices

    In the theory of compressed sensing, restricted isometry analysis has become a standard tool for studying how efficiently a measurement matrix acquires information about sparse and compressible signals. Many recovery algorithms are known to succeed when the restricted isometry constants of the sampling matrix are small. Many potential applications of compressed sensing involve a data-acquisition process that proceeds by convolution with a random pulse followed by (nonrandom) subsampling. At present, the theoretical analysis of this measurement technique is lacking. This paper demonstrates that the $s$th order restricted isometry constant is small when the number $m$ of samples satisfies $m \gtrsim (s \log n)^{3/2}$, where $n$ is the length of the pulse. This bound improves on previous estimates, which exhibit quadratic scaling.
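
    The sketch below builds the measurement model described here, convolution with a random pulse followed by nonrandom subsampling, as a small partial random circulant matrix; the choice of a ±1 pulse and of an evenly spaced subsampling pattern are illustrative assumptions, not details taken from the paper.

        import numpy as np

        rng = np.random.default_rng(2)

        def partial_random_circulant(n, m):
            """Convolution with a random +/-1 pulse followed by deterministic subsampling.

            Returns an m x n matrix whose rows are m rows of the circulant matrix
            generated by the pulse (a common model; details may differ from the paper).
            """
            pulse = rng.choice([-1.0, 1.0], size=n)
            # Row k of the circulant matrix is the pulse cyclically shifted by k.
            C = np.stack([np.roll(pulse, k) for k in range(n)])
            keep = np.linspace(0, n - 1, m, dtype=int)   # nonrandom subsampling of rows
            return C[keep] / np.sqrt(m)

        # Measuring a sparse signal: y = Phi @ x0.
        n, m, s = 256, 80, 5
        Phi = partial_random_circulant(n, m)
        x0 = np.zeros(n)
        x0[rng.choice(n, size=s, replace=False)] = rng.normal(size=s)
        y = Phi @ x0
        print(Phi.shape, y.shape)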

    Neural-based Compression Scheme for Solar Image Data

    Studying the solar system, and especially the Sun, relies on the data gathered daily from space missions. These missions are data-intensive, and compressing these data so that they can be transferred efficiently to the ground station involves a twofold trade-off. Stronger compression methods, by distorting the data, can increase data throughput at the cost of accuracy, which could affect scientific analysis of the data. On the other hand, preserving subtle details in the compressed data requires a large amount of data to be transferred, reducing the desired gains from compression. In this work, we propose a neural network-based lossy compression method to be used in NASA's data-intensive imagery missions. We chose NASA's SDO mission, which transmits 1.4 terabytes of data each day, as a proof of concept for the proposed algorithm. Specifically, we propose an adversarially trained neural network, equipped with local and non-local attention modules to capture both the local and global structure of the image, resulting in a better rate-distortion (RD) trade-off compared to conventional hand-engineered codecs. The RD variational autoencoder used in this work is jointly trained with a channel-dependent entropy model as a shared prior between the analysis and synthesis transforms to make the entropy coding of the latent code more effective. Our neural image compression algorithm outperforms currently-in-use and state-of-the-art codecs such as JPEG and JPEG-2000 in terms of RD performance when compressing extreme-ultraviolet (EUV) data. As a proof of concept for the use of this algorithm in SDO data analysis, we have performed coronal hole (CH) detection using our compressed images and generated consistent segmentations, even at a compression rate of $\sim 0.1$ bits per pixel (compared to 8 bits per pixel in the original data), using EUV data from SDO.
    Comment: Accepted for publication in IEEE Transactions on Aerospace and Electronic Systems (TAES). arXiv admin note: text overlap with arXiv:2210.0647
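
    A minimal sketch of the rate-distortion objective behind such learned codecs: an analysis transform, straight-through quantization, a synthesis transform, and a Lagrangian loss of the form distortion + lambda * rate. The tiny networks and the fixed Gaussian prior below are placeholders for the paper's attention-equipped transforms and channel-dependent entropy model, not the actual architecture.

        import torch
        import torch.nn as nn

        # Tiny stand-ins for the analysis/synthesis transforms (illustrative only).
        analysis = nn.Sequential(nn.Conv2d(1, 8, 5, stride=4, padding=2), nn.ReLU(),
                                 nn.Conv2d(8, 8, 5, stride=4, padding=2))
        synthesis = nn.Sequential(nn.ConvTranspose2d(8, 8, 4, stride=4), nn.ReLU(),
                                  nn.ConvTranspose2d(8, 1, 4, stride=4))

        def rate_distortion_loss(x, lam=0.01):
            y = analysis(x)
            y_hat = y + (torch.round(y) - y).detach()   # straight-through quantization
            x_hat = synthesis(y_hat)
            # Rate term: negative log-likelihood (up to constants) under a fixed unit
            # Gaussian prior, a placeholder for a learned channel-dependent entropy model.
            rate = 0.5 * (y_hat ** 2).mean() / torch.log(torch.tensor(2.0))
            distortion = ((x - x_hat) ** 2).mean()
            return distortion + lam * rate

        x = torch.rand(4, 1, 64, 64)                    # fake single-channel EUV-like patches
        loss = rate_distortion_loss(x)
        loss.backward()
        print(float(loss))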

    Towards Machine Wald

    The past century has seen a steady increase in the need to estimate and predict complex systems and to make (possibly critical) decisions with limited information. Although computers have made possible the numerical evaluation of sophisticated statistical models, these models are still designed \emph{by humans} because there is currently no known recipe or algorithm for dividing the design of a statistical model into a sequence of arithmetic operations. Indeed, enabling computers to \emph{think} as \emph{humans} do when faced with uncertainty is challenging in several major ways: (1) finding optimal statistical models remains to be formulated as a well-posed problem when information on the system of interest is incomplete and comes in the form of a complex combination of sample data, partial knowledge of constitutive relations, and a limited description of the distribution of input random variables; (2) the space of admissible scenarios, along with the space of relevant information, assumptions, and/or beliefs, tends to be infinite-dimensional, whereas calculus on a computer is necessarily discrete and finite. With this purpose in mind, this paper explores the foundations of a rigorous framework for the scientific computation of optimal statistical estimators/models and reviews their connections with Decision Theory, Machine Learning, Bayesian Inference, Stochastic Optimization, Robust Optimization, Optimal Uncertainty Quantification, and Information-Based Complexity.
    Comment: 37 pages