5,304 research outputs found

    Wasserstein convergence in Bayesian and frequentist deconvolution models

    We study the multivariate deconvolution problem of recovering the distribution of a signal from independent and identically distributed observations additively contaminated with random errors (noise) from a known distribution. For errors with independent coordinates having ordinary smooth densities, we derive an inversion inequality relating the L1-Wasserstein distance between two distributions of the signal to the L1-distance between the corresponding mixture densities of the observations. This smoothing inequality outperforms existing inversion inequalities. As an application of the inversion inequality to the Bayesian framework, we consider 1-Wasserstein deconvolution with Laplace noise in dimension one, using a Dirichlet process mixture of normal densities as a prior measure on the mixing distribution (or distribution of the signal). We construct an adaptive approximation of the sampling density by convolving the Laplace density with a well-chosen mixture of normal densities and show that the posterior measure concentrates around the sampling density at a nearly minimax rate, up to a logarithmic factor, in the L1-distance. The same posterior law is also shown to automatically adapt to the unknown Sobolev regularity of the mixing density, thus leading to a new Bayesian adaptive estimation procedure for mixing distributions with regular densities under the L1-Wasserstein metric. We also illustrate the utility of the inversion inequality in a frequentist setting by showing that an appropriate isotone approximation of the classical kernel deconvolution estimator attains the minimax rate of convergence for 1-Wasserstein deconvolution in any dimension d≥1 when only a tail condition is required on the latent mixing density, and we derive sharp lower bounds for these problems.
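    For a rough illustration of the frequentist side of this setting, the sketch below simulates one-dimensional observations contaminated with Laplace noise and computes a classical Fourier-inversion (sinc-kernel) deconvolution estimate of the signal density. The noise scale b, bandwidth h, grid, and positivity clipping are illustrative assumptions, and the isotone approximation step discussed in the abstract is not included.

```python
# Minimal sketch (assumptions noted above), not the authors' procedure:
# deconvolution KDE from y_j = x_j + eps_j with known Laplace(0, b) noise.
import numpy as np

def deconvolution_kde(y, b=1.0, h=0.3, grid=None):
    """Sinc-kernel deconvolution density estimate; the Laplace(0, b) noise has
    characteristic function 1 / (1 + b^2 t^2)."""
    if grid is None:
        grid = np.linspace(y.min() - 3.0, y.max() + 3.0, 400)
    # Sinc kernel: its Fourier transform is 1 on the band [-1/h, 1/h], 0 outside.
    t = np.linspace(-1.0 / h, 1.0 / h, 1024)
    phi_y = np.exp(1j * np.outer(t, y)).mean(axis=1)   # empirical c.f. of the data
    phi_eps = 1.0 / (1.0 + (b * t) ** 2)               # known noise c.f.
    integrand = phi_y / phi_eps                        # divide out the noise
    dt = t[1] - t[0]
    f_hat = np.real(np.exp(-1j * np.outer(grid, t)) @ integrand) * dt / (2.0 * np.pi)
    return grid, np.clip(f_hat, 0.0, None)             # crude positivity correction

# Usage: standard normal signal, Laplace(0, 1) noise, n = 1000 observations.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = x + rng.laplace(scale=1.0, size=1000)
grid, f_hat = deconvolution_kde(y, b=1.0, h=0.3)
```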

    Minimax Estimation of Kernel Mean Embeddings

    In this paper, we study the minimax estimation of the Bochner integral $\mu_k(P) := \int_{\mathcal{X}} k(\cdot,x)\,dP(x)$, also called the kernel mean embedding, based on random samples drawn i.i.d. from $P$, where $k:\mathcal{X}\times\mathcal{X}\rightarrow\mathbb{R}$ is a positive definite kernel. Various estimators, $\hat{\theta}_n$, of $\mu_k(P)$ (including the empirical estimator) are studied in the literature, all of which satisfy $\bigl\|\hat{\theta}_n-\mu_k(P)\bigr\|_{\mathcal{H}_k}=O_P(n^{-1/2})$, with $\mathcal{H}_k$ being the reproducing kernel Hilbert space induced by $k$. The main contribution of the paper is to show that the above rate of $n^{-1/2}$ is minimax in the $\|\cdot\|_{\mathcal{H}_k}$ and $\|\cdot\|_{L^2(\mathbb{R}^d)}$ norms over the class of discrete measures and the class of measures with an infinitely differentiable density, with $k$ being a continuous translation-invariant kernel on $\mathbb{R}^d$. The interesting aspect of this result is that the minimax rate is independent of the smoothness of the kernel and the density of $P$ (if it exists). This result has practical consequences in statistical applications, as the mean embedding has been widely employed in non-parametric hypothesis testing, density estimation, causal inference and feature selection, through its relation to energy distance (and distance covariance).
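    For concreteness, here is a minimal sketch (not from the paper) of the empirical estimator mentioned above, $\hat{\theta}_n(\cdot)=\frac{1}{n}\sum_{i=1}^n k(\cdot,X_i)$, for an assumed Gaussian (translation-invariant) kernel on $\mathbb{R}^d$; the kernel choice and bandwidth are illustrative.

```python
# Minimal sketch of the empirical kernel mean embedding with a Gaussian kernel.
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2)), translation-invariant on R^d.
    d2 = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def empirical_embedding(sample, sigma=1.0):
    """Return the function t -> (1/n) sum_i k(t, x_i), i.e. the empirical
    estimator of mu_k(P); its RKHS error is O_P(n^{-1/2})."""
    def mu_hat(t):
        return gaussian_kernel(np.atleast_2d(t), sample, sigma).mean(axis=1)
    return mu_hat

# Usage: n = 500 samples from P = N(0, I_2), embedding evaluated at the origin.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
mu_hat = empirical_embedding(X, sigma=1.0)
print(mu_hat(np.zeros((1, 2))))
```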

    Bayesian adaptation

    Given the need for low-assumption inferential methods in infinite-dimensional settings, Bayesian adaptive estimation via a prior distribution that depends neither on the regularity of the function to be estimated nor on the sample size is valuable. We elucidate the relationships among the main approaches used to design priors for minimax-optimal rate-adaptive estimation, while shedding light on the underlying ideas.

    Optimal graphon estimation in cut distance

    Consider the twin problems of estimating the connection probability matrix of an inhomogeneous random graph and the graphon of a W-random graph. We establish the minimax estimation rates with respect to the cut metric for classes of block-constant matrices and step-function graphons. Surprisingly, our results imply that, from the minimax point of view, the raw data, that is, the adjacency matrix of the observed graph, is already optimal, and more involved procedures cannot improve the convergence rates for this metric. This phenomenon contrasts with the optimal rates of convergence with respect to other classical distances for graphons, such as the $\ell_1$ or $\ell_2$ metrics.
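    A small sketch of the setting, under the assumption of a two-block step-function graphon: it samples a W-random graph and takes the raw adjacency matrix itself as the estimate of the connection probability matrix, which, per the abstract, is already rate-optimal in the cut metric. Block sizes and connection probabilities are illustrative.

```python
# Minimal sketch: sample a W-random graph from a 2-block step-function graphon.
import numpy as np

def sample_w_random_graph(n, block_probs, P, rng):
    """block_probs: block membership probabilities; P: symmetric block connection matrix."""
    z = rng.choice(len(block_probs), size=n, p=block_probs)   # latent block labels
    theta = P[np.ix_(z, z)]                                   # connection probability matrix
    upper = rng.random((n, n)) < theta                        # Bernoulli edge indicators
    A = np.triu(upper, k=1)
    A = (A | A.T).astype(int)                                 # symmetrize, no self-loops
    return A, theta

rng = np.random.default_rng(2)
P = np.array([[0.7, 0.1],
              [0.1, 0.5]])
A, theta = sample_w_random_graph(200, [0.5, 0.5], P, rng)
# "Estimator" of theta in cut distance: simply A, with no smoothing step.
```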

    A Note on Minimax Testing and Confidence Intervals in Moment Inequality Models

    This note uses a simple example to show how moment inequality models used in the empirical economics literature lead to general minimax relative-efficiency comparisons. The main point is that such models involve inference on a low-dimensional parameter, which leads naturally to a definition of "distance" that, in full generality, would be arbitrary in minimax testing problems. This definition of distance is justified by the fact that it leads to a duality between minimaxity of confidence intervals and tests, which does not hold for other definitions of distance. Thus, the use of moment inequalities for inference in a low-dimensional parametric model places additional structure on the testing problem, which leads to stronger conclusions regarding minimax relative efficiency than would otherwise be possible.
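    For reference, a generic form of such a model (illustrative only; not the note's specific example) and the test-inversion construction underlying the duality between confidence intervals and tests can be written as follows.

```latex
% Generic moment inequality model: the identified set collects parameter values
% satisfying all moment inequalities, and confidence sets are obtained by
% inverting level-alpha tests of each candidate value.
\[
  \Theta_0(P) \;=\; \bigl\{\theta \in \Theta \;:\; \mathbb{E}_P\bigl[m_j(W_i,\theta)\bigr] \ge 0,\ j = 1,\dots,k \bigr\},
  \qquad
  \mathrm{CI}_{1-\alpha} \;=\; \bigl\{\theta_0 \;:\; \text{the test of } H_0\colon \theta = \theta_0 \text{ does not reject at level } \alpha \bigr\}.
\]
```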