Optimal estimation of high-order missing masses, and the rare-type match problem
Consider a random sample $\mathbf{X}_{n}=(X_{1},\ldots,X_{n})$ from an unknown discrete
distribution $P=\sum_{j\geq1}p_{j}\delta_{s_{j}}$ on a countable alphabet
$\mathbb{S}$, and let $(Y_{n,j})_{j\geq1}$ be the empirical frequencies of the
distinct symbols $s_{j}$ in the sample. We consider the problem of estimating
the $r$-order missing mass, which is a discrete functional of $P$ defined as
$$\theta_{r}(P;\mathbf{X}_{n})=\sum_{j\geq1}p_{j}^{r}\,I(Y_{n,j}=0).$$
This is a
generalization of the missing mass, whose estimation is a classical problem in
statistics and the subject of numerous studies in both theory and methods.
First, we introduce a nonparametric estimator $\hat{\theta}_{r}(\mathbf{X}_{n})$ of
$\theta_{r}(P;\mathbf{X}_{n})$ and a corresponding non-asymptotic confidence interval through concentration
properties of $\theta_{r}(P;\mathbf{X}_{n})$. Then, we investigate minimax
estimation of $\theta_{r}(P;\mathbf{X}_{n})$, which is the main contribution of
our work. We show that minimax estimation is not feasible over the class of all
discrete distributions on $\mathbb{S}$, and not even for distributions with
regularly varying tails, which only guarantee that our estimator is consistent
for $\theta_{r}(P;\mathbf{X}_{n})$. This leads us to introduce the stronger
assumption of second-order regular variation for the tail behaviour of $P$,
which is proved to be sufficient for minimax estimation of
$\theta_{r}(P;\mathbf{X}_{n})$, making the proposed estimator an optimal minimax
estimator of $\theta_{r}(P;\mathbf{X}_{n})$. Our interest in the $r$-order
missing mass arises from forensic statistics, where the estimation of the
$2$-order missing mass appears in connection with the estimation of the
likelihood ratio
$T(P,\mathbf{X}_{n})=\theta_{1}(P;\mathbf{X}_{n})/\theta_{2}(P;\mathbf{X}_{n})$,
known as the "fundamental problem of forensic mathematics". We present
theoretical guarantees for nonparametric estimation of $T(P,\mathbf{X}_{n})$.
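
To make the quantity in this abstract concrete, the following is a minimal Python sketch, not the authors' estimator. It assumes the r-order missing mass is the sum of p_j^r over symbols that do not appear in the sample, and it computes that value exactly for a known toy distribution. The function names `missing_mass` and `good_turing`, and the toy distribution, are hypothetical. The classical Good-Turing estimate (the fraction of singletons) is included only as a well-known stand-in for the first-order case.

```python
from collections import Counter


def missing_mass(probs, sample, r=1):
    """Exact r-order missing mass for a KNOWN distribution.

    Assumed definition: sum of p_j**r over symbols absent from the sample.
    probs maps each symbol to its probability.
    """
    seen = set(sample)
    return sum(p ** r for symbol, p in probs.items() if symbol not in seen)


def good_turing(sample):
    """Classical Good-Turing estimate of the first-order missing mass.

    It uses only the sample: (number of symbols seen exactly once) / n.
    """
    counts = Counter(sample)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(sample)


# Toy example: symbol "c" is never observed.
probs = {"a": 0.5, "b": 0.3, "c": 0.2}
sample = ["a", "a", "b"]

theta_1 = missing_mass(probs, sample, r=1)  # mass of unseen symbols
theta_2 = missing_mass(probs, sample, r=2)  # sum of squared unseen masses
estimate = good_turing(sample)              # one singleton ("b") out of three
```

In this sketch the ratio `theta_1 / theta_2` is the likelihood ratio that the abstract connects to forensic statistics, under the same assumed definition.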
Minimax Estimation of Kernel Mean Embeddings
In this paper, we study the minimax estimation of the Bochner integral
$$\mu_{k}(P):=\int_{\mathcal{X}}k(\cdot,x)\,dP(x),$$
also called the kernel
mean embedding, based on random samples drawn i.i.d. from $P$, where
$k:\mathcal{X}\times\mathcal{X}\rightarrow\mathbb{R}$ is a positive definite
kernel. Various estimators $\hat{\theta}_{n}$ of $\mu_{k}(P)$ (including the
empirical estimator) are studied in the literature, all of
which satisfy $\|\hat{\theta}_{n}-\mu_{k}(P)\|_{\mathcal{H}_{k}}=O_{P}(n^{-1/2})$, with
$\mathcal{H}_{k}$ being the reproducing kernel Hilbert space induced by $k$. The
main contribution of the paper is in showing that the above-mentioned rate of
$n^{-1/2}$ is minimax in the $\|\cdot\|_{\mathcal{H}_{k}}$ and
$\|\cdot\|_{L^{2}(\mathbb{R}^{d})}$ norms over the class of discrete measures and
the class of measures that have an infinitely differentiable density, with
$k$ being a continuous translation-invariant kernel on $\mathbb{R}^{d}$. The
interesting aspect of this result is that the minimax rate is independent of
the smoothness of the kernel and the density of $P$ (if it exists). This result
has practical consequences in statistical applications, as the mean embedding
has been widely employed in non-parametric hypothesis testing, density
estimation, causal inference and feature selection, through its relation to
energy distance (and distance covariance).
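
As an illustration of the empirical estimator mentioned in this abstract, here is a minimal pure-Python sketch under stated assumptions. It takes the empirical kernel mean embedding to be the sample average of the kernel functions centred at the observations, and it uses a Gaussian kernel as one example of a continuous translation-invariant kernel. The function names, the bandwidth parameter and the toy data are hypothetical and do not come from the paper.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel on R^d; x and y are equal-length tuples of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def empirical_embedding(sample, kernel=gaussian_kernel):
    """Return the function t -> (1/n) * sum_i k(t, X_i)."""
    n = len(sample)

    def mu_hat(t):
        return sum(kernel(t, x) for x in sample) / n

    return mu_hat


def rkhs_distance(sample_a, sample_b, kernel=gaussian_kernel):
    """RKHS norm of the difference between two empirical embeddings.

    Expands the squared norm with the reproducing property into three
    averages of kernel evaluations.
    """
    def mean_k(u, v):
        return sum(kernel(a, b) for a in u for b in v) / (len(u) * len(v))

    sq = (mean_k(sample_a, sample_a)
          - 2.0 * mean_k(sample_a, sample_b)
          + mean_k(sample_b, sample_b))
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative rounding errors


# Toy one-dimensional sample, written as 1-tuples.
sample = [(0.0,), (2.0,)]
mu_hat = empirical_embedding(sample)
value_at_zero = mu_hat((0.0,))  # average of k(0, 0) and k(0, 2)
```

The distance computed by `rkhs_distance` is the quantity commonly used as a two-sample test statistic, which is one way the embedding enters the hypothesis-testing applications listed above.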