Non-Bayesian Parametric Missing-Mass Estimation
We consider the classical problem of missing-mass estimation, which deals
with estimating the total probability of unseen elements in a sample. The
missing-mass estimation problem has various applications in machine learning,
statistics, language processing, ecology, sensor networks, and other fields. The
naive, constrained maximum likelihood (CML) estimator is inappropriate for this
problem since it tends to overestimate the probability of the observed
elements. Similarly, the conventional constrained Cramér-Rao bound (CCRB),
which is a lower bound on the mean-squared-error (MSE) of unbiased estimators,
does not provide a relevant bound on the performance for this problem. In this
paper, we introduce a frequentist, non-Bayesian parametric model for
missing-mass estimation. We then define missing-mass unbiasedness using the
Lehmann unbiasedness definition. We derive a
non-Bayesian CCRB-type lower bound on the missing-mass MSE (mmMSE), named the
missing-mass CCRB (mmCCRB), based on the missing-mass unbiasedness. The
missing-mass unbiasedness and the proposed mmCCRB can be used to evaluate the
performance of existing estimators. Based on the new mmCCRB, we propose a new
method to improve existing estimators by an iterative missing-mass Fisher
scoring method. Finally, we demonstrate via numerical simulations that the
proposed mmCCRB is a valid and informative lower bound on the mmMSE of
state-of-the-art estimators for this problem: the CML, Good-Turing, and
Laplace estimators. We also show that the new Fisher-scoring method improves
the performance of the Laplace estimator.
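
To make the three baseline estimators named above concrete, the following is a
minimal sketch of their textbook missing-mass estimates. It assumes a finite
alphabet of known size; the function name missing_mass_estimates and the
parameter alphabet_size are hypothetical, chosen here for illustration. It does
not implement the mmCCRB or the Fisher-scoring method, which this abstract
does not specify.

```python
from collections import Counter


def missing_mass_estimates(sample, alphabet_size):
    """Textbook missing-mass estimates (illustrative sketch, not the paper's code).

    Assumes a finite alphabet whose size `alphabet_size` is known.
    """
    if not sample:
        raise ValueError("sample must be non-empty")
    counts = Counter(sample)
    n = len(sample)                      # sample size
    seen = len(counts)                   # number of distinct observed elements
    if alphabet_size < seen:
        raise ValueError("alphabet_size is smaller than the number of observed elements")
    singletons = sum(1 for c in counts.values() if c == 1)
    unseen = alphabet_size - seen        # elements with zero count

    return {
        # CML assigns p_i = n_i / n, so unseen elements get zero total mass.
        "cml": 0.0,
        # Good-Turing: fraction of the sample made up of elements seen exactly once.
        "good_turing": singletons / n,
        # Laplace (add-one): each unseen element gets 1 / (n + alphabet_size).
        "laplace": unseen / (n + alphabet_size),
    }


if __name__ == "__main__":
    print(missing_mass_estimates(["a", "a", "b", "c"], alphabet_size=5))
```

The CML estimate is always zero here, which is the overestimation of observed
elements that the abstract describes: all probability mass goes to elements
that appear in the sample.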