5 research outputs found

    Parametric estimation and tests through divergences and duality technique

    Get PDF
    We introduce estimation and test procedures through divergence optimization for discrete or continuous parametric models. This approach is based on a new dual representation for divergences. We treat point estimation and tests for simple and composite hypotheses, extending the maximum likelihood technique. Another view of the maximum likelihood approach, for estimation and testing, is given. We prove existence and consistency of the proposed estimates. The limit laws of the estimates and test statistics (including the generalized likelihood ratio one) are given under both the null and the alternative hypotheses, and an approximation of the power functions is deduced. A new procedure for constructing confidence regions, when the parameter may be a boundary value of the parameter space, is proposed. Also, a solution to the irregularity problem of the generalized likelihood ratio test pertaining to the number of components in a mixture is given, and a new test is proposed, based on the $\chi^2$-divergence on signed finite measures and the duality technique.
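    To make the minimum-divergence idea concrete, here is a minimal sketch of a naive plug-in minimum $\chi^2$-divergence estimator for a discrete model, assuming a Binomial(N, theta) family chosen purely for illustration. The paper's contribution is a dual representation that avoids this kind of direct plug-in; the sketch below is not the authors' dual procedure, only the underlying minimum-divergence principle.

```python
# Hypothetical sketch: plug-in minimum chi^2-divergence estimation for a
# Binomial(N, theta) model. The model family and N are assumptions for this
# example; the paper's dual-representation estimator is not reproduced here.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import binom

N = 10  # known number of trials (assumption for this example)

def chi2_divergence(theta, counts):
    """Pearson chi^2 divergence between the empirical pmf and Binomial(N, theta)."""
    n = counts.sum()
    p_hat = counts / n                                   # empirical pmf on {0, ..., N}
    p_model = binom.pmf(np.arange(N + 1), N, theta)      # model pmf
    return np.sum((p_hat - p_model) ** 2 / p_model)

def min_chi2_estimate(counts):
    """Minimize the chi^2 divergence over theta in (0, 1)."""
    res = minimize_scalar(chi2_divergence, bounds=(1e-6, 1 - 1e-6),
                          args=(counts,), method="bounded")
    return res.x

rng = np.random.default_rng(0)
sample = rng.binomial(N, 0.3, size=500)
counts = np.bincount(sample, minlength=N + 1)
print(min_chi2_estimate(counts))  # should be close to 0.3
```

    Replacing the $\chi^2$ integrand with other convex functions gives other members of the divergence family the abstract refers to; with the Kullback-Leibler divergence, minimum divergence estimation recovers maximum likelihood, which is the sense in which the approach extends it.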

    Likelihood-free hypothesis testing

    Full text link
    Consider the problem of testing $Z \sim \mathbb{P}^{\otimes m}$ vs $Z \sim \mathbb{Q}^{\otimes m}$ from $m$ samples. Generally, to achieve a small error rate it is necessary and sufficient to have $m \asymp 1/\epsilon^2$, where $\epsilon$ measures the separation between $\mathbb{P}$ and $\mathbb{Q}$ in total variation ($\mathsf{TV}$). Achieving this, however, requires complete knowledge of the distributions $\mathbb{P}$ and $\mathbb{Q}$ and can be done, for example, using the Neyman-Pearson test. In this paper we consider a variation of the problem, which we call likelihood-free (or simulation-based) hypothesis testing, where access to $\mathbb{P}$ and $\mathbb{Q}$ (which are a priori only known to belong to a large non-parametric family $\mathcal{P}$) is given through $n$ iid samples from each. We demonstrate the existence of a fundamental trade-off between $n$ and $m$ given by $nm \asymp n^2_{\mathsf{GoF}}(\epsilon, \mathcal{P})$, where $n_{\mathsf{GoF}}$ is the minimax sample complexity of testing between the hypotheses $H_0: \mathbb{P} = \mathbb{Q}$ vs $H_1: \mathsf{TV}(\mathbb{P}, \mathbb{Q}) \ge \epsilon$. We show this for three non-parametric families $\mathcal{P}$: $\beta$-smooth densities over $[0,1]^d$, the Gaussian sequence model over a Sobolev ellipsoid, and the collection of distributions on a large alphabet $[k]$ with pmfs bounded by $c/k$ for fixed $c$. The test that we propose (based on the $L^2$-distance statistic of Ingster) simultaneously achieves all points on the trade-off curve for these families. In particular, when $m \gg 1/\epsilon^2$ our test requires the number of simulation samples $n$ to be orders of magnitude smaller than what is needed for density estimation to accuracy $\asymp \epsilon$ (under $\mathsf{TV}$). This demonstrates the possibility of testing without fully estimating the distributions.
    Comment: 48 pages, 1 figure
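    To convey the flavor of the simulation-based setup on the large-alphabet family, here is a minimal sketch, assuming alphabet $[k]$ and simple plug-in empirical pmfs: given $n$ simulated samples from each of $\mathbb{P}$ and $\mathbb{Q}$ and $m$ observed samples $Z$, declare $Z \sim \mathbb{P}$ if the empirical pmf of $Z$ is closer in squared $L^2$ distance to that of the $\mathbb{P}$-simulations. The paper's actual test uses carefully debiased $L^2$ statistics in the spirit of Ingster; this plug-in comparison only illustrates the scheme, not the statistic that attains the trade-off curve.

```python
# Hypothetical sketch of a likelihood-free test on a finite alphabet [k]:
# compare squared L2 distances between empirical pmfs. This is a simplified
# plug-in version, not the debiased statistic analyzed in the paper.
import numpy as np

def empirical_pmf(samples, k):
    """Empirical pmf of integer samples taking values in {0, ..., k-1}."""
    return np.bincount(samples, minlength=k) / len(samples)

def lf_test(sim_p, sim_q, z, k):
    """Return 0 if Z is judged to come from P, 1 if from Q."""
    p_hat = empirical_pmf(sim_p, k)
    q_hat = empirical_pmf(sim_q, k)
    z_hat = empirical_pmf(z, k)
    d_p = np.sum((z_hat - p_hat) ** 2)   # squared L2 distance to P-simulations
    d_q = np.sum((z_hat - q_hat) ** 2)   # squared L2 distance to Q-simulations
    return 0 if d_p <= d_q else 1

rng = np.random.default_rng(1)
k, n, m = 100, 5000, 2000
p = rng.dirichlet(np.ones(k))            # distributions unknown to the tester
q = rng.dirichlet(np.ones(k))
sim_p = rng.choice(k, size=n, p=p)       # n simulation samples from each
sim_q = rng.choice(k, size=n, p=q)
z = rng.choice(k, size=m, p=p)           # ground truth: Z ~ P
print(lf_test(sim_p, sim_q, z, k))       # expect 0
```

    The point of the $nm \asymp n^2_{\mathsf{GoF}}(\epsilon, \mathcal{P})$ trade-off is that a larger observation budget $m$ lets the simulation budget $n$ shrink well below what density estimation to accuracy $\asymp \epsilon$ would require, so a test of this kind can succeed with pmf estimates that are individually far from convergent.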