2,417 research outputs found

    Minimax rates of entropy estimation on large alphabets via best polynomial approximation

    Consider the problem of estimating the Shannon entropy of a distribution over $k$ elements from $n$ independent samples. We show that the minimax mean-square error is within universal multiplicative constant factors of $\Big(\frac{k}{n \log k}\Big)^2 + \frac{\log^2 k}{n}$ if $n$ exceeds a constant factor of $\frac{k}{\log k}$; otherwise there exists no consistent estimator. This refines the recent result of Valiant-Valiant \cite{VV11} that the minimal sample size for consistent entropy estimation scales according to $\Theta(\frac{k}{\log k})$. The apparatus of best polynomial approximation plays a key role in both the construction of optimal estimators and, via a duality argument, the minimax lower bound.
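
    For context, a minimal sketch (not the paper's estimator) of the naive plug-in entropy estimate that the best-polynomial-approximation construction improves upon; the alphabet size and sample size below are illustrative assumptions.

        import numpy as np

        def plugin_entropy(samples):
            """Naive plug-in (MLE) estimate of Shannon entropy in nats.

            This is the baseline the minimax-optimal estimator improves on; it is
            badly biased once the alphabet size k is comparable to the sample size n."""
            _, counts = np.unique(samples, return_counts=True)
            p = counts / counts.sum()
            return -np.sum(p * np.log(p))

        # Illustrative regime: k = 1000 symbols but only n = 500 samples.
        rng = np.random.default_rng(0)
        samples = rng.integers(0, 1000, size=500)
        print(plugin_entropy(samples))  # well below the true value log(1000) ~ 6.91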

    Nonparametric density estimation by histogram trend filtering

    We propose a novel approach for density estimation called histogram trend filtering. Our estimator arises from a surrogate Poisson model for the counts of observations in a partition of the support of the data. We begin by showing consistency of a variational estimator for this density estimation problem. We then study a discrete estimator that can be found efficiently via convex optimization. We show that the estimator enjoys strong statistical guarantees, yet is much more practical and computationally efficient than other estimators with similar guarantees. Finally, in our simulation study the proposed method showed smaller average mean squared error than competing methods. This favorable blend of properties makes histogram trend filtering an ideal candidate for routine data-analysis applications that call for a quick, efficient, accurate density estimate.
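
    A minimal sketch of the kind of convex program this suggests, assuming a Poisson likelihood on bin counts and an l1 penalty on second-order differences of the log-intensity; the exact objective, penalty order, and tuning in the paper may differ, and the data and smoothing parameter below are made up for illustration.

        import numpy as np
        import cvxpy as cp

        # Hypothetical bin counts over a partition of the support.
        rng = np.random.default_rng(1)
        counts = rng.poisson(lam=np.exp(np.sin(np.linspace(0.0, 3.0, 50)) + 2.0))

        theta = cp.Variable(50)   # log intensity per bin
        lam = 5.0                 # smoothing parameter (assumed)

        # Poisson negative log-likelihood plus an l1 trend-filtering penalty on
        # second differences of theta (piecewise-linear log-density).
        nll = cp.sum(cp.exp(theta) - cp.multiply(counts, theta))
        penalty = lam * cp.norm(cp.diff(theta, 2), 1)
        cp.Problem(cp.Minimize(nll + penalty)).solve()

        density = np.exp(theta.value)
        density /= density.sum()  # normalized histogram density estimate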

    Minimax Estimation of Functionals of Discrete Distributions

    We propose a general methodology for the construction and analysis of minimax estimators for a wide class of functionals of finite dimensional parameters, and elaborate on the case of discrete distributions, where the alphabet size $S$ is unknown and may be comparable with the number of observations $n$. We treat the regimes where the functional is "nonsmooth" and "smooth" separately. In the "nonsmooth" regime, we apply an unbiased estimator for the best polynomial approximation of the functional, whereas in the "smooth" regime, we apply a bias-corrected Maximum Likelihood Estimator (MLE). We illustrate the merit of this approach by thoroughly analyzing two important cases: the entropy $H(P) = \sum_{i=1}^S -p_i \ln p_i$ and $F_\alpha(P) = \sum_{i=1}^S p_i^\alpha$, $\alpha>0$. We obtain the minimax $L_2$ rates for estimating these functionals. In particular, we demonstrate that our estimator achieves the optimal sample complexity $n \asymp S/\ln S$ for entropy estimation. We also show that the sample complexity for estimating $F_\alpha(P)$, $0<\alpha<1$, is $n \asymp S^{1/\alpha}/\ln S$, which can be achieved by our estimator but not the MLE. For $1<\alpha<3/2$, we show the minimax $L_2$ rate for estimating $F_\alpha(P)$ is $(n\ln n)^{-2(\alpha-1)}$ regardless of the alphabet size, while the $L_2$ rate for the MLE is $n^{-2(\alpha-1)}$. In all of the above cases, the behavior of the minimax rate-optimal estimators with $n$ samples is essentially that of the MLE with $n\ln n$ samples. We highlight the practical advantages of our schemes for entropy and mutual information estimation, and demonstrate that our approach reduces running time and boosts accuracy compared to various existing approaches. Moreover, we show that the mutual information estimator induced by our methodology leads to significant performance boosts over the Chow--Liu algorithm in learning graphical models. Comment: To appear in IEEE Transactions on Information Theory.
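
    In the "smooth" regime the paper applies a bias-corrected MLE; as a hedged illustration of what bias correction means here, the classical Miller-Madow correction is sketched below (the paper's actual correction and its nonsmooth-regime estimator are more refined).

        import numpy as np

        def entropy_mle(counts):
            """Plug-in (MLE) entropy estimate in nats from symbol counts."""
            n = counts.sum()
            p = counts[counts > 0] / n
            return -np.sum(p * np.log(p))

        def entropy_miller_madow(counts):
            """MLE plus the first-order bias correction (S_observed - 1) / (2n).

            Shown only to illustrate bias-correcting the plug-in estimator."""
            n = counts.sum()
            s_obs = np.count_nonzero(counts)
            return entropy_mle(counts) + (s_obs - 1) / (2 * n)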

    Methods for Estimation of Convex Sets

    In the framework of shape-constrained estimation, we review methods and results in convex set estimation. These methods mostly build on stochastic and convex geometry, empirical process theory, functional analysis, linear programming, extreme value theory, etc. The statistical problems that we review include density support estimation, estimation of the level sets of densities or depth functions, nonparametric regression, etc. We focus on the estimation of convex sets under the Nikodym and Hausdorff metrics, which require different techniques and, quite surprisingly, lead to very different results, in particular in density support estimation. Finally, we discuss computational issues in high dimensions. Comment: 29 pages.
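
    For the density support estimation problem mentioned above, the textbook estimator of a convex support is the convex hull of the sample; a minimal sketch on simulated data (the disk example is an assumption for illustration):

        import numpy as np
        from scipy.spatial import ConvexHull

        # Simulated sample from a density supported on the unit disk.
        rng = np.random.default_rng(2)
        r, t = np.sqrt(rng.uniform(size=500)), rng.uniform(0.0, 2 * np.pi, size=500)
        points = np.column_stack([r * np.cos(t), r * np.sin(t)])

        # The convex hull of the sample estimates the convex support; its volume
        # (Nikodym) risk is one of the quantities the reviewed theory controls.
        hull = ConvexHull(points)
        print(hull.volume)  # estimated area, to be compared with pi ~ 3.1416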

    Minimax Rate-Optimal Estimation of Divergences between Discrete Distributions

    We study the minimax estimation of $\alpha$-divergences between discrete distributions for integer $\alpha \ge 1$, which include the Kullback--Leibler divergence and the $\chi^2$-divergences as special examples. Dropping the usual theoretical tricks to acquire independence, we construct the first minimax rate-optimal estimator which does not require any Poissonization, sample splitting, or explicit construction of approximating polynomials. The estimator uses a hybrid approach which solves a problem-independent linear program based on moment matching in the non-smooth regime, and applies a problem-dependent bias-corrected plug-in estimator in the smooth regime, with a soft decision boundary between these regimes. Comment: This version has been significantly revised.
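
    As a baseline for the smooth-regime component, a minimal sketch of a plug-in KL divergence estimate from empirical frequencies with add-constant smoothing; this is not the paper's estimator, and the alphabet convention and smoothing constant are assumptions.

        import numpy as np

        def plugin_kl(samples_p, samples_q, k, alpha=0.5):
            """Plug-in KL(P||Q) with add-alpha smoothing over the alphabet {0,...,k-1}.

            Severely biased when k is comparable to the sample sizes, which is the
            regime the minimax rate-optimal estimator is designed for."""
            cp_ = np.bincount(samples_p, minlength=k) + alpha
            cq_ = np.bincount(samples_q, minlength=k) + alpha
            p, q = cp_ / cp_.sum(), cq_ / cq_.sum()
            return np.sum(p * np.log(p / q))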

    Local moment matching: A unified methodology for symmetric functional estimation and distribution estimation under Wasserstein distance

    We present \emph{Local Moment Matching (LMM)}, a unified methodology for symmetric functional estimation and distribution estimation under Wasserstein distance. We construct an efficiently computable estimator that achieves the minimax rates in estimating the distribution up to permutation, and show that the plug-in approach with our unlabeled distribution estimator is "universal" for estimating symmetric functionals of discrete distributions. Instead of performing best polynomial approximation explicitly, as in the existing literature on functional estimation, the plug-in approach conducts polynomial approximation implicitly and attains the optimal sample complexity for the entropy, power sum and support size functionals.
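
    To make the plug-in idea concrete, a crude sketch: form a permutation-invariant ("unlabeled") estimate of the distribution and plug it into any symmetric functional. LMM replaces the sorted empirical frequencies used below with a moment-matched estimate; everything here is an illustrative simplification.

        import numpy as np

        def sorted_empirical(samples, k):
            """Permutation-invariant estimate: sorted empirical frequencies."""
            freqs = np.bincount(samples, minlength=k) / len(samples)
            return np.sort(freqs)[::-1]

        # Symmetric functionals are evaluated by plugging in the unlabeled estimate.
        entropy = lambda p: -np.sum(p[p > 0] * np.log(p[p > 0]))
        power_sum = lambda p, a=0.5: np.sum(p[p > 0] ** a)

        rng = np.random.default_rng(3)
        samples = rng.integers(0, 200, size=1000)
        p_hat = sorted_empirical(samples, k=200)
        print(entropy(p_hat), power_sum(p_hat))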

    Statistical Challenges with High Dimensionality: Feature Selection in Knowledge Discovery

    Technological innovations have revolutionized the process of scientific research and knowledge discovery. The availability of massive data and challenges from frontiers of research and development have reshaped statistical thinking, data analysis and theoretical studies. The challenges of high dimensionality arise in diverse fields of science and the humanities, ranging from computational biology and health studies to financial engineering and risk management. In all of these fields, variable selection and feature extraction are crucial for knowledge discovery. We first give a comprehensive overview of statistical challenges with high dimensionality in these diverse disciplines. We then approach the problem of variable selection and feature extraction using a unified framework: penalized likelihood methods. Issues relevant to the choice of penalty functions are addressed. We demonstrate that for a host of statistical problems, as long as the dimensionality is not excessively large, we can estimate the model parameters as well as if the best model were known in advance. The persistence property in risk minimization is also addressed. The applicability of such a theory and method to diverse statistical problems is demonstrated. Other related problems with high dimensionality are also discussed. Comment: 2 figures.
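
    The simplest instance of the penalized likelihood framework described above is l1-penalized least squares (the LASSO); the sketch below uses simulated data purely to illustrate the penalized form and the resulting variable selection, not any specific penalty advocated in the overview.

        import numpy as np
        from sklearn.linear_model import Lasso

        # Simulated high-dimensional design: n = 100 observations, p = 500 features,
        # of which only the first 5 carry signal.
        rng = np.random.default_rng(4)
        X = rng.standard_normal((100, 500))
        beta = np.zeros(500)
        beta[:5] = 2.0
        y = X @ beta + rng.standard_normal(100)

        # Penalized likelihood with an l1 penalty:
        #   argmin_b  ||y - X b||^2 / (2n) + alpha * ||b||_1
        model = Lasso(alpha=0.1).fit(X, y)
        print(np.flatnonzero(model.coef_))  # indices of selected variables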

    Hypotheses tests in boundary regression models

    Consider a nonparametric regression model with one-sided errors and a regression function in a general Hölder class. We estimate the regression function via minimization of the local integral of a polynomial approximation. We show uniform rates of convergence for the simple regression estimator as well as for a smooth version. These rates carry over to mean regression models with a symmetric and bounded error distribution. In such a setting, one obtains faster rates for irregular error distributions concentrating sufficient mass near the endpoints than for the usual regular distributions. The results are applied to prove asymptotic $\sqrt{n}$-equivalence of a residual-based (sequential) empirical distribution function to the (sequential) empirical distribution function of unobserved errors in the case of irregular error distributions. This result is remarkably different from corresponding results in mean regression with regular errors. It can readily be applied to develop goodness-of-fit tests for the error distribution. We present some examples and investigate the small-sample performance in a simulation study. We further discuss asymptotically distribution-free hypothesis tests for independence of the error distribution from the points of measurement and for monotonicity of the boundary function as well.
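
    A minimal sketch of the estimation step described above, assuming an upper boundary with errors lying below it: on a local window, minimize the integral of a polynomial subject to the polynomial lying above all observations in the window, which is a small linear program. The window, degree, and interface below are illustrative assumptions.

        import numpy as np
        from scipy.optimize import linprog

        def local_boundary_poly(x, y, a, b, degree=1):
            """Fit a polynomial on [a, b] by minimizing its integral subject to
            p(x_i) >= y_i for all observations falling in the window."""
            in_win = (x >= a) & (x <= b)
            xs, ys = x[in_win], y[in_win]
            powers = np.arange(degree + 1)
            # Integral of sum_j c_j x^j over [a, b] is linear in the coefficients c.
            c_obj = (b ** (powers + 1) - a ** (powers + 1)) / (powers + 1)
            # Constraints p(x_i) >= y_i, written as -p(x_i) <= -y_i.
            A_ub = -(xs[:, None] ** powers[None, :])
            res = linprog(c_obj, A_ub=A_ub, b_ub=-ys,
                          bounds=[(None, None)] * (degree + 1))
            return res.x  # coefficients c_0, ..., c_degree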

    Learning Multivariate Log-concave Distributions

    We study the problem of estimating multivariate log-concave probability density functions. We prove the first sample complexity upper bound for learning log-concave densities on $\mathbb{R}^d$, for all $d \geq 1$. Prior to our work, no upper bound on the sample complexity of this learning problem was known for the case of $d>3$. In more detail, we give an estimator that, for any $d \ge 1$ and $\epsilon>0$, draws $\tilde{O}_d\left((1/\epsilon)^{(d+5)/2}\right)$ samples from an unknown target log-concave density on $\mathbb{R}^d$, and outputs a hypothesis that (with high probability) is $\epsilon$-close to the target in total variation distance. Our upper bound on the sample complexity comes close to the known lower bound of $\Omega_d\left((1/\epsilon)^{(d+1)/2}\right)$ for this problem. Comment: To appear in COLT 2017.

    A Spectral Approach for the Design of Experiments: Design, Analysis and Algorithms

    This paper proposes a new approach to construct high-quality space-filling sample designs. First, we propose a novel technique to quantify the space-filling property and optimally trade off uniformity and randomness in sample designs in arbitrary dimensions. Second, we connect the proposed metric (defined in the spatial domain) to the objective measure of the design performance (defined in the spectral domain). This connection serves as an analytic framework for evaluating the qualitative properties of space-filling designs in general. Using the theoretical insights provided by this spatial-spectral analysis, we derive the notion of optimal space-filling designs, which we refer to as space-filling spectral designs. Third, we propose an efficient estimator to evaluate the space-filling properties of sample designs in arbitrary dimensions and use it to develop an optimization framework to generate high-quality space-filling designs. Finally, we carry out a detailed performance comparison on two different applications in 2 to 6 dimensions: a) image reconstruction and b) surrogate modeling on several benchmark optimization functions and an inertial confinement fusion (ICF) simulation code. We demonstrate that the proposed spectral designs significantly outperform existing approaches, especially in high dimensions.
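
    A minimal sketch of the spectral-domain view referenced above: the power spectrum (periodogram) of a point design, which is one standard way to inspect its space-filling behavior; the specific metric and optimization used in the paper are not reproduced here, and the probe frequencies below are an assumption.

        import numpy as np

        def design_power_spectrum(points, freqs):
            """Periodogram of a point design: P(k) = |sum_j exp(-2*pi*i k.x_j)|^2 / N."""
            phases = np.exp(-2j * np.pi * (freqs @ points.T))  # (num_freqs, N)
            return np.abs(phases.sum(axis=1)) ** 2 / points.shape[0]

        rng = np.random.default_rng(5)
        random_design = rng.uniform(size=(256, 2))
        jittered_grid = (np.stack(np.meshgrid(np.arange(16), np.arange(16)), -1)
                         .reshape(-1, 2) + rng.uniform(size=(256, 2))) / 16.0
        freqs = rng.normal(scale=8.0, size=(200, 2))  # probe frequencies (assumed)
        print(design_power_spectrum(random_design, freqs).mean(),
              design_power_spectrum(jittered_grid, freqs).mean())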