Minimax rates of entropy estimation on large alphabets via best polynomial approximation
Consider the problem of estimating the Shannon entropy of a distribution over $k$ elements from $n$ independent samples. We show that the minimax mean-square error is within universal multiplicative constant factors of $\left(\frac{k}{n \log k}\right)^2 + \frac{\log^2 k}{n}$ if $n$ exceeds a constant factor of $k/\log k$; otherwise there exists no consistent estimator. This
refines the recent result of Valiant--Valiant \cite{VV11} that the minimal
sample size for consistent entropy estimation scales according to
$\Theta(k/\log k)$. The apparatus of best polynomial approximation
plays a key role in both the construction of optimal estimators and, via a
duality argument, the minimax lower bound.
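For intuition, here is a minimal Python sketch of the naive plug-in (MLE) entropy estimator that the polynomial-approximation estimators improve upon; the setup (uniform distribution, alphabet much larger than the sample size) illustrates the regime where the plug-in estimate is badly biased. All names are illustrative, not from the paper.

```python
import numpy as np

def entropy_mle(samples, k):
    """Plug-in (MLE) entropy estimate in nats from n i.i.d. samples
    over an alphabet of size k."""
    n = len(samples)
    counts = np.bincount(samples, minlength=k)
    p_hat = counts / n
    nz = p_hat > 0
    return float(-np.sum(p_hat[nz] * np.log(p_hat[nz])))

rng = np.random.default_rng(0)
k, n = 1000, 200            # alphabet much larger than the sample size
p = np.full(k, 1.0 / k)     # uniform distribution; true entropy is log(k)
samples = rng.choice(k, size=n, p=p)
h_hat = entropy_mle(samples, k)
print(h_hat, np.log(k))     # plug-in estimate is biased downward when n << k
```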
Nonparametric density estimation by histogram trend filtering
We propose a novel approach for density estimation called histogram trend
filtering. Our estimator arises from a surrogate Poisson model for the
counts of observations in a partition of the support of the data. We begin by
showing consistency of a variational estimator for this density estimation
problem. We then study a discrete estimator that can be found efficiently via
convex optimization. We show that the estimator enjoys strong statistical
guarantees, yet is much more practical and computationally efficient than other
estimators that enjoy similar guarantees. Finally, in our simulation study the
proposed method showed smaller average mean squared error than competing
methods. This favorable blend of properties makes histogram trend filtering an
ideal candidate for use in routine data-analysis applications that call for a
quick, efficient, accurate density estimate.
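As a rough sketch of the kind of convex objective involved, the following Python function evaluates a penalized Poisson negative log-likelihood over bin counts with an l1 trend-filtering penalty on second differences (favoring piecewise-linear log-densities). The exact objective, penalty order, and names are assumptions for illustration, not the paper's formulation.

```python
import numpy as np

def htf_objective(theta, counts, lam):
    """Histogram-trend-filtering-style objective (illustrative sketch).
    theta[j] is the log-intensity in bin j; the penalty is the l1 norm
    of second differences of theta (order-1 trend filtering), which
    favors piecewise-linear log-densities."""
    nll = np.sum(np.exp(theta) - counts * theta)   # Poisson NLL up to constants
    penalty = np.sum(np.abs(np.diff(theta, n=2)))  # l1 of second differences
    return float(nll + lam * penalty)

data = np.random.default_rng(1).normal(size=500)
counts, edges = np.histogram(data, bins=30)
theta0 = np.log(counts + 0.5)                      # crude initial log-intensities
print(htf_objective(theta0, counts, lam=1.0))
```

A solver would minimize this objective over theta subject to a normalization constraint; any generic convex-optimization routine applies.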
Minimax Estimation of Functionals of Discrete Distributions
We propose a general methodology for the construction and analysis of minimax
estimators for a wide class of functionals of finite-dimensional parameters,
and elaborate on the case of discrete distributions, where the alphabet size $S$
is unknown and may be comparable with the number of observations $n$. We
treat the respective regions where the functional is "nonsmooth" and "smooth"
separately. In the "nonsmooth" regime, we apply an unbiased estimator for the
best polynomial approximation of the functional, whereas in the "smooth"
regime, we apply a bias-corrected Maximum Likelihood Estimator (MLE). We
illustrate the merit of this approach by thoroughly analyzing two important
cases: the entropy $H(P) = -\sum_{i=1}^S p_i \ln p_i$ and $F_\alpha(P) = \sum_{i=1}^S p_i^\alpha$, $\alpha > 0$. We obtain the minimax rates for
estimating these functionals. In particular, we demonstrate that our estimator
achieves the optimal sample complexity $n \asymp S/\ln S$ for entropy
estimation. We also show that the sample complexity for estimating
$F_\alpha(P)$, $0 < \alpha < 1$, is $S^{1/\alpha}/\ln S$, which can be
achieved by our estimator but not the MLE. For $1 < \alpha < 3/2$, we show the
minimax rate for estimating $F_\alpha(P)$ is $(n \ln n)^{-2(\alpha-1)}$,
regardless of the alphabet size, while the rate for the MLE is
$n^{-2(\alpha-1)}$. For all the above cases, the behavior of the minimax
rate-optimal estimators with $n$ samples is essentially that of the MLE with
$n \ln n$ samples. We highlight the practical advantages of our schemes for
entropy and mutual information estimation. We demonstrate that our approach
reduces running time and boosts accuracy compared to various existing
approaches. Moreover, we show that the mutual information estimator induced by
our methodology leads to significant performance boosts over the Chow--Liu
algorithm in learning graphical models. Comment: To appear in IEEE Transactions on Information Theory.
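To make the "bias-corrected MLE" idea concrete, here is the classical Miller--Madow correction to the plug-in entropy estimate in Python; it is a simple, well-known instance of bias correction in the smooth regime, not the paper's exact estimator.

```python
import numpy as np

def entropy_miller_madow(counts):
    """Miller--Madow bias-corrected plug-in entropy estimate (nats).
    Adds (k_observed - 1) / (2n) to the plug-in estimate, a classical
    first-order correction for the downward bias of the MLE."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    p = counts[counts > 0] / n
    h_mle = -np.sum(p * np.log(p))
    k_obs = np.count_nonzero(counts)
    return float(h_mle + (k_obs - 1) / (2 * n))

print(entropy_miller_madow([2, 2]))  # plug-in log(2) plus correction 1/8
```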
Methods for Estimation of Convex Sets
In the framework of shape-constrained estimation, we review methods and work
done in convex set estimation. These methods mostly build on stochastic and
convex geometry, empirical process theory, functional analysis, linear
programming, extreme value theory, etc. The statistical problems that we review
include density support estimation, estimation of the level sets of densities
or depth functions, nonparametric regression, etc. We focus on the estimation
of convex sets under the Nikodym and Hausdorff metrics, which require different
techniques and, quite surprisingly, lead to very different results, in
particular in density support estimation. Finally, we discuss computational
issues in high dimensions. Comment: 29 pages.
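As a concrete instance of convex support estimation, the convex hull of an i.i.d. sample is the classical estimator of a convex density support. The self-contained Python sketch below computes it in two dimensions with Andrew's monotone chain.

```python
def convex_hull(points):
    """Andrew's monotone-chain convex hull of 2-D points; returns hull
    vertices in counter-clockwise order. The hull of an i.i.d. sample
    is the classical estimator of a convex density support."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

square = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)]
print(convex_hull(square))  # interior point dropped
```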
Minimax Rate-Optimal Estimation of Divergences between Discrete Distributions
We study the minimax estimation of a family of divergences between discrete
distributions, which includes the Kullback--Leibler
divergence and the $\chi^2$-divergence as special examples. Dropping the usual
theoretical tricks to acquire independence, we construct the first minimax
rate-optimal estimator which does not require any Poissonization, sample
splitting, or explicit construction of approximating polynomials. The estimator
uses a hybrid approach which solves a problem-independent linear program based
on moment matching in the non-smooth regime, and applies a problem-dependent
bias-corrected plug-in estimator in the smooth regime, with a soft decision
boundary between these regimes. Comment: This version has been significantly revised.
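For contrast with the minimax-optimal construction, here is the naive smoothed plug-in estimator of the Kullback--Leibler divergence in Python; the smoothing constant and names are illustrative assumptions, not the paper's hybrid estimator.

```python
import numpy as np

def kl_plugin(x_counts, y_counts, smoothing=0.5):
    """Smoothed plug-in estimate of D(P||Q) in nats from empirical
    counts; additive smoothing keeps the estimated Q strictly positive.
    A naive baseline, not a minimax rate-optimal estimator."""
    p = np.asarray(x_counts, dtype=float)
    q = np.asarray(y_counts, dtype=float)
    p = (p + smoothing) / (p + smoothing).sum()
    q = (q + smoothing) / (q + smoothing).sum()
    return float(np.sum(p * np.log(p / q)))

print(kl_plugin([3, 1], [1, 3]))  # positive: the two samples differ
```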
Local moment matching: A unified methodology for symmetric functional estimation and distribution estimation under Wasserstein distance
We present \emph{Local Moment Matching (LMM)}, a unified methodology for
symmetric functional estimation and distribution estimation under Wasserstein
distance. We construct an efficiently computable estimator that achieves the
minimax rates in estimating the distribution up to permutation, and show that
the plug-in approach of our unlabeled distribution estimator is "universal" in
estimating symmetric functionals of discrete distributions. Instead of doing
best polynomial approximation explicitly as in existing literature of
functional estimation, the plug-in approach conducts polynomial approximation
implicitly and attains the optimal sample complexity for the entropy, power sum
and support size functionals.
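In one dimension, the Wasserstein-1 distance between two empirical distributions with the same number of atoms reduces to the mean absolute difference of the sorted samples, as in this small Python sketch (an illustration of the metric itself, not of the LMM estimator):

```python
import numpy as np

def wasserstein_1d(x, y):
    """W1 distance between two empirical distributions with equally many
    atoms: sorting realizes the optimal coupling on the real line, so
    the distance is the mean absolute difference of sorted samples."""
    x, y = np.sort(x), np.sort(y)
    return float(np.mean(np.abs(x - y)))

print(wasserstein_1d([0, 1], [1, 2]))  # each atom moves by exactly 1
```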
Statistical Challenges with High Dimensionality: Feature Selection in Knowledge Discovery
Technological innovations have revolutionized the process of scientific
research and knowledge discovery. The availability of massive data and
challenges from frontiers of research and development have reshaped statistical
thinking, data analysis and theoretical studies. The challenges of
high-dimensionality arise in diverse fields of sciences and the humanities,
ranging from computational biology and health studies to financial engineering
and risk management. In all of these fields, variable selection and feature
extraction are crucial for knowledge discovery. We first give a comprehensive
overview of statistical challenges with high dimensionality in these diverse
disciplines. We then approach the problem of variable selection and feature
extraction using a unified framework: penalized likelihood methods. Issues
relevant to the choice of penalty functions are addressed. We demonstrate that
for a host of statistical problems, as long as the dimensionality is not
excessively large, we can estimate the model parameters as well as if the best
model is known in advance. The persistence property in risk minimization is
also addressed. The applicability of such a theory and method to diverse
statistical problems is demonstrated. Other related problems with
high-dimensionality are also discussed. Comment: 2 figures.
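As a minimal example of the penalized-likelihood machinery, the soft-thresholding operator below is the closed-form solution of the one-dimensional l1-penalized least-squares problem that underlies coordinate-descent lasso solvers; SCAD and other folded-concave penalties replace it with their own thresholding rules.

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft-thresholding operator S(z, lam) = sign(z) * max(|z| - lam, 0):
    the minimizer of 0.5 * (b - z)**2 + lam * |b| over b. Shrinks small
    coefficients exactly to zero, which is what drives variable selection
    under an l1 penalty."""
    return float(np.sign(z) * np.maximum(np.abs(z) - lam, 0.0))

print(soft_threshold(3.0, 1.0), soft_threshold(-0.5, 1.0))
```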
Hypotheses tests in boundary regression models
Consider a nonparametric regression model with one-sided errors and
regression function in a general H\"older class. We estimate the regression
function via minimization of the local integral of a polynomial approximation.
We show uniform rates of convergence for the simple regression estimator as
well as for a smooth version. These rates carry over to mean regression models
with a symmetric and bounded error distribution. In such a setting, one obtains
faster rates for irregular error distributions concentrating sufficient mass
near the endpoints than for the usual regular distributions. The results are
applied to prove an asymptotic equivalence of a residual-based
(sequential) empirical distribution function to the (sequential) empirical
distribution function of unobserved errors in the case of irregular error
distributions. This result is remarkably different from corresponding results
in mean regression with regular errors. It can readily be applied to develop
goodness-of-fit tests for the error distribution. We present some examples and
investigate the small sample performance in a simulation study. We further
discuss asymptotically distribution-free hypothesis tests for independence of
the error distribution from the points of measurement and for monotonicity of
the boundary function as well.
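A crude illustration of boundary estimation with one-sided errors: at each grid point, take the maximum response within a local window. The paper minimizes a local integral of a polynomial approximation; this order-zero local-maximum version (all names assumed) only conveys the flavor.

```python
import numpy as np

def boundary_local_max(x, y, grid, bandwidth):
    """Naive frontier estimator for one-sided errors: at each grid
    point, the boundary estimate is the maximum response among
    observations within the bandwidth window (NaN if the window is
    empty). An order-zero stand-in for local polynomial fitting."""
    x, y = np.asarray(x), np.asarray(y)
    est = []
    for g in grid:
        mask = np.abs(x - g) <= bandwidth
        est.append(y[mask].max() if mask.any() else np.nan)
    return np.array(est, dtype=float)

print(boundary_local_max([0, 0.1, 0.9, 1.0], [1, 2, 3, 4], [0.0, 1.0], 0.2))
```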
Learning Multivariate Log-concave Distributions
We study the problem of estimating multivariate log-concave probability
density functions. We prove the first sample complexity upper bound for
learning log-concave densities on $\mathbb{R}^d$, for all $d \ge 1$. Prior to
our work, no upper bound on the sample complexity of this learning problem was
known for $d > 3$. In more detail, we give an estimator that, for any $d \ge 1$
and $\epsilon > 0$, draws i.i.d. samples from an unknown target log-concave density on $\mathbb{R}^d$,
and outputs a hypothesis that (with high probability) is $\epsilon$-close to
the target in total variation distance. Our upper bound on the sample
complexity comes close to the known lower bound for this problem. Comment: To appear in COLT 201
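A quick way to build intuition for the shape constraint: a positive sequence is log-concave iff $p_i^2 \ge p_{i-1} p_{i+1}$ for all interior $i$. The Python check below applies this to discrete densities; it is a one-dimensional illustration of the constraint only, not part of the paper's estimator.

```python
import numpy as np

def is_log_concave(p, tol=1e-12):
    """Check log-concavity of a positive sequence: p is log-concave iff
    p[i]**2 >= p[i-1] * p[i+1] for every interior index i. Working with
    products rather than logs avoids log(0) issues at the edges."""
    p = np.asarray(p, dtype=float)
    return bool(np.all(p[1:-1] ** 2 >= p[:-2] * p[2:] - tol))

print(is_log_concave([1, 3, 3, 1]))   # binomial-type weights: log-concave
print(is_log_concave([1, 0.1, 1]))    # a dip in the middle: not log-concave
```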
A Spectral Approach for the Design of Experiments: Design, Analysis and Algorithms
This paper proposes a new approach to construct high quality space-filling
sample designs. First, we propose a novel technique to quantify the
space-filling property and optimally trade-off uniformity and randomness in
sample designs in arbitrary dimensions. Second, we connect the proposed metric
(defined in the spatial domain) to the objective measure of the design
performance (defined in the spectral domain). This connection serves as an
analytic framework for evaluating the qualitative properties of space-filling
designs in general. Using the theoretical insights provided by this
spatial-spectral analysis, we derive the notion of optimal space-filling
designs, which we refer to as space-filling spectral designs. Third, we propose
an efficient estimator to evaluate the space-filling properties of sample
designs in arbitrary dimensions and use it to develop an optimization framework
to generate high quality space-filling designs. Finally, we carry out a
detailed performance comparison on two different applications in 2 to 6
dimensions: a) image reconstruction and b) surrogate modeling on several
benchmark optimization functions and an inertial confinement fusion (ICF)
simulation code. We demonstrate that the proposed spectral designs significantly
outperform existing approaches, especially in high dimensions.
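To make "quantifying the space-filling property" concrete, here is the classical maximin criterion (minimum pairwise distance) in Python, comparing a random design with a regular grid; the paper's own metric is spectral, so this criterion is only an illustrative stand-in.

```python
import numpy as np

def maximin_distance(design):
    """Minimum pairwise Euclidean distance of a design: the quantity
    maximized by maximin space-filling designs. Larger values mean the
    points are more evenly spread."""
    d = np.asarray(design, dtype=float)
    diffs = d[:, None, :] - d[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(-1))
    n = len(d)
    return float(dist[np.triu_indices(n, k=1)].min())

rng = np.random.default_rng(0)
random_design = rng.random((16, 2))
g = np.linspace(0.125, 0.875, 4)                      # 4 x 4 grid in [0, 1]^2
grid_design = np.array([(a, b) for a in g for b in g])
print(maximin_distance(random_design), maximin_distance(grid_design))
```

The grid attains a larger minimum separation than a typical random design of the same size, which is the uniformity/randomness trade-off the abstract alludes to.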