A Novel Nonparametric Distance Estimator for Densities with Error Bounds
The use of a metric to assess distance between probability densities is an important practical problem. In this work, a particular metric induced by an α-divergence is studied. The Hellinger metric can be interpreted as a particular case within the framework of generalized Tsallis divergences and entropies. The nonparametric Parzen density estimator emerges as a natural candidate to estimate the underlying probability density function, since it may account for data from different groups, or experiments with distinct instrumental precisions, i.e., non-independent and identically distributed (non-i.i.d.) data. However, the information-theoretic metric derived from the nonparametric Parzen density estimator displays infinite variance, limiting the direct use of resampling estimators. Based on measure theory, we present a change of measure that builds a finite-variance density, allowing the use of resampling estimators. To counteract the poor scaling with dimension, we propose a new nonparametric two-stage robust resampling estimator of Hellinger metric error bounds for heteroscedastic data. The approach yields very promising results, allowing the use of different covariances for different clusters, with impact on the distance evaluation.
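As a concrete illustration of the recipe above (ours, not the paper's estimator), the sketch below plugs two Parzen (Gaussian kernel) density estimates into the Hellinger integral and evaluates it by importance sampling from their equal-weight mixture, a simple change of measure under which the integrand is bounded; the data, bandwidths, and sample sizes are illustrative.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Two groups measured with different instrumental precisions (heteroscedastic).
x = rng.normal(0.0, 1.0, size=500)
y = rng.normal(1.0, 2.0, size=500)

# Parzen (kernel) density estimates; gaussian_kde picks bandwidths by Scott's rule.
p = gaussian_kde(x)
q = gaussian_kde(y)

# Monte Carlo estimate of the Hellinger affinity BC = \int sqrt(p q) dx,
# importance-sampling from the mixture m = (p + q) / 2. By AM-GM,
# sqrt(p q) <= m, so the integrand sqrt(p q) / m lies in [0, 1]: the change
# of measure yields a finite-variance (in fact bounded) resampling estimator.
n = 20000
s = np.concatenate([p.resample(n // 2, seed=1).ravel(),
                    q.resample(n // 2, seed=2).ravel()])
m = 0.5 * (p(s) + q(s))
bc = np.mean(np.sqrt(p(s) * q(s)) / m)
hellinger = np.sqrt(max(0.0, 1.0 - bc))
print(f"Hellinger distance ~ {hellinger:.3f}")
```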
Meta learning of bounds on the Bayes classifier error
Meta learning uses information from base learners (e.g. classifiers or
estimators) as well as information about the learning problem to improve upon
the performance of a single base learner. For example, the Bayes error rate of
a given feature space, if known, can be used to aid in choosing a classifier,
as well as in feature selection and model selection for the base classifiers
and the meta classifier. Recent work in the field of f-divergence functional
estimation has led to the development of simple and rapidly converging
estimators that can be used to estimate various bounds on the Bayes error. We
estimate multiple bounds on the Bayes error using an estimator that applies
meta learning to slowly converging plug-in estimators to obtain the parametric
convergence rate. We compare the estimated bounds empirically on simulated data
and then estimate the tighter bounds on features extracted from an image patch
analysis of sunspot continuum and magnetogram images.
Comment: 6 pages, 3 figures, to appear in proceedings of the 2015 IEEE Signal Processing and SP Education Workshop
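For intuition, here is a minimal plug-in baseline (the slowly converging kind the paper improves upon, not its meta-learning ensemble): a KDE estimate of the Bhattacharyya coefficient, which yields the classical two-sided bound on the Bayes error under equal priors. The 1-D Gaussian classes and grid integration are illustrative choices.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Two equally likely classes; the 1-D Gaussian features are illustrative.
x0 = rng.normal(0.0, 1.0, 400)
x1 = rng.normal(1.5, 1.0, 400)
f0, f1 = gaussian_kde(x0), gaussian_kde(x1)

# Plug-in estimate of the Bhattacharyya coefficient BC = \int sqrt(f0 f1) dx
# via simple grid integration (feasible here because the feature is 1-D).
grid = np.linspace(-6.0, 8.0, 2001)
bc = np.trapz(np.sqrt(f0(grid) * f1(grid)), grid)

# Classical Bhattacharyya bounds on the Bayes error Pe for equal priors:
#   (1 - sqrt(1 - BC^2)) / 2  <=  Pe  <=  BC / 2
lower = 0.5 * (1.0 - np.sqrt(max(0.0, 1.0 - bc ** 2)))
upper = 0.5 * bc
print(f"Bayes error bounded in [{lower:.3f}, {upper:.3f}]")
```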
Variational Analysis of Constrained M-Estimators
We propose a unified framework for establishing existence of nonparametric
M-estimators, computing the corresponding estimates, and proving their strong
consistency when the class of functions is exceptionally rich. In particular,
the framework addresses situations where the class of functions is complex,
involving information and assumptions about shape, pointwise bounds, location
of modes, height at modes, location of level-sets, values of moments, size of
subgradients, continuity, distance to a "prior" function, multivariate total
positivity, and any combination of the above. The class might be engineered to
perform well in a specific setting even in the presence of little data. The
framework views the class of functions as a subset of a particular metric space
of upper semicontinuous functions under the Attouch-Wets distance. In addition
to allowing a systematic treatment of numerous M-estimators, the framework
yields consistency of plug-in estimators of modes of densities, maximizers of
regression functions, level-sets of classifiers, and related quantities, and
also enables computation by means of approximating parametric classes. We
establish consistency through a one-sided law of large numbers, here extended
to sieves, that relaxes assumptions of uniform laws, while ensuring global
approximations even under model misspecification.
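Schematically, the estimators covered are solutions of an empirical optimization over a constrained function class; the notation below is ours, a generic instance rather than the paper's formulation.

```latex
% A schematic constrained M-estimator (notation ours): minimize an empirical
% criterion over a class F of upper semicontinuous (usc) functions encoding
% the side information, with consistency studied under the Attouch-Wets
% distance on the space of usc functions.
\[
  \hat f_n \in \operatorname*{argmin}_{f \in F} \; \frac{1}{n} \sum_{i=1}^{n} \psi(f, X_i),
  \qquad
  F = \bigl\{ f \ \text{usc} : f \ \text{concave},\ \ell \le f \le u,\ \dots \bigr\}.
\]
% For example, \psi(f, x) = -\log f(x) recovers shape-constrained maximum
% likelihood density estimation over F.
```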
Private Graphon Estimation for Sparse Graphs
We design algorithms for fitting a high-dimensional statistical model to a
large, sparse network without revealing sensitive information of individual
members. Given a sparse input graph $G$, our algorithms output a
node-differentially-private nonparametric block model approximation. By
node-differentially-private, we mean that our output hides the insertion or
removal of a vertex and all its adjacent edges. If $G$ is an instance of the
network obtained from a generative nonparametric model defined in terms of a
graphon $W$, our model guarantees consistency, in the sense that as the number
of vertices tends to infinity, the output of our algorithm converges to $W$ in
an appropriate version of the $L_2$ norm. In particular, this means we can
estimate the sizes of all multi-way cuts in $G$.
Our results hold as long as $W$ is bounded, the average degree of $G$ grows
at least like the log of the number of vertices, and the number of blocks goes
to infinity at an appropriate rate. We give explicit error bounds in terms of
the parameters of the model; in several settings, our bounds improve on or
match known nonprivate results.
Comment: 36 pages
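To make the setup concrete, the sketch below samples a sparse graph from a bounded graphon at edge density on the order of log(n)/n and fits a naive, non-private block-model approximation by degree sorting and block averaging; the graphon W(u, v) = (u + v)/2, the block count, and the sorting heuristic are our illustrative choices, not the paper's private algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generative model: edge probability rho * W(u_i, u_j) with latent positions
# u_i ~ Uniform[0, 1] and target density rho ~ log(n) / n (sparse regime).
W = lambda u, v: (u + v) / 2.0
n = 2000
rho = np.log(n) / n
u = rng.uniform(size=n)
P = np.clip(rho * W(u[:, None], u[None, :]), 0.0, 1.0)
A = np.triu((rng.uniform(size=(n, n)) < P).astype(np.uint8), 1)
A = A + A.T                                # simple undirected graph

# Naive non-private block-model fit: order vertices by degree (monotone in u
# for this W), split into k blocks, and average edge density per block pair.
k = 10
order = np.argsort(A.sum(axis=1))
blocks = np.array_split(order, k)
B = np.array([[A[np.ix_(bi, bj)].mean() for bj in blocks] for bi in blocks])
print(np.round(B / rho, 2))                # should roughly trace W on a grid
```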
Demystifying Fixed k-Nearest Neighbor Information Estimators
Estimating mutual information from i.i.d. samples drawn from an unknown joint
density function is a basic statistical problem of broad interest with
multitudinous applications. The most popular estimator is one proposed by
Kraskov, Stögbauer, and Grassberger (KSG) in 2004, and is nonparametric and
based on the distances of each sample to its $k$-th nearest neighboring
sample, where $k$ is a fixed small integer. Despite its widespread use (part of
scientific software packages), theoretical properties of this estimator have
been largely unexplored. In this paper we demonstrate that the estimator is
consistent and also identify an upper bound on the rate of convergence of the
bias as a function of the number of samples. We argue that the superior
performance of the KSG estimator stems from a curious "correlation boosting"
effect, and build on this intuition to modify the KSG estimator in novel ways to
construct a superior estimator. As a byproduct of our investigations, we obtain
nearly tight rates of convergence of the error of the well-known fixed
$k$-nearest neighbor estimator of differential entropy by Kozachenko and
Leonenko.
Comment: 55 pages, 8 figures
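Since the KSG construction is compact, a direct implementation is a useful reference; the sketch below is the standard type-1 KSG estimator (not the modified estimator proposed in the paper), with the max-norm neighbor searches done by k-d trees.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_mi(x, y, k=4):
    """Type-1 KSG mutual information estimate (nats); x, y are (n, d) arrays."""
    n = len(x)
    xy = np.hstack([x, y])
    # Distance to the k-th nearest neighbor in the joint space (max-norm);
    # k + 1 because the query point itself is returned at distance zero.
    eps = cKDTree(xy).query(xy, k=k + 1, p=np.inf)[0][:, -1]
    # Count strictly closer neighbors in each marginal space (self excluded).
    nx = cKDTree(x).query_ball_point(x, eps - 1e-12, p=np.inf, return_length=True) - 1
    ny = cKDTree(y).query_ball_point(y, eps - 1e-12, p=np.inf, return_length=True) - 1
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))

# Sanity check on correlated Gaussians, where I(X;Y) = -0.5 * log(1 - rho^2).
rng = np.random.default_rng(0)
r = 0.8
z = rng.multivariate_normal([0.0, 0.0], [[1.0, r], [r, 1.0]], size=5000)
print(ksg_mi(z[:, :1], z[:, 1:]))   # ~0.51
print(-0.5 * np.log(1 - r ** 2))    # 0.5108
```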