An Extremal Inequality for Long Markov Chains
Let $(X, Y)$ be jointly Gaussian vectors, and consider random variables $U$ and $V$
that satisfy the Markov constraint $U - X - Y - V$. We prove an extremal inequality
relating the mutual informations between all pairs of random variables from the set
$\{U, X, Y, V\}$. As a first application, we show that the rate region for the
two-encoder quadratic Gaussian source coding problem follows as an immediate
corollary of the extremal inequality. In a second application, we establish the rate
region for a vector-Gaussian source coding problem where L\"{o}wner-John ellipsoids
are approximated based on rate-constrained descriptions of the data.
Comment: 18 pages, 1 figure. Submitted to Transactions on Information Theory
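A compact restatement of the setup, using the notation of the reconstruction above (the paper's own symbols may differ): the long Markov chain constraint is $U \to X \to Y \to V$ with $(X, Y)$ jointly Gaussian, and the extremal inequality relates the six pairwise mutual informations

$$ I(U;X), \quad I(U;Y), \quad I(U;V), \quad I(X;Y), \quad I(X;V), \quad I(Y;V). $$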
Bias Correction with Jackknife, Bootstrap, and Taylor Series
We analyze bias correction methods using jackknife, bootstrap, and Taylor
series. We focus on the binomial model, and consider the problem of bias
correction for estimating $f(p)$, where $f$ is arbitrary. We characterize the
supremum norm of the bias of general jackknife and bootstrap estimators for any
continuous function, and demonstrate that in delete-$d$ jackknife, different
values of $d$ may lead to drastically different behaviors in jackknife. We show
that in the binomial model, iterating the bootstrap bias correction infinitely
many times may lead to divergence of bias and variance, and demonstrate that the
bias properties of the bootstrap bias-corrected estimator after $r-1$ rounds are
of the same order as those of the $r$-jackknife estimator if a bounded
coefficients condition is satisfied.
Comment: to appear in IEEE Transactions on Information Theory
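To make the bias-correction mechanics concrete, here is a minimal Python sketch (not the paper's construction) comparing the plug-in, delete-1 jackknife, and one-round bootstrap bias corrections in a Bernoulli/binomial model; the choice $f(p) = p^2$, the sample sizes, and the Monte Carlo settings are illustrative assumptions only.

    import numpy as np

    rng = np.random.default_rng(0)

    def jackknife_correct(sample, f):
        # delete-1 jackknife bias correction applied to the plug-in estimate f(mean)
        n = len(sample)
        theta_full = f(sample.mean())
        loo = np.array([f(np.delete(sample, i).mean()) for i in range(n)])
        return n * theta_full - (n - 1) * loo.mean()

    def bootstrap_correct(sample, f, B=2000):
        # one round of nonparametric bootstrap bias correction: 2*theta_hat - E*[theta_hat*]
        n = len(sample)
        theta_full = f(sample.mean())
        boot = np.array([f(rng.choice(sample, size=n, replace=True).mean())
                         for _ in range(B)])
        return 2 * theta_full - boot.mean()

    # toy comparison for f(p) = p^2, where the plug-in estimator has bias p(1 - p)/n
    f = lambda p: p ** 2
    n, p = 30, 0.3
    results = {"plug-in": [], "jackknife": [], "bootstrap": []}
    for _ in range(500):
        sample = rng.binomial(1, p, size=n).astype(float)
        results["plug-in"].append(f(sample.mean()))
        results["jackknife"].append(jackknife_correct(sample, f))
        results["bootstrap"].append(bootstrap_correct(sample, f))
    for name, vals in results.items():
        print(f"{name:9s} mean bias: {np.mean(vals) - f(p):+.5f}")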
Deconstructing Generative Adversarial Networks
We deconstruct the performance of GANs into three components:
1. Formulation: we propose a perturbation view of the population target of
GANs. Building on this interpretation, we show that GANs can be viewed as a
generalization of the robust statistics framework, and propose a novel GAN
architecture, termed Cascade GANs, to provably recover meaningful
low-dimensional generator approximations when the real distribution is
high-dimensional and corrupted by outliers.
2. Generalization: given a population target of GANs, we develop a systematic
principle, projection under admissible distance, for designing GANs that meet the
population requirement using finite samples. We implement our principle in
three cases to achieve polynomial and sometimes near-optimal sample
complexities: (1) learning an arbitrary generator under an arbitrary
pseudonorm; (2) learning a Gaussian location family under TV distance, where we
utilize our principle to provide a new proof of the optimality of the Tukey median
viewed as GANs; (3) learning a low-dimensional Gaussian approximation of a
high-dimensional arbitrary distribution under Wasserstein distance. We
demonstrate a fundamental trade-off between the approximation error and the
statistical error in GANs, and show how to apply our principle with empirical samples to
predict how many samples are sufficient for GANs in order not to suffer from
the discriminator winning problem.
3. Optimization: we demonstrate that alternating gradient descent is provably not
locally asymptotically stable in optimizing the GAN formulation of PCA. We
diagnose the problem as the minimax duality gap being non-zero, and propose a
new GAN architecture whose duality gap is zero, where the value of the game is
equal to the previous minimax value (not the maximin value). We prove that the new
GAN architecture is globally asymptotically stable in optimization under
alternating gradient descent (a toy illustration of descent-ascent dynamics on a
simple minimax objective follows below).
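A toy companion to the optimization point, with the caveats up front: the sketch below runs gradient descent-ascent on the bilinear game $f(x, y) = xy$ rather than on the paper's GAN formulation of PCA, and the step size and iteration count are arbitrary assumptions. It only illustrates how naive simultaneous and alternating updates can fail to converge to the equilibrium of a simple minimax problem.

    # gradient descent-ascent on f(x, y) = x * y, whose unique equilibrium is (0, 0);
    # simultaneous updates spiral outward and alternating updates orbit without
    # converging, so neither settles at the equilibrium in this toy example
    import numpy as np

    eta, steps = 0.1, 2000

    def simultaneous(x, y):
        for _ in range(steps):
            x, y = x - eta * y, y + eta * x   # both players use the old iterates
        return x, y

    def alternating(x, y):
        for _ in range(steps):
            x = x - eta * y                   # min player moves first
            y = y + eta * x                   # max player reacts to the new x
        return x, y

    for name, run in [("simultaneous", simultaneous), ("alternating", alternating)]:
        x, y = run(1.0, 1.0)
        print(f"{name:12s} distance from equilibrium after {steps} steps: {np.hypot(x, y):.3f}")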
Minimax Estimation of the $L_1$ Distance
We consider the problem of estimating the $L_1$ distance between two discrete
probability measures $P$ and $Q$ from empirical data in a nonasymptotic and
large alphabet setting. When $Q$ is known and one obtains $n$ samples from $P$,
we show that for every $Q$, the minimax rate-optimal estimator with $n$ samples
achieves performance comparable to that of the maximum likelihood estimator
(MLE) with $n \ln n$ samples. When both $P$ and $Q$ are unknown, we construct
minimax rate-optimal estimators whose worst-case performance is essentially
that of the known-$Q$ case with $Q$ being uniform, implying that a uniform $Q$
is essentially the most difficult case. The \emph{effective sample size
enlargement} phenomenon, identified in Jiao \emph{et al.} (2015), holds both in
the known-$Q$ case for every $Q$ and in the unknown-$Q$ case. However, the
construction of optimal estimators for the unknown-$Q$ case requires new
techniques and insights beyond the approximation-based method of functional
estimation in Jiao \emph{et al.} (2015).
Comment: to appear in IEEE Transactions on Information Theory
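For orientation, the following is a minimal plug-in (MLE) baseline for the setting where both distributions are unknown: it computes the $L_1$ distance between the two empirical distributions. It is not the minimax rate-optimal estimator studied in the paper, and the alphabet size, sample sizes, and choice of $P$ and $Q$ are illustrative assumptions.

    # plug-in (MLE) estimate of the L1 distance from samples; a baseline only
    import numpy as np

    rng = np.random.default_rng(1)
    k, n = 1000, 500                       # large alphabet relative to the sample size
    P = rng.dirichlet(np.ones(k))          # an arbitrary "unknown" distribution
    Q = np.ones(k) / k                     # uniform Q, the hardest known-Q case

    def empirical(samples, k):
        # empirical distribution (maximum likelihood estimate) on {0, ..., k-1}
        return np.bincount(samples, minlength=k) / len(samples)

    x = rng.choice(k, size=n, p=P)
    y = rng.choice(k, size=n, p=Q)
    P_hat, Q_hat = empirical(x, k), empirical(y, k)

    print("true L1 distance:", np.abs(P - Q).sum())
    print("plug-in estimate:", np.abs(P_hat - Q_hat).sum())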
On Estimation of $L_r$-Norms in Gaussian White Noise Models
We provide a complete picture of asymptotically minimax estimation of
$L_r$-norms (for any $r \geq 1$) of the mean in the Gaussian white noise model
over Nikolskii-Besov spaces. In this regard, we complement the work of Lepski,
Nemirovski and Spokoiny (1999), who considered the cases of non-even $r$ (with a
poly-logarithmic gap between upper and lower bounds) and even $r$ (with
asymptotically sharp upper and lower bounds) over H\"{o}lder spaces. We
additionally consider the case of asymptotically adaptive minimax estimation
and demonstrate a difference between even and non-even $r$ in terms of an
investigator's ability to produce asymptotically adaptive minimax estimators
without paying a penalty.
Comment: To appear in Probability Theory and Related Fields
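For reference, a minimal statement of the estimation target, with notation and normalization chosen for illustration (the paper's conventions may differ): in the Gaussian white noise model one observes the process

$$ dY_t = f(t)\,dt + \frac{1}{\sqrt{n}}\,dW_t, \qquad t \in [0, 1], $$

and the goal is to estimate the functional $\|f\|_r = \big(\int_0^1 |f(t)|^r\,dt\big)^{1/r}$ of the unknown mean function $f$.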
Local moment matching: A unified methodology for symmetric functional estimation and distribution estimation under Wasserstein distance
We present \emph{Local Moment Matching (LMM)}, a unified methodology for
symmetric functional estimation and distribution estimation under Wasserstein
distance. We construct an efficiently computable estimator that achieves the
minimax rates in estimating the distribution up to permutation, and show that
the plug-in approach of our unlabeled distribution estimator is "universal" in
estimating symmetric functionals of discrete distributions. Instead of performing
best polynomial approximation explicitly as in the existing literature on
functional estimation, the plug-in approach conducts polynomial approximation
implicitly and attains the optimal sample complexity for the entropy, power sum,
and support size functionals.
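To ground the notion of estimating a distribution "up to permutation", the sketch below computes one common formalization of the loss: the smallest $L_1$ distance over relabelings of the symbols, which reduces to comparing sorted probability vectors and coincides, up to normalization, with the 1-Wasserstein distance between the multisets of probability values. It illustrates the loss only, not the LMM estimator, and the paper's exact normalization may differ.

    # permutation-invariant L1 distance between two discrete distributions
    import numpy as np

    def sorted_l1(p, q):
        # min over symbol relabelings of sum_i |p_i - q_pi(i)| = L1 of sorted vectors
        return np.abs(np.sort(p) - np.sort(q)).sum()

    p = np.array([0.5, 0.3, 0.2])
    q = np.array([0.2, 0.5, 0.3])   # same probabilities, different labels
    r = np.array([0.6, 0.3, 0.1])

    print(sorted_l1(p, q))          # 0.0: identical up to permutation
    print(sorted_l1(p, r))          # 0.2: genuinely different shapes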
Relations between Information and Estimation in Discrete-Time L\'evy Channels
Fundamental relations between information and estimation have been
established in the literature for the discrete-time Gaussian and Poisson
channels. In this work, we demonstrate that such relations hold for a much
larger class of observation models. We introduce the natural family of
discrete-time L\'evy channels where the distribution of the output conditioned
on the input is infinitely divisible. For L\'evy channels, we establish new
representations relating the mutual information between the channel input and
output to an optimal expected estimation loss, thereby unifying and
considerably extending results from the Gaussian and Poisson settings. We
demonstrate the richness of our results by working out two examples of L\'evy
channels, namely the gamma channel and the negative binomial channel, with
corresponding relations between information and estimation. Extensions to the
setting of mismatched estimation are also presented.
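As context for the kind of identity being generalized here, the classical information-estimation relation for the Gaussian channel (Guo, Shamai and Verd\'u, 2005) is recalled below in standard notation; this is only the well-known special case, not the paper's general statement for L\'evy channels:

$$ \frac{d}{d\gamma}\, I\big(X;\, \sqrt{\gamma}\,X + N\big) = \frac{1}{2}\,\mathrm{mmse}(\gamma), \qquad N \sim \mathcal{N}(0, 1) \text{ independent of } X, $$

where $\mathrm{mmse}(\gamma) = \mathbb{E}\big[\big(X - \mathbb{E}[X \mid \sqrt{\gamma}\,X + N]\big)^{2}\big]$ is the minimum mean-squared error of estimating $X$ from the channel output at signal-to-noise ratio $\gamma$.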
Minimax Estimation of Discrete Distributions under $\ell_1$ Loss
We analyze the problem of discrete distribution estimation under $\ell_1$
loss. We provide non-asymptotic upper and lower bounds on the maximum risk of
the empirical distribution (the maximum likelihood estimator), and on the minimax
risk in regimes where the alphabet size $S$ may grow with the number of
observations $n$. We show that among distributions with bounded entropy $H$,
the asymptotic maximum risk for the empirical distribution is $2H/\ln n$, while
the asymptotic minimax risk is $H/\ln n$. Moreover, we show that a
hard-thresholding estimator oblivious to the unknown upper bound $H$ is
asymptotically minimax. However, if we constrain the estimates to lie in the
simplex of probability distributions, then the asymptotic minimax risk is again
$2H/\ln n$. We draw connections between our work and the literature on density
estimation, entropy estimation, total variation distance ($L_1$ divergence)
estimation, joint distribution estimation in stochastic processes, normal mean
estimation, and adaptive estimation.
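The sketch below illustrates the hard-thresholding idea in its simplest form: zero out tiny empirical masses rather than keep them, accepting an estimate that need not lie in the probability simplex. The threshold, alphabet size, and true distribution are illustrative assumptions only; the paper's actual construction and analysis are more delicate.

    # hard-thresholding distribution estimate under l1 loss (illustrative only)
    import numpy as np

    rng = np.random.default_rng(2)
    k, n = 10_000, 2_000
    P = rng.dirichlet(np.full(k, 0.05))        # a distribution with many tiny masses

    x = rng.choice(k, size=n, p=P)
    emp = np.bincount(x, minlength=k) / n      # empirical distribution (MLE)

    tau = 1.0 / n                              # illustrative threshold, not the paper's
    hard = np.where(emp > tau, emp, 0.0)       # drop symbols seen at most once

    print("l1 risk, empirical      :", np.abs(emp - P).sum())
    print("l1 risk, hard-threshold :", np.abs(hard - P).sum())
    print("total mass of estimate  :", hard.sum())   # below 1: outside the simplex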
The Nearest Neighbor Information Estimator is Adaptively Near Minimax Rate-Optimal
We analyze the Kozachenko--Leonenko (KL) nearest neighbor estimator for the
differential entropy. We obtain the first uniform upper bound on its
performance over H\"older balls on a torus without assuming any conditions on
how close the density can be to zero. Accompanying a new minimax lower
bound over the H\"older ball, we show that the KL estimator achieves the
minimax rates up to logarithmic factors without cognizance of the smoothness
parameter $s$ of the H\"older ball for $s \in (0, 2]$ and arbitrary dimension
$d$, rendering it the first estimator that provably satisfies this property.
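For concreteness, a minimal version of the Kozachenko--Leonenko estimator itself is sketched below: the standard 1-nearest-neighbor form on Euclidean space with brute-force neighbor search (the paper analyzes the estimator on a torus); the sanity-check distribution and sample size are assumptions.

    # Kozachenko-Leonenko 1-nearest-neighbor differential entropy estimator (nats)
    import numpy as np
    from scipy.special import digamma, gammaln

    def kl_entropy(x):
        n, d = x.shape
        # distance from each sample to its nearest neighbor (excluding itself)
        dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
        np.fill_diagonal(dists, np.inf)
        rho = dists.min(axis=1)
        # log volume of the d-dimensional Euclidean unit ball
        log_vd = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)
        return digamma(n) - digamma(1) + log_vd + d * np.mean(np.log(rho))

    # sanity check on a standard Gaussian, whose true entropy is (d/2) * log(2*pi*e)
    rng = np.random.default_rng(3)
    d, n = 2, 1000
    x = rng.standard_normal((n, d))
    print("KL estimate :", kl_entropy(x))
    print("true entropy:", d / 2 * np.log(2 * np.pi * np.e))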
Minimax Rate-Optimal Estimation of Divergences between Discrete Distributions
We study the minimax estimation of $\alpha$-divergences between discrete
distributions for integer $\alpha$, which include the Kullback--Leibler
divergence and the $\chi^2$-divergences as special examples. Dropping the usual
theoretical tricks to acquire independence, we construct the first minimax
rate-optimal estimator which does not require any Poissonization, sample
splitting, or explicit construction of approximating polynomials. The estimator
uses a hybrid approach which solves a problem-independent linear program based
on moment matching in the non-smooth regime, and applies a problem-dependent
bias-corrected plug-in estimator in the smooth regime, with a soft decision
boundary between these regimes.
Comment: This version has been significantly revised
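For reference, the two named special cases have the standard definitions (stated here only as a reminder, in the usual notation):

$$ D_{\mathrm{KL}}(P\|Q) = \sum_{i} p_i \log \frac{p_i}{q_i}, \qquad \chi^2(P\|Q) = \sum_{i} \frac{(p_i - q_i)^2}{q_i}. $$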
- …