The Generalized Asymptotic Equipartition Property: Necessary and Sufficient Conditions
Suppose a string $X_1^n = (X_1, X_2, \ldots, X_n)$ generated by a memoryless
source $(X_n)_{n \geq 1}$ with distribution $P$ is to be compressed with
distortion no greater than $D \geq 0$, using a memoryless random codebook with
distribution $Q$. The compression performance is determined by the "generalized
asymptotic equipartition property" (AEP), which states that the probability of
finding a $D$-close match between $X_1^n$ and any given codeword $Y_1^n$ is
approximately $2^{-n R(P,Q,D)}$, where the rate function $R(P,Q,D)$ can be
expressed as an infimum of relative entropies. The main purpose here is to
remove various restrictive assumptions on the validity of this result that have
appeared in the recent literature. Necessary and sufficient conditions for the
generalized AEP are provided in the general setting of abstract alphabets and
unbounded distortion measures. All possible distortion levels $D \geq 0$ are
considered; the source $(X_n)_{n \geq 1}$ can be stationary and ergodic; and the
codebook distribution can have memory. Moreover, the behavior of the matching
probability is precisely characterized, even when the generalized AEP is not
valid. Natural characterizations of the rate function $R(P,Q,D)$ are
established under equally general conditions.
Comment: 19 pages
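The statement in the abstract above can be written out as a short formula. This is a sketch in the notation commonly used for this result, not a quotation from the paper: P is the source distribution, Q the codebook distribution, D the distortion level, and the symbols \rho (a single-letter distortion measure), W (a joint distribution on source-codeword pairs), and B(x_1^n, D) (the set of codewords within average distortion D of x_1^n) are assumptions introduced here for illustration.

```latex
% Assumed notation: \rho is a single-letter distortion measure and
% B(x_1^n, D) = \{ y_1^n : \tfrac{1}{n} \sum_{i=1}^n \rho(x_i, y_i) \le D \}.
% Generalized AEP (when it holds), with probability one:
-\frac{1}{n} \log_2 Q^n\bigl( B(X_1^n, D) \bigr) \;\longrightarrow\; R(P, Q, D)
\quad \text{as } n \to \infty,
% Rate function as an infimum of relative entropies (in bits):
\qquad
R(P, Q, D) \;=\; \inf_{W \,:\, W_X = P,\ \mathbb{E}_W[\rho(X, Y)] \le D}
H(W \,\|\, P \times Q).
```

Here W_X denotes the first marginal of W. The conditions under which the limit exists are the subject of the paper and are not reproduced here.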
Conservative Hypothesis Tests and Confidence Intervals using Importance Sampling
Importance sampling is a common technique for Monte Carlo approximation,
including Monte Carlo approximation of p-values. Here it is shown that a simple
correction of the usual importance sampling p-values creates valid p-values,
meaning that a hypothesis test created by rejecting the null when the p-value
is <= alpha will also have a type I error rate <= alpha. This correction uses
the importance weight of the original observation, which gives valuable
diagnostic information under the null hypothesis. Using the corrected p-values
can be crucial for multiple testing and also in problems where evaluating the
accuracy of importance sampling approximations is difficult. Inverting the
corrected p-values provides a useful way to create Monte Carlo confidence
intervals that maintain the nominal significance level and use only a single
Monte Carlo sample. Several applications are described, including accelerated
multiple testing for a large neurophysiological dataset and exact conditional
inference for a logistic regression model with nuisance parameters.
Comment: 26 pages, 3 figures, 3 tables [significant rewrite of version 1,
including additional examples, title change
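The abstract above does not state the correction formula. One form consistent with its description, assumed here for illustration, adds the importance weight of the original observation to the weighted tail sum and divides by n + 1. The function name and arguments below are hypothetical.

```python
def corrected_is_pvalue(t_obs, w_obs, t_samples, w_samples):
    """Sketch of a corrected importance sampling p-value (assumed form).

    t_obs     -- test statistic of the original observation
    w_obs     -- importance weight (target / proposal) of the original observation
    t_samples -- test statistics of n draws from the proposal distribution
    w_samples -- importance weights of those n draws
    """
    if len(t_samples) != len(w_samples):
        raise ValueError("t_samples and w_samples must have the same length")
    n = len(t_samples)
    # Weighted count of proposal draws at least as extreme as the observation.
    tail = sum(w for t, w in zip(t_samples, w_samples) if t >= t_obs)
    # The observation contributes its own weight; cap at 1 so the result
    # is a probability. Capping can only make the test more conservative.
    return min(1.0, (w_obs + tail) / (n + 1))


# With all weights equal to 1 (sampling directly from the null), this reduces
# to the usual Monte Carlo p-value (1 + #{t_i >= t_obs}) / (n + 1).
p = corrected_is_pvalue(2.0, 1.0, [1.0, 3.0, 2.5], [1.0, 1.0, 1.0])
print(p)
```

Rejecting the null when the returned value is <= alpha is the test the abstract describes. The validity claim itself comes from the paper, not from this sketch.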
Exact Enumeration and Sampling of Matrices with Specified Margins
We describe a dynamic programming algorithm for exact counting and exact
uniform sampling of matrices with specified row and column sums. The algorithm
runs in polynomial time when the column sums are bounded. Binary or
non-negative integer matrices are handled. The method is distinguished by
applicability to non-regular margins, tractability on large matrices, and the
capacity for exact sampling.
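To make the counting problem concrete, here is a minimal sketch for binary matrices. It is not the algorithm of the paper: it is a plain memoized recursion over rows whose state is the vector of remaining column sums, and it is exponential in the worst case. The function name is hypothetical.

```python
from functools import lru_cache
from itertools import combinations


def count_binary_matrices(row_sums, col_sums):
    """Count 0/1 matrices with the given row and column sums (illustrative only)."""
    if sum(row_sums) != sum(col_sums):
        return 0
    rows = tuple(row_sums)

    @lru_cache(maxsize=None)
    def count(i, remaining):
        # All rows placed: valid only if every column sum is used up.
        if i == len(rows):
            return 1 if all(c == 0 for c in remaining) else 0
        total = 0
        # Choose which columns receive a 1 in row i.
        for cols in combinations(range(len(remaining)), rows[i]):
            if any(remaining[j] == 0 for j in cols):
                continue  # a chosen column has no capacity left
            nxt = list(remaining)
            for j in cols:
                nxt[j] -= 1
            total += count(i + 1, tuple(nxt))
        return total

    return count(0, tuple(col_sums))


# 3x3 matrices with all margins equal to 1 are the permutation matrices.
print(count_binary_matrices([1, 1, 1], [1, 1, 1]))  # 6
```

Exact uniform sampling could be built on the same counts by choosing each row's columns with probability proportional to the number of completions, though this sketch stops at counting.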
Inconsistency of Pitman-Yor process mixtures for the number of components
In many applications, a finite mixture is a natural model, but it can be
difficult to choose an appropriate number of components. To circumvent this
choice, investigators are increasingly turning to Dirichlet process mixtures
(DPMs), and Pitman-Yor process mixtures (PYMs), more generally. While these
models may be well-suited for Bayesian density estimation, many investigators
are using them for inferences about the number of components, by considering
the posterior on the number of components represented in the observed data. We
show that this posterior is not consistent; that is, on data from a finite
mixture, it does not concentrate at the true number of components. This result
applies to a large class of nonparametric mixtures, including DPMs and PYMs,
over a wide variety of families of component distributions, including
essentially all discrete families, as well as continuous exponential families
satisfying mild regularity conditions (such as multivariate Gaussians).
Comment: This is a general treatment of the problem discussed in our related
article, "A simple example of Dirichlet process mixture inconsistency for the
number of components", Miller and Harrison (2013) arXiv:1301.270