The Generalized Asymptotic Equipartition Property: Necessary and Sufficient Conditions
Suppose a string X_1^n generated by a memoryless source with distribution P is
to be compressed with distortion no greater than D, using a memoryless random
codebook with distribution Q. The compression performance is determined by the
``generalized asymptotic equipartition property'' (AEP), which states that the
probability of finding a D-close match between X_1^n and any given codeword
Y_1^n is approximately 2^{-n R(P,Q,D)}, where the rate function R(P,Q,D) can be
expressed as an infimum of relative entropies. The main purpose here is to
remove various restrictive assumptions on the validity of this result that have
appeared in the recent literature. Necessary and sufficient conditions for the
generalized AEP are provided in the general setting of abstract alphabets and
unbounded distortion measures. All possible distortion levels D >= 0 are
considered; the source can be stationary and ergodic; and the codebook
distribution can have memory. Moreover, the behavior of the matching
probability is precisely characterized, even when the generalized AEP is not
valid. Natural characterizations of the rate function R(P,Q,D) are established
under equally general conditions.
Comment: 19 pages
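The exponential behavior of the matching probability can be seen concretely in a
minimal numerical sketch. All specifics below are illustrative assumptions, not
from the paper: a Bernoulli(p) source, an i.i.d. Bernoulli(q) codebook, and
Hamming distortion, for which the matching probability is an exact binomial tail
and the empirical exponent -(1/n) log P(match) can be computed directly.

```python
from math import comb, log

def match_probability(n, p, q, D):
    """P(d(X^n, Y^n) <= n*D) for X_i ~ Bern(p), Y_i ~ Bern(q), i.i.d.,
    under Hamming distortion: each position mismatches independently with
    probability r = p(1-q) + (1-p)q, so the distortion count is Binomial(n, r)."""
    r = p * (1 - q) + (1 - p) * q
    k_max = int(n * D)  # number of allowed mismatching positions
    return sum(comb(n, k) * r**k * (1 - r)**(n - k) for k in range(k_max + 1))

def empirical_rate(n, p, q, D):
    """-(1/n) log P(match): the finite-n exponent, which should stabilize
    as n grows near the limiting rate (here D(0.1 || 0.5) in nats)."""
    return -log(match_probability(n, p, q, D)) / n

# The exponent decreases toward its limit as n grows (the AEP in this toy case).
rates = [empirical_rate(n, 0.5, 0.5, 0.1) for n in (100, 200, 400)]
```

For p = q = 0.5 and D = 0.1 the limiting exponent is the relative entropy
D(0.1 || 0.5) = 0.1 ln(0.2) + 0.9 ln(1.8), roughly 0.368 nats, and the
finite-n values approach it from above.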
Conservative Hypothesis Tests and Confidence Intervals using Importance Sampling
Importance sampling is a common technique for Monte Carlo approximation,
including Monte Carlo approximation of p-values. Here it is shown that a simple
correction of the usual importance sampling p-values creates valid p-values,
meaning that a hypothesis test created by rejecting the null when the p-value
is <= alpha will also have a type I error rate <= alpha. This correction uses
the importance weight of the original observation, which gives valuable
diagnostic information under the null hypothesis. Using the corrected p-values
can be crucial for multiple testing and also in problems where evaluating the
accuracy of importance sampling approximations is difficult. Inverting the
corrected p-values provides a useful way to create Monte Carlo confidence
intervals that maintain the nominal significance level and use only a single
Monte Carlo sample. Several applications are described, including accelerated
multiple testing for a large neurophysiological dataset and exact conditional
inference for a logistic regression model with nuisance parameters.
Comment: 26 pages, 3 figures, 3 tables [significant rewrite of version 1,
including additional examples, title change]
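The correction described in the abstract can be sketched as follows. The exact
construction is in the paper; the form below is only a natural guess consistent
with the abstract's description (treat the observation as one more draw,
weighted by its own importance weight), and it reduces to the classic valid
Monte Carlo p-value (1 + #{T_i >= t_obs}) / (N + 1) when all weights equal 1.

```python
def corrected_pvalue(t_obs, w_obs, samples, weights):
    """Hypothetical corrected importance-sampling p-value (a sketch, not
    the paper's exact formula): the observed statistic contributes its own
    importance weight w_obs to the weighted exceedance count, and the
    result is truncated at 1.

    samples: statistics T(Y_i) for draws Y_i from the proposal
    weights: importance weights w(Y_i) = null density / proposal density"""
    num = w_obs + sum(w for t, w in zip(samples, weights) if t >= t_obs)
    return min(1.0, num / (len(samples) + 1))
```

With unit weights this recovers the standard add-one Monte Carlo p-value, which
is the sanity check that motivates the form of the numerator.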
Estimation of the Rate-Distortion Function
Motivated by questions in lossy data compression and by theoretical
considerations, we examine the problem of estimating the rate-distortion
function of an unknown (not necessarily discrete-valued) source from empirical
data. Our focus is the behavior of the so-called "plug-in" estimator, which is
simply the rate-distortion function of the empirical distribution of the
observed data. Sufficient conditions are given for its consistency, and
examples are provided to demonstrate that in certain cases it fails to converge
to the true rate-distortion function. The analysis of its performance is
complicated by the fact that the rate-distortion function is not continuous in
the source distribution; the underlying mathematical problem is closely related
to the classical problem of establishing the consistency of maximum likelihood
estimators. General consistency results are given for the plug-in estimator
applied to a broad class of sources, including all stationary and ergodic ones.
A more general class of estimation problems is also considered, arising in the
context of lossy data compression when the allowed class of coding
distributions is restricted; analogous results are developed for the plug-in
estimator in that case. Finally, consistency theorems are formulated for
modified (e.g., penalized) versions of the plug-in, and for estimating the
optimal reproduction distribution.
Comment: 18 pages, no figures [v2: removed an example with an error; corrected
typos; a shortened version will appear in IEEE Trans. Inform. Theory]
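The plug-in idea can be made concrete for a finite-alphabet source. The sketch
below computes a point on the rate-distortion curve of a given distribution via
the standard Blahut-Arimoto iteration (a standard tool for this computation,
not necessarily the paper's method); the plug-in estimator is then just this
computation applied to the empirical distribution of the observed data.

```python
from math import exp, log

def blahut_arimoto(p, d, beta, iters=500):
    """One point (D, R) on the rate-distortion curve of a source with
    distribution p (a list over a finite alphabet), distortion matrix
    d[x][y], at Lagrange slope beta, via Blahut-Arimoto.  R is in nats."""
    ny = len(d[0])
    q = [1.0 / ny] * ny  # reproduction distribution, initialized uniform
    for _ in range(iters):
        # conditional p(y|x) proportional to q(y) * exp(-beta * d(x,y))
        cond = []
        for x in range(len(p)):
            row = [q[y] * exp(-beta * d[x][y]) for y in range(ny)]
            z = sum(row)
            cond.append([v / z for v in row])
        # re-derive the reproduction marginal from the conditional
        q = [sum(p[x] * cond[x][y] for x in range(len(p))) for y in range(ny)]
    D = sum(p[x] * cond[x][y] * d[x][y]
            for x in range(len(p)) for y in range(ny))
    R = sum(p[x] * cond[x][y] * log(cond[x][y] / q[y])
            for x in range(len(p)) for y in range(ny) if cond[x][y] > 0)
    return D, R

# Plug-in estimator: run the same computation on the empirical distribution
# of the data, e.g. p_hat[a] = count of symbol a in the sample / sample size.
```

For a uniform binary source with Hamming distortion this reproduces the known
curve R(D) = log 2 - H_b(D) (in nats), which gives a convenient correctness check.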
Exact Enumeration and Sampling of Matrices with Specified Margins
We describe a dynamic programming algorithm for exact counting and exact
uniform sampling of matrices with specified row and column sums. The algorithm
runs in polynomial time when the column sums are bounded. Binary or
non-negative integer matrices are handled. The method is distinguished by
applicability to non-regular margins, tractability on large matrices, and the
capacity for exact sampling.
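The counting problem can be illustrated with a generic memoized recursion for
small binary matrices. This is not the paper's algorithm (and has none of its
large-matrix tractability); it only shows the kind of state-space reduction
that makes dynamic programming apply: columns with equal remaining sums are
interchangeable, so the state is the multiset of remaining column sums.

```python
from functools import lru_cache
from itertools import combinations

def count_binary_matrices(row_sums, col_sums):
    """Count 0/1 matrices with the given row and column sums by filling
    rows one at a time, memoizing on (row index, sorted remaining column
    sums).  A generic illustration, not the paper's algorithm."""
    if sum(row_sums) != sum(col_sums):
        return 0

    @lru_cache(maxsize=None)
    def count(i, cols):
        if i == len(row_sums):
            return 1 if all(c == 0 for c in cols) else 0
        r = row_sums[i]
        # choose which still-unfilled columns receive this row's r ones
        avail = [j for j, c in enumerate(cols) if c > 0]
        total = 0
        for subset in combinations(avail, r):
            new = list(cols)
            for j in subset:
                new[j] -= 1
            total += count(i + 1, tuple(sorted(new)))
        return total

    return count(0, tuple(sorted(col_sums)))
```

For example, the 3x3 binary matrices with all row and column sums equal to 1
are exactly the six permutation matrices.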
Backing the horse or the jockey? Due diligence, agency costs, information and the evaluation of risk by business angel investors
This paper explores the argument that business angel investors are more concerned with managing and minimising agency risk than market risk. Based on data on the due diligence process from a survey of business angels in the UK, the paper concludes that business angels do view entrepreneur characteristics and experience as having the greatest impact on the perceived riskiness of an investment opportunity. Further, they emphasise personal and informal over formal sources of information in the due diligence process, and seek information on both the entrepreneur and the venture in determining valuation. Indeed, the reliance of business angels on short-term and subjective information to value investment opportunities leads to the conclusion that their approach to valuation is not a function of the conventional protocols of financial analysis, but of personal relations and assessment.
Asymptotic equivalence and adaptive estimation for robust nonparametric regression
Asymptotic equivalence theory has so far been developed in the literature only
for bounded loss functions. This limits the potential applications of the theory
because many commonly used loss functions in statistical inference are
unbounded. In this paper we develop asymptotic equivalence results for robust
nonparametric regression with unbounded loss functions. The results imply that
all the Gaussian nonparametric regression procedures can be robustified in a
unified way. A key step in our equivalence argument is to bin the data and then
take the median of each bin. The asymptotic equivalence results have
significant practical implications. To illustrate the general principles of the
equivalence argument we consider two important nonparametric inference
problems: robust estimation of the regression function and the estimation of a
quadratic functional. In both cases easily implementable procedures are
constructed and are shown to enjoy simultaneously a high degree of robustness
and adaptivity. Other problems such as construction of confidence sets and
nonparametric hypothesis testing can be handled in a similar fashion.
Comment: Published at http://dx.doi.org/10.1214/08-AOS681 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
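The key device in the equivalence argument, binning the data and taking the
median of each bin, is easy to sketch. The toy data below are hypothetical
(a constant regression function observed under Cauchy noise, which has no
mean): the bin medians remain well-behaved even though sample means of the
raw observations do not.

```python
import random
import statistics

def bin_medians(y, m):
    """Partition consecutive observations into bins of size m and return
    the median of each bin.  The medians are approximately Gaussian even
    for heavy-tailed noise, so a Gaussian nonparametric procedure can
    then be applied to them."""
    return [statistics.median(y[i:i + m]) for i in range(0, len(y) - m + 1, m)]

# Toy illustration (hypothetical data, not from the paper): f = 2 observed
# under standard Cauchy noise, generated as a ratio of two normals.
random.seed(0)
n, m = 1000, 10
y = [2.0 + random.gauss(0, 1) / random.gauss(0, 1) for _ in range(n)]
meds = bin_medians(y, m)
```

Despite the undefined mean of the noise, the bin medians concentrate around the
true value 2, which is what allows the robustified Gaussian procedures to work.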
- …