    Nonsubjective priors via predictive relative entropy regret

    We explore the construction of nonsubjective prior distributions in Bayesian statistics via a posterior predictive relative entropy regret criterion. We carry out a minimax analysis based on a derived asymptotic predictive loss function and show that this approach to prior construction has a number of attractive features. The approach here differs from previous work that uses either prior or posterior relative entropy regret in that we consider predictive performance in relation to alternative nondegenerate prior distributions. The theory is illustrated with an analysis of some specific examples. Comment: Published at http://dx.doi.org/10.1214/009053605000000804 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Game theory, maximum entropy, minimum discrepancy and robust Bayesian decision theory

    We describe and develop a close relationship between two problems that have customarily been regarded as distinct: that of maximizing entropy, and that of minimizing worst-case expected loss. Using a formulation grounded in the equilibrium theory of zero-sum games between Decision Maker and Nature, these two problems are shown to be dual to each other, the solution to each providing that to the other. Although Topsøe described this connection for the Shannon entropy over 20 years ago, it does not appear to be widely known even in that important special case. We here generalize this theory to apply to arbitrary decision problems and loss functions. We indicate how an appropriate generalized definition of entropy can be associated with such a problem, and we show that, subject to certain regularity conditions, the above-mentioned duality continues to apply in this extended context. This simultaneously provides a possible rationale for maximizing entropy and a tool for finding robust Bayes acts. We also describe the essential identity between the problem of maximizing entropy and that of minimizing a related discrepancy or divergence between distributions. This leads to an extension, to arbitrary discrepancies, of a well-known minimax theorem for the case of Kullback-Leibler divergence (the "redundancy-capacity theorem" of information theory). For the important case of families of distributions having certain mean values specified, we develop simple sufficient conditions and methods for identifying the desired solutions. Comment: Published by the Institute of Mathematical Statistics (http://www.imstat.org) in the Annals of Statistics (http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/00905360400000055
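
    The Kullback-Leibler case of the duality described above can be checked numerically. The following is a minimal sketch under stated assumptions, not the paper's construction: it assumes a finite family of distributions on a finite alphabet and uses the standard Blahut-Arimoto iteration, a technique the abstract does not mention. The names `kl` and `redundancy_capacity` are hypothetical. Nature's maximin value (the mutual information under the prior weights) and the Decision Maker's minimax value (the worst-case divergence from the mixture) should agree at equilibrium.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in nats for finite distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def redundancy_capacity(family, iters=200):
    """Blahut-Arimoto iteration over prior weights on a finite family.

    Returns (weights, mixture, bayes_value, worst_case), where
    bayes_value = sum_i w_i D(P_i || mixture)   (Nature's side, a lower bound)
    worst_case  = max_i D(P_i || mixture)       (Decision Maker's side, an upper bound)
    """
    m, k = len(family), len(family[0])
    w = [1.0 / m] * m  # start from the uniform prior (illustrative choice)

    def mixture(weights):
        return [sum(weights[i] * family[i][x] for i in range(m)) for x in range(k)]

    for _ in range(iters):
        mix = mixture(w)
        d = [kl(family[i], mix) for i in range(m)]
        scaled = [w[i] * math.exp(d[i]) for i in range(m)]
        total = sum(scaled)
        w = [s / total for s in scaled]

    mix = mixture(w)
    d = [kl(family[i], mix) for i in range(m)]
    bayes_value = sum(w[i] * d[i] for i in range(m))
    worst_case = max(d)
    return w, mix, bayes_value, worst_case

# Two "sources" that form a binary symmetric channel with crossover 0.1.
weights, mix, lower, upper = redundancy_capacity([[0.9, 0.1], [0.1, 0.9]])
```

    For this symmetric family the two values coincide and equal the channel capacity in nats, so the uniform mixture is at once the maximum-information prior and the minimax-redundancy predictive distribution.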

    A Tight Excess Risk Bound via a Unified PAC-Bayesian-Rademacher-Shtarkov-MDL Complexity

    We present a novel notion of complexity that interpolates between and generalizes some classic existing complexity notions in learning theory: for estimators like empirical risk minimization (ERM) with arbitrary bounded losses, it is upper bounded in terms of data-independent Rademacher complexity; for generalized Bayesian estimators, it is upper bounded by the data-dependent information complexity (also known as stochastic or PAC-Bayesian $\mathrm{KL}(\text{posterior} \,\|\, \text{prior})$ complexity). For (penalized) ERM, the new complexity reduces to (generalized) normalized maximum likelihood (NML) complexity, i.e., a minimax log-loss individual-sequence regret. Our first main result bounds excess risk in terms of the new complexity. Our second main result links the new complexity via Rademacher complexity to $L_2(P)$ entropy, thereby generalizing earlier results of Opper, Haussler, Lugosi, and Cesa-Bianchi, who did the log-loss case with $L_\infty$. Together, these results recover optimal bounds for VC- and large (polynomial entropy) classes, replacing localized Rademacher complexity by a simpler analysis which almost completely separates the two aspects that determine the achievable rates: 'easiness' (Bernstein) conditions and model complexity. Comment: 38 page
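
    To make the NML complexity mentioned above concrete, here is a minimal sketch for the simplest case. The Bernoulli model and the function name `bernoulli_nml_regret` are illustrative assumptions, not taken from the paper. The minimax log-loss individual-sequence regret is the logarithm of the Shtarkov sum, that is, the sum over all sequences of the maximized likelihood; for binary sequences this groups by the number of ones.

```python
import math

def bernoulli_nml_regret(n):
    """Minimax log-loss regret (in nats) of the Bernoulli model on length-n sequences.

    Computes log of the Shtarkov sum:
        sum_k C(n, k) * (k/n)^k * (1 - k/n)^(n - k),   n >= 1.
    """
    shtarkov_sum = 0.0
    for k in range(n + 1):
        theta_hat = k / n  # maximum-likelihood estimate for a sequence with k ones
        max_likelihood = theta_hat ** k * (1.0 - theta_hat) ** (n - k)
        shtarkov_sum += math.comb(n, k) * max_likelihood
    return math.log(shtarkov_sum)

regret_1 = bernoulli_nml_regret(1)
regret_2 = bernoulli_nml_regret(2)
```

    For `n = 1` the Shtarkov sum is 2 and for `n = 2` it is 2.5, so the regrets are `log 2` and `log 2.5` nats respectively.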

    Law of Log Determinant of Sample Covariance Matrix and Optimal Estimation of Differential Entropy for High-Dimensional Gaussian Distributions

    Differential entropy and log determinant of the covariance matrix of a multivariate Gaussian distribution have many applications in coding, communications, signal processing and statistical inference. In this paper we consider in the high dimensional setting optimal estimation of the differential entropy and the log-determinant of the covariance matrix. We first establish a central limit theorem for the log determinant of the sample covariance matrix in the high dimensional setting where the dimension $p(n)$ can grow with the sample size $n$. An estimator of the differential entropy and the log determinant is then considered. Optimal rate of convergence is obtained. It is shown that in the case $p(n)/n \rightarrow 0$ the estimator is asymptotically sharp minimax. The ultra-high dimensional setting where $p(n) > n$ is also discussed. Comment: 19 page
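
    The link between the two quantities is the standard identity $h(X) = \frac{p}{2}\log(2\pi e) + \frac{1}{2}\log\det\Sigma$ for a $p$-dimensional Gaussian with covariance $\Sigma$. The sketch below only evaluates this identity for a given covariance matrix; it is not the estimator studied in the paper and applies no high-dimensional correction. The helper names `log_det_spd` and `gaussian_entropy` are hypothetical.

```python
import math

def log_det_spd(a):
    """Log determinant of a symmetric positive definite matrix via Cholesky."""
    n = len(a)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                chol[i][j] = math.sqrt(a[i][i] - s)
            else:
                chol[i][j] = (a[i][j] - s) / chol[j][j]
    # det(A) = prod(diag(L))^2, so log det(A) = 2 * sum(log(diag(L)))
    return 2.0 * sum(math.log(chol[i][i]) for i in range(n))

def gaussian_entropy(cov):
    """Differential entropy (nats) of a Gaussian with covariance matrix cov."""
    p = len(cov)
    return 0.5 * p * math.log(2.0 * math.pi * math.e) + 0.5 * log_det_spd(cov)

entropy = gaussian_entropy([[2.0, 1.0], [1.0, 2.0]])
```

    Passing a sample covariance matrix to `gaussian_entropy` gives a naive plug-in estimate, which requires the matrix to be nonsingular and therefore does not cover the $p(n) > n$ regime.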

    Mismatched Quantum Filtering and Entropic Information

    Quantum filtering is a signal processing technique that estimates the posterior state of a quantum system under continuous measurements and has become a standard tool in quantum information processing, with applications in quantum state preparation, quantum metrology, and quantum control. If the filter assumes a nominal model that differs from reality, however, the estimation accuracy is bound to suffer. Here I derive identities that relate the excess error caused by quantum filter mismatch to the relative entropy between the true and nominal observation probability measures, with one identity for Gaussian measurements, such as optical homodyne detection, and another for Poissonian measurements, such as photon counting. These identities generalize recent seminal results in classical information theory and provide new operational meanings to relative entropy, mutual information, and channel capacity in the context of quantum experiments. Comment: v1: first draft, 8 pages, v2: added introduction and more results on mutual information and channel capacity, 12 pages, v3: minor updates, v4: updated the presentatio

    Universal Coding on Infinite Alphabets: Exponentially Decreasing Envelopes

    This paper deals with the problem of universal lossless coding on a countably infinite alphabet. It focuses on some classes of sources defined by an envelope condition on the marginal distribution, namely exponentially decreasing envelope classes with exponent $\alpha$. The minimax redundancy of exponentially decreasing envelope classes is proved to be equivalent to $\frac{1}{4 \alpha \log e} \log^2 n$. Then a coding strategy is proposed, with a Bayes redundancy equivalent to the maximin redundancy. Finally, an adaptive algorithm is provided, whose redundancy is equivalent to the minimax redundancy.
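
    As a rough aid to reading the rate above, the following sketch evaluates the leading term $\frac{1}{4 \alpha \log e} \log^2 n$. It assumes that all logarithms are base 2, so that the result is in bits; the abstract does not state the base. The function name `leading_redundancy_bits` is hypothetical. Because the stated result is an asymptotic equivalence, the value is only indicative at finite $n$.

```python
import math

def leading_redundancy_bits(alpha, n):
    """Leading term (log2 n)^2 / (4 * alpha * log2 e), assuming base-2 logarithms."""
    log2_e = math.log2(math.e)
    return math.log2(n) ** 2 / (4.0 * alpha * log2_e)

# Squaring n doubles log n and therefore quadruples the leading term.
term_small = leading_redundancy_bits(alpha=1.0, n=2)
term_large = leading_redundancy_bits(alpha=1.0, n=4)
```

    The term grows like $\log^2 n$ and shrinks as the envelope exponent $\alpha$ increases, that is, as the envelope decays faster.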

    Robust Hypothesis Testing with a Relative Entropy Tolerance

    This paper considers the design of a minimax test for two hypotheses where the actual probability densities of the observations are located in neighborhoods obtained by placing a bound on the relative entropy between actual and nominal densities. The minimax problem admits a saddle point which is characterized. The robust test applies a nonlinear transformation which flattens the nominal likelihood ratio in the vicinity of one. Results are illustrated by considering the transmission of binary data in the presence of additive noise. Comment: 14 pages, 5 figures, submitted to the IEEE Transactions on Information Theory, July 2007, revised April 200