
    On surrogate loss functions and f-divergences

    The goal of binary classification is to estimate a discriminant function γ from observations of covariate vectors and corresponding binary labels. We consider an elaboration of this problem in which the covariates are not available directly but are transformed by a dimensionality-reducing quantizer Q. We present conditions on loss functions such that empirical risk minimization yields Bayes consistency when both the discriminant function and the quantizer are estimated. These conditions are stated in terms of a general correspondence between loss functions and a class of functionals known as Ali-Silvey or f-divergence functionals. Whereas this correspondence was established by Blackwell [Proc. 2nd Berkeley Symp. Probab. Statist. 1 (1951) 93--102. Univ. California Press, Berkeley] for the 0--1 loss, we extend the correspondence to the broader class of surrogate loss functions that play a key role in the general theory of Bayes consistency for binary classification. Our result makes it possible to pick out the (strict) subset of surrogate loss functions that yield Bayes consistency for joint estimation of the discriminant function and the quantizer.
    Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org), DOI: http://dx.doi.org/10.1214/08-AOS595
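    As a concrete illustration of the Ali-Silvey (f-divergence) functionals the abstract refers to, the toy sketch below, which is an assumption-laden example and not the paper's construction, evaluates two discrete f-divergences from their standard convex generators: f(t) = t log t for KL divergence and f(t) = |t − 1|/2 for total variation distance.

```python
import math

def f_divergence(p, q, f):
    """D_f(P || Q) = sum_x q(x) * f(p(x) / q(x)) for discrete distributions
    with full support, where f is convex and f(1) = 0."""
    return sum(qx * f(px / qx) for px, qx in zip(p, q))

# Standard generators: f(t) = t*log(t) yields KL divergence,
# f(t) = |t - 1| / 2 yields total variation distance.
kl_gen = lambda t: t * math.log(t)
tv_gen = lambda t: abs(t - 1) / 2

p = [0.5, 0.5]
q = [0.9, 0.1]
print(f_divergence(p, q, kl_gen))  # KL(P || Q)
print(f_divergence(p, q, tv_gen))  # TV(P, Q) = 0.4
```

    One instance of the loss-divergence correspondence the paper studies: the variational (total variation) distance is the divergence associated with the 0--1 loss in Blackwell's result, and, if memory serves, the hinge loss pairs with it as well.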

    Information-theoretic limitations of distributed information processing

    In a generic distributed information processing system, a number of agents connected by communication channels aim to accomplish a task collectively through local communications. The fundamental limits of distributed information processing problems depend not only on the intrinsic difficulty of the task, but also on the communication constraints imposed by the distributed architecture. In this thesis, we reveal these dependencies quantitatively under information-theoretic frameworks. We consider three typical distributed information processing problems: decentralized parameter estimation, distributed function computation, and statistical learning under adaptive composition. For the first two problems, we derive converse results on the Bayes risk and the computation time, respectively. For the last problem, we first study the relationship between the generalization capability of a learning algorithm and its stability property measured by the mutual information between its input and output, and then derive achievability results on the generalization error of adaptively composed learning algorithms. In all cases, we obtain general results on the fundamental limits with respect to a general model of the problem, so that the results can be applied to various specific scenarios. Our information-theoretic analyses also provide general approaches to inferring global properties of a distributed information processing system from local properties of its components.
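    The input-output mutual information notion of stability mentioned above can be made concrete by a bound in the following spirit (stated here as a representative example from the Xu-Raginsky line of work, under a σ-sub-Gaussian loss assumption, not necessarily the exact form used in the thesis): the expected gap between population risk L_μ(W) and empirical risk L_S(W) of a hypothesis W trained on an n-point sample S is controlled by the mutual information I(S; W),

```latex
\left| \mathbb{E}\!\left[ L_\mu(W) - L_S(W) \right] \right|
  \;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(S; W)}.
```

    Intuitively, an algorithm whose output retains few bits of information about its training sample cannot overfit by much.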

    Mean Estimation from One-Bit Measurements

    We consider the problem of estimating the mean of a symmetric log-concave distribution under the constraint that only a single bit per sample from this distribution is available to the estimator. We study the mean squared error as a function of the sample size (and hence the number of bits). We consider three settings: first, a centralized setting, where an encoder may release n bits given a sample of size n, and for which there is no asymptotic penalty for quantization; second, an adaptive setting in which each bit is a function of the current observation and previously recorded bits, where we show that the optimal relative efficiency compared to the sample mean is precisely the efficiency of the median; lastly, we show that in a distributed setting where each bit is only a function of a local sample, no estimator can achieve optimal efficiency uniformly over the parameter space. We additionally complement our results in the adaptive setting by showing that one round of adaptivity is sufficient to achieve optimal mean-square error.
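    One way to picture the adaptive setting is a sign-based stochastic-approximation scheme: each sample contributes only the bit sign(x_i − current estimate), and a Robbins-Monro update drives the estimate to the median, which equals the mean for a symmetric distribution. This is a hedged toy sketch under those assumptions, not the paper's estimator; the function name and the step-size constant c are illustrative choices.

```python
import random

def adaptive_one_bit_mean(samples, c=2.0):
    """Estimate the median (= mean for symmetric laws) from one bit per
    sample: bit_i = sign(x_i - theta_i), then a Robbins-Monro update."""
    theta = 0.0
    for i, x in enumerate(samples, start=1):
        bit = 1.0 if x > theta else -1.0  # the single transmitted bit
        theta += (c / i) * bit            # step size decays as 1/i
    return theta

random.seed(0)
data = [random.gauss(2.0, 1.0) for _ in range(100_000)]
print(adaptive_one_bit_mean(data))  # settles near the true mean 2.0
```

    For Gaussian data the sample median has asymptotic relative efficiency 2/π ≈ 0.64 against the sample mean, which is the median-type efficiency the adaptive setting above attains.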