On surrogate loss functions and f-divergences
The goal of binary classification is to estimate a discriminant function
from observations of covariate vectors and corresponding binary
labels. We consider an elaboration of this problem in which the covariates are
not available directly but are transformed by a dimensionality-reducing
quantizer Q. We present conditions on loss functions such that empirical risk
minimization yields Bayes consistency when both the discriminant function and
the quantizer are estimated. These conditions are stated in terms of a general
correspondence between loss functions and a class of functionals known as
Ali-Silvey or f-divergence functionals. Whereas this correspondence was
established by Blackwell [Proc. 2nd Berkeley Symp. Probab. Statist. 1 (1951)
93--102. Univ. California Press, Berkeley] for the 0--1 loss, we extend the
correspondence to the broader class of surrogate loss functions that play a key
role in the general theory of Bayes consistency for binary classification. Our
result makes it possible to pick out the (strict) subset of surrogate loss
functions that yield Bayes consistency for joint estimation of the discriminant
function and the quantizer.
Published in the Annals of Statistics (http://www.imstat.org/aos/) by the
Institute of Mathematical Statistics; DOI: http://dx.doi.org/10.1214/08-AOS595.
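As background on the correspondence invoked here (a sketch, not part of the original abstract): an Ali-Silvey or f-divergence between probability measures is generated by a convex function f with f(1) = 0.

```latex
% Definition of an f-divergence (Ali-Silvey divergence), stated as
% background; f is convex with f(1) = 0, and \mu \ll \pi is assumed:
\[
  D_f(\mu, \pi) \;=\; \int f\!\left(\frac{d\mu}{d\pi}\right) d\pi .
\]
```

Standard instances of the loss-to-divergence correspondence in this line of work (not restated from the abstract) include the hinge loss, which maps to the total variation distance, and the exponential loss, which maps to the Hellinger distance.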
Information-theoretic limitations of distributed information processing
In a generic distributed information processing system, a number of agents connected by communication channels aim to accomplish a task collectively through local communications. The fundamental limits of distributed information processing problems depend not only on the intrinsic difficulty of the task, but also on the communication constraints imposed by the distributed nature of the system. In this thesis, we reveal these dependencies quantitatively under information-theoretic frameworks.
We consider three typical distributed information processing problems: decentralized parameter estimation, distributed function computation, and statistical learning under adaptive composition. For the first two problems, we derive converse results on the Bayes risk and the computation time, respectively. For the last problem, we first study the relationship between the generalization capability of a learning algorithm and its stability, measured by the mutual information between its input and output, and then derive achievability results on the generalization error of adaptively composed learning algorithms. In all cases, we obtain results on the fundamental limits for a general model of the problem, so that they can be applied to various specific scenarios. Our information-theoretic analyses also provide general approaches to inferring global properties of a distributed information processing system from local properties of its components.
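As background on the stability measure mentioned above, a representative bound of this kind from the literature (in the style of Xu and Raginsky, 2017) controls the expected generalization error by the mutual information between the training sample and the algorithm's output; the statement below is a sketch under a sub-Gaussian loss assumption, not a quotation from the thesis.

```latex
% Input-output mutual information bound on expected generalization error.
% Assumptions (for this sketch): S = (Z_1, \dots, Z_n) is the training
% sample, W is the learned hypothesis, L_\mu is the population risk,
% L_S is the empirical risk, and the loss \ell(w, Z) is
% \sigma-sub-Gaussian under the data distribution for every fixed w.
\[
  \bigl| \mathbb{E}\bigl[ L_\mu(W) - L_S(W) \bigr] \bigr|
  \;\le\; \sqrt{ \frac{2\sigma^2}{n} \, I(S; W) } .
\]
```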
Mean Estimation from One-Bit Measurements
We consider the problem of estimating the mean of a symmetric log-concave
distribution under the constraint that only a single bit per sample from this
distribution is available to the estimator. We study the mean squared error as
a function of the sample size (and hence the number of bits). We consider three
settings: first, a centralized setting, where an encoder may release n bits
given a sample of size n, and for which there is no asymptotic penalty for
quantization; second, an adaptive setting in which each bit is a function of
the current observation and previously recorded bits, where we show that the
optimal relative efficiency compared to the sample mean is precisely the
efficiency of the median; lastly, we show that in a distributed setting where
each bit is only a function of a local sample, no estimator can achieve optimal
efficiency uniformly over the parameter space. We additionally complement our
results in the adaptive setting by showing that one round of adaptivity
is sufficient to achieve the optimal mean squared error.
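To make the distributed setting concrete, the following is a minimal simulation sketch for an assumed special case: Gaussian samples with unit variance and a fixed quantization threshold. The paper itself treats general symmetric log-concave distributions, and this sketch is not the authors' estimator.

```python
import numpy as np
from scipy.stats import norm

def one_bit_mean_estimate(samples, threshold=0.0):
    """Estimate the mean of N(mu, 1) data from one bit per sample.

    Each sample contributes only the bit 1{X_i > threshold}. Since
    P(X_i > t) = Phi(mu - t) for X_i ~ N(mu, 1), inverting the empirical
    bit frequency recovers mu. (A sketch of the distributed setting for a
    Gaussian special case, with a hypothetical fixed threshold.)
    """
    bits = samples > threshold                      # one bit per local sample
    p_hat = np.clip(bits.mean(), 1e-6, 1.0 - 1e-6)  # keep Phi^{-1} finite
    return threshold + norm.ppf(p_hat)              # invert Phi(mu - t) = p

rng = np.random.default_rng(0)
mu_true = 0.3
x = rng.normal(mu_true, 1.0, size=100_000)
print(one_bit_mean_estimate(x))  # approximately 0.3
```

The estimate is accurate when the threshold sits near the true mean and degrades as the two separate, consistent with the abstract's conclusion that no distributed one-bit estimator achieves optimal efficiency uniformly over the parameter space.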