Bregman Divergence Bounds and the Universality of the Logarithmic Loss
A loss function measures the discrepancy between the true values and their
estimated fits, for a given instance of data. In classification problems, a
loss function is said to be proper if the minimizer of the expected loss is the
true underlying probability. In this work we show that for binary
classification, the divergence associated with smooth, proper and convex loss
functions is bounded from above by the Kullback-Leibler (KL) divergence, up to
a normalization constant. This implies that by minimizing the log-loss
(associated with the KL divergence), we minimize an upper bound to any choice
of loss from this set. This property suggests that the log-loss is universal in
the sense that it provides performance guarantees to a broad class of accuracy
measures. Importantly, our notion of universality is not restricted to a
specific problem. This allows us to apply our results to many applications,
including predictive modeling, data clustering and sample complexity analysis.
Further, we show that the KL divergence bounds from above any separable Bregman
divergence that is convex in its second argument (up to a normalization
constant). This result introduces a new set of divergence inequalities, similar
to Pinsker's inequality, and extends well-known f-divergence inequality results.
Comment: arXiv admin note: substantial text overlap with arXiv:1805.0380
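In symbols (a hedged restatement; the notation is ours, not the abstract's): for a smooth, proper, convex binary loss \ell with associated Bregman divergence D_\ell, the claim is that there is a loss-dependent constant C_\ell > 0 such that

    D_{\ell}(p \,\|\, q) \;\le\; C_{\ell}\, D_{\mathrm{KL}}(p \,\|\, q), \qquad p, q \in [0, 1],

so the log-loss, whose associated divergence is D_KL itself, upper-bounds every member of this family after rescaling.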
Binary CEO Problem under Log-Loss with BSC Test-Channel Model
In this paper, we propose an efficient coding scheme for the two-link binary
Chief Executive Officer (CEO) problem under the logarithmic loss criterion. The
exact rate-distortion bound for a two-link binary CEO problem under the
logarithmic loss has been obtained by Courtade and Weissman. We propose an
encoding scheme based on compound LDGM-LDPC codes to achieve the theoretical
bounds. The proposed encoder combines a binary quantizer based on LDGM codes
with syndrome coding based on LDPC codes. An iterative joint decoder is also
designed to serve as the fusion center. The proposed CEO decoder is based on the
sum-product algorithm and a soft estimator.
Comment: 5 pages. arXiv admin note: substantial text overlap with
arXiv:1801.0043
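As a reminder (a standard definition, not spelled out in the abstract): under logarithmic loss the reconstruction is a probability distribution \hat{P} over the source alphabet, and the distortion is

    d(x, \hat{P}) = \log \frac{1}{\hat{P}(x)},

so, with the optimal choice of \hat{P}, the expected distortion equals the conditional entropy of the source given the decoder's information.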
How to Achieve the Capacity of Asymmetric Channels
We survey coding techniques that enable reliable transmission at rates that
approach the capacity of an arbitrary discrete memoryless channel. In
particular, we take the point of view of modern coding theory and discuss how
recent advances in coding for symmetric channels help provide more efficient
solutions for the asymmetric case. We consider, in more detail, three basic
coding paradigms.
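For orientation, a standard fact (our addition, not a quote from the survey): the capacity of a discrete memoryless channel W is

    C = \max_{P_X} I(X; Y),

and for symmetric channels the maximizing P_X is uniform, which linear codes induce for free; for asymmetric channels it is generally non-uniform, and synthesizing it is exactly what the three paradigms below address.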
The first one is Gallager's scheme that consists of concatenating a linear
code with a non-linear mapping so that the input distribution can be
appropriately shaped. We explicitly show that both polar codes and spatially
coupled codes can be employed in this scenario. Furthermore, we derive a
scaling law between the gap to capacity, the cardinality of the input and
output alphabets, and the required size of the mapper.
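As a toy illustration of the mapper idea (a minimal sketch of our own; the survey's construction and its scaling law are more refined), the following Python snippet shapes uniform coded bits into a biased input distribution with a many-to-one map:

    # Minimal sketch (ours): shape uniform bits into a biased channel input.
    import itertools
    import random

    def make_mapper(target_p1, m):
        """Map m uniform bits to {0, 1} so that P(output = 1) is the
        dyadic approximation round(target_p1 * 2**m) / 2**m."""
        n_ones = round(target_p1 * 2 ** m)
        blocks = list(itertools.product([0, 1], repeat=m))
        # Send the first n_ones bit-blocks to 1, all others to 0.
        return {b: (1 if i < n_ones else 0) for i, b in enumerate(blocks)}

    mapper = make_mapper(target_p1=0.3, m=4)  # resolution 1/16
    blocks = [tuple(random.randint(0, 1) for _ in range(4)) for _ in range(10000)]
    outputs = [mapper[b] for b in blocks]
    print(sum(outputs) / len(outputs))  # empirically close to 5/16 = 0.3125

The dyadic resolution 2^{-m} is the knob here: the mismatch between the target distribution and its dyadic approximation shrinks as the mapper grows, which is the intuition behind the scaling law relating the gap to capacity to the required mapper size.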
The second one is an integrated scheme in which the code is used both for
source coding, in order to create codewords distributed according to the
capacity-achieving input distribution, and for channel coding, in order to
provide error protection. Such a technique has been recently introduced by
Honda and Yamamoto in the context of polar codes, and we show how to apply it
also to the design of sparse graph codes.
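A hedged rate accounting for such an integrated scheme (a standard calculation, not quoted from the survey): of n code bits, roughly nH(X) indices can carry the shaped description of the input, while about nH(X|Y) of them must be sacrificed so the receiver can resolve the channel noise, leaving

    R \approx H(X) - H(X|Y) = I(X; Y)

information bits per channel use, which is the capacity when P_X is the maximizing input distribution.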
The third paradigm is based on an idea of Böcherer and Mathar, and
separates the two tasks of source coding and channel coding by a chaining
construction that binds together several codewords. We present conditions for
the source code and the channel code, and we describe how to combine any source
code with any channel code that fulfill those conditions, in order to provide
capacity-achieving schemes for asymmetric channels. In particular, we show that
polar codes, spatially coupled codes, and homophonic codes are suitable as
basic building blocks of the proposed coding strategy.
Comment: 32 pages, 4 figures, presented in part at Allerton'14 and published
in IEEE Trans. Inform. Theory
Kolmogorov's Structure Functions and Model Selection
In 1974 Kolmogorov proposed a non-probabilistic approach to statistics and
model selection. Let data be finite binary strings and models be finite sets of
binary strings. Consider model classes consisting of models of given maximal
(Kolmogorov) complexity. The ``structure function'' of the given data expresses
the relation between the complexity level constraint on a model class and the
least log-cardinality of a model in the class containing the data. We show that
the structure function determines all stochastic properties of the data: for
every constrained model class it determines the individual best-fitting model
in the class irrespective of whether the ``true'' model is in the model class
considered or not. In this setting, this happens with certainty, rather
than with high probability as in the classical case. We precisely quantify
the goodness-of-fit of an individual model with respect to individual data. We
show that, within the obvious constraints, every graph is realized by the
structure function of some data. We determine the (un)computability properties
of the various functions contemplated and of the ``algorithmic minimal
sufficient statistic.''
Comment: 25 pages LaTeX, 5 figures. In part in Proc 47th IEEE FOCS; this final
version (more explanations, cosmetic modifications) to appear in IEEE Trans
Inform Theory
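For reference, the structure function sketched above is usually written as follows (standard notation, added here for concreteness): for data x and a complexity budget \alpha,

    h_x(\alpha) = \min \{ \log |S| : S \ni x, \; K(S) \le \alpha \},

the least log-cardinality of a finite set of Kolmogorov complexity at most \alpha containing x; a best-fitting model at level \alpha is any minimizing S.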