10,619 research outputs found
Generalization Bounds via Information Density and Conditional Information Density
We present a general approach, based on an exponential inequality, to derive
bounds on the generalization error of randomized learning algorithms. Using
this approach, we provide bounds on the average generalization error as well as
bounds on its tail probability, for both the PAC-Bayesian and single-draw
scenarios. Specifically, for the case of subgaussian loss functions, we obtain
novel bounds that depend on the information density between the training data
and the output hypothesis. When suitably weakened, these bounds recover many of
the information-theoretic available bounds in the literature. We also extend
the proposed exponential-inequality approach to the setting recently introduced
by Steinke and Zakynthinou (2020), where the learning algorithm depends on a
randomly selected subset of the available training data. For this setup, we
present bounds for bounded loss functions in terms of the conditional
information density between the output hypothesis and the random variable
determining the subset choice, given all training data. Through our approach,
we recover the average generalization bound presented by Steinke and
Zakynthinou (2020) and extend it to the PAC-Bayesian and single-draw scenarios.
For the single-draw scenario, we also obtain novel bounds in terms of the
conditional -mutual information and the conditional maximal leakage.Comment: Published in Journal on Selected Areas in Information Theory (JSAIT).
Important note: the proof of the data-dependent bounds provided in the paper
contains an error, which is rectified in the following document:
https://gdurisi.github.io/files/2021/jsait-correction.pd
Learning from networked examples
Many machine learning algorithms are based on the assumption that training
examples are drawn independently. However, this assumption does not hold
anymore when learning from a networked sample because two or more training
examples may share some common objects, and hence share the features of these
shared objects. We show that the classic approach of ignoring this problem
potentially can have a harmful effect on the accuracy of statistics, and then
consider alternatives. One of these is to only use independent examples,
discarding other information. However, this is clearly suboptimal. We analyze
sample error bounds in this networked setting, providing significantly improved
results. An important component of our approach is formed by efficient sample
weighting schemes, which leads to novel concentration inequalities
- …