A standard approach in pattern classification is to estimate the
distributions of the label classes, and then to apply the Bayes classifier to
the estimates of the distributions in order to classify unlabeled examples. As
one might expect, the better our estimates of the label class distributions,
the better the resulting classifier will be. In this paper we make this
observation precise by identifying risk bounds of a classifier in terms of the
quality of the estimates of the label class distributions. We show how PAC
learnability relates to estimates of the distributions that have a PAC
guarantee on their L1 distance from the true distribution, and we bound the
increase in negative log likelihood risk in terms of PAC bounds on the
KL-divergence. We give an inefficient but general-purpose smoothing method for
converting an estimated distribution that is good under the L1 metric into a
distribution that is good under the KL-divergence.
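
The abstract does not spell out the smoothing step. A minimal sketch of one standard smoothing of this kind, mixing the estimate with the uniform distribution, is given below; this particular construction and the parameter beta are illustrative assumptions on our part, not necessarily the paper's exact method.

import numpy as np

def smooth_estimate(p_hat, beta):
    """Mix an estimated distribution with the uniform distribution.

    Illustrative assumption: because every outcome receives probability
    at least beta/n after mixing, the log-ratio against the true
    distribution stays bounded, which is the usual route from an L1
    guarantee to a KL-divergence guarantee.
    """
    p_hat = np.asarray(p_hat, dtype=float)
    n = p_hat.size
    return (1.0 - beta) * p_hat + beta / n

# Example: an estimate over 4 outcomes that assigns zero mass to two of them.
p_est = [0.5, 0.5, 0.0, 0.0]
p_smooth = smooth_estimate(p_est, beta=0.01)
# p_smooth has no zero entries, so KL(true || p_smooth) is finite.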