Transfer learning, or domain adaptation, is concerned with machine learning
problems in which training and testing data come from possibly different
distributions (denoted as μ and μ′, respectively). In this work, we
give an information-theoretic analysis on the generalization error and the
excess risk of transfer learning algorithms, following a line of work initiated
by Russo and Zou. Our results suggest, perhaps as expected, that the
Kullback-Leibler (KL) divergence D(μ‖μ′) plays an important role in
characterizing the generalization error in the setting of domain adaptation.
Specifically, we provide generalization error upper bounds for general transfer
learning algorithms and extend the results to a specific empirical risk
minimization (ERM) algorithm where data from both distributions are available
in the training phase. We further apply the method to iterative, noisy gradient
descent algorithms and obtain upper bounds that can be computed easily using
only parameters of the learning algorithms. A few illustrative examples
are provided to demonstrate the usefulness of the results. In particular, our
bound is tighter for specific classification problems than the bound derived
using Rademacher complexity.
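To illustrate the flavor of such results, here is a generic sketch under assumed conditions (a σ-subgaussian loss, n i.i.d. training samples S, and a hypothesis W output by the algorithm); it is not the paper's exact statement. In this setting, the information-theoretic bound in the line of work of Russo and Zou, in the form given by Xu and Raginsky, controls the expected gap between the population risk under μ and the empirical risk on S:

\[
\left| \mathbb{E}\!\left[ L_{\mu}(W) - \hat{L}_{S}(W) \right] \right|
\;\le\;
\sqrt{\frac{2\sigma^{2}\, I(W;S)}{n}},
\]

where I(W;S) is the mutual information between the learned hypothesis and the training data. As the abstract indicates, the transfer-learning bounds developed here augment expressions of this kind with a term involving the divergence D(μ‖μ′) between the training and test distributions.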