This paper addresses the general problem of domain adaptation, which arises in
a variety of applications where the distribution of the available labeled
sample differs somewhat from that of the test data. Building on previous
work by Ben-David et al. (2007), we introduce a novel distance between
distributions, the discrepancy distance, which is tailored to adaptation problems
with arbitrary loss functions. We give Rademacher complexity bounds for
estimating the discrepancy distance from finite samples for different loss
functions. Using this distance, we derive novel generalization bounds for
domain adaptation for a wide family of loss functions. We also present a series
of novel adaptation bounds, based on the empirical discrepancy, for large classes
of regularization-based algorithms, including support vector machines and kernel
ridge regression. This motivates our analysis of the problem of minimizing the
empirical discrepancy for various loss functions, for which we
also give novel algorithms. We report the results of preliminary experiments
that demonstrate the benefits of our discrepancy minimization algorithms for
domain adaptation.

Comment: 12 pages, 4 figures
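For a hypothesis set H and loss L, the discrepancy between distributions P and Q is disc_L(P, Q) = max over h, h' in H of |L_P(h', h) - L_Q(h', h)|, and the empirical discrepancy replaces P and Q with the empirical distributions of the source and target samples. The sketch below evaluates it in one special case, assuming squared loss and linear hypotheses x -> w.x with ||w|| <= Lam. In that case u = w' - w ranges over the ball of radius 2*Lam, so the empirical discrepancy reduces to 4 * Lam^2 times the spectral norm of the difference of the two empirical second-moment matrices. The function name and the parameter `Lam` are illustrative assumptions. This sketch only computes the empirical discrepancy; it is not the discrepancy minimization algorithm referred to above.

```python
import numpy as np


def empirical_discrepancy_sq_linear(X_src, X_tgt, Lam):
    """Hypothetical helper: empirical discrepancy for squared loss and
    linear hypotheses with weight norm at most Lam (an assumed setting).

    disc = max_{||w||,||w'|| <= Lam} |E_src[((w'-w).x)^2] - E_tgt[((w'-w).x)^2]|
         = max_{||u|| <= 2 Lam} |u^T (M_src - M_tgt) u|
         = 4 Lam^2 * spectral norm of (M_src - M_tgt)
    """
    X_src = np.asarray(X_src, dtype=float)
    X_tgt = np.asarray(X_tgt, dtype=float)
    # Empirical second-moment matrices E[x x^T] of each sample (rows are points).
    M_src = X_src.T @ X_src / X_src.shape[0]
    M_tgt = X_tgt.T @ X_tgt / X_tgt.shape[0]
    D = M_src - M_tgt
    # D is symmetric, so its spectral norm is its largest absolute eigenvalue.
    eigvals = np.linalg.eigvalsh(D)
    return 4.0 * Lam ** 2 * float(np.max(np.abs(eigvals)))


if __name__ == "__main__":
    X_src = np.eye(2)                          # points e1, e2
    X_tgt = np.array([[1.0, 0.0], [1.0, 0.0]]) # both points at e1
    print(empirical_discrepancy_sq_linear(X_src, X_tgt, Lam=1.0))
```

In the usage example, the second-moment matrices are diag(0.5, 0.5) and diag(1, 0), so their difference has spectral norm 0.5 and the printed discrepancy is 4 * 1 * 0.5 = 2.0. Identical samples give a discrepancy of zero, as expected for a distance between distributions.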