Motivated by the emergence of decentralized machine learning ecosystems, we
study the delegation of data collection. Taking the field of contract theory as
our starting point, we design optimal and near-optimal contracts that deal with
two fundamental machine learning challenges: lack of certainty in the
assessment of model quality and lack of knowledge regarding the optimal
performance of any model. We show that lack of certainty can be dealt with via
simple linear contracts that achieve a (1-1/e) fraction of the first-best
utility, even when the principal has only a small test set. Furthermore, we give
sufficient conditions on the size of the principal's test set that guarantee a
vanishing additive approximation to the optimal utility. To address the lack of a priori
knowledge regarding the optimal performance, we give a convex program that can
adaptively and efficiently compute the optimal contract.