We study the data-scaling of transfer learning from foundation models in the
low-downstream-data regime. We observe an intriguing phenomenon which we call
cliff-learning. Cliff-learning refers to regions of data-scaling laws where
performance improves at a faster-than-power-law rate (i.e., regions of concavity
on a log-log scaling plot). We conduct an in-depth investigation of
foundation-model cliff-learning and study toy models of the phenomenon. We
observe that the degree of cliff-learning reflects the degree of compatibility
between the priors of a learning algorithm and the task being learned.
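
For concreteness, a sketch of the concavity condition (using downstream loss $L$ as the performance measure and $n$ downstream examples; this notation is an assumption, not taken from the abstract): an ordinary power law $L(n) \approx a\,n^{-b}$ traces a straight line in log-log coordinates, so cliff-learning corresponds to spans where
$$\frac{\mathrm{d}^2 \log L}{\mathrm{d}(\log n)^2} < 0,$$
i.e., the log-log curve bends downward and loss falls faster than any single power law would predict.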