We highlight a useful and underexploited property of model training: rather than
drawing training data from the same distribution as the test data, training on a
suitably chosen different distribution often improves accuracy, especially at
small model sizes. This provides a way to build accurate small models, which are
attractive for interpretability and for resource-constrained environments. Here we
empirically show that this principle is both general and effective: it applies
across tasks and model families, and it can raise the prediction accuracy of
traditional models to the point where they are competitive with specialized
techniques. The tasks we consider are explainable clustering and
prototype-based classification. We also study Random Forests to illustrate how
the principle can accommodate multiple size constraints simultaneously, e.g., the
number of trees and the maximum depth per tree. Results on multiple datasets are
presented and shown to be statistically significant.