Accurate Small Models using Adaptive Sampling

Abstract

We highlight the utility of a certain property of model training: instead of drawing training data from the same distribution as the test data, learning a different training distribution often improves accuracy, especially at small model sizes. This provides a way to build accurate small models, which are attractive for interpretability and for resource-constrained environments. Here we empirically show that this principle is both general and effective: it may be used across tasks and model families, and it can improve the prediction accuracy of traditional models to the point where they are competitive with specialized techniques. The tasks we consider are explainable clustering and prototype-based classification. We also look at Random Forests to illustrate how this principle may be applied to accommodate multiple size constraints, e.g., number of trees and maximum depth per tree. Results on multiple datasets are presented and are shown to be statistically significant.
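Below is a minimal, illustrative sketch of the core idea in Python, not the paper's actual adaptive-sampling algorithm: a size-limited model (here, a depth-2 decision tree) is trained on a reweighted training distribution chosen to maximize held-out accuracy, rather than on the i.i.d. training sample. The one-parameter class-upweighting scheme, the dataset, and all names are assumptions made for illustration.

# Sketch only: train a small model on a *learned* training distribution
# (class reweighting tuned on a validation split), not the i.i.d. sample.
# The upweighting family below is an illustrative assumption, not the
# paper's method.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, mildly imbalanced data standing in for a real dataset.
X, y = make_classification(n_samples=3000, n_features=10,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
X_fit, X_val, y_fit, y_val = train_test_split(X_tr, y_tr, test_size=0.3,
                                              random_state=0)

def small_model(X, y, sample_weight=None):
    # "Small" = severely depth-limited tree, cheap and interpretable.
    return DecisionTreeClassifier(max_depth=2, random_state=0).fit(
        X, y, sample_weight=sample_weight)

# Baseline: train on the i.i.d. training distribution.
baseline = small_model(X_fit, y_fit)

# Adaptive alternative: search a one-parameter family of training
# distributions (upweight class 1 by factor w) and keep the weighting
# that maximizes validation accuracy.
best_w, best_acc = 1.0, 0.0
for w in [0.25, 0.5, 1.0, 2.0, 4.0]:
    sw = np.where(y_fit == 1, w, 1.0)
    acc = small_model(X_fit, y_fit, sample_weight=sw).score(X_val, y_val)
    if acc > best_acc:
        best_w, best_acc = w, acc

# Refit on the full training set under the selected distribution.
adapted = small_model(X_tr, y_tr,
                      sample_weight=np.where(y_tr == 1, best_w, 1.0))
print("baseline test accuracy:", baseline.score(X_te, y_te))
print("adapted  test accuracy:", adapted.score(X_te, y_te))

At this model size the reweighted training distribution can beat the i.i.d. one on held-out data; with a sufficiently large tree the gap would typically close, which is consistent with the abstract's emphasis on small models.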
