Configuration tuning for large software systems is generally challenging due
to the complex configuration space and expensive performance evaluation. Most
existing approaches follow a two-phase process, first learning a
regression-based performance prediction model on available samples and then
searching for the configurations with satisfactory performance using the
learned model. Such regression-based models often suffer from the scarcity of
samples due to the enormous time and resources required to run a large software
system with a specific configuration. Moreover, previous studies have shown
that even a highly accurate regression-based model may fail to discern the
relative merits of two configurations, even though performance comparison is
a fundamental strategy for configuration tuning. To address these
issues, this paper proposes CM-CASL, a Comparison-based performance Modeling
approach for software systems via Collaborative Active and Semisupervised
Learning. CM-CASL learns a classification model that compares the performance
of two given configurations, and enhances the samples through a collaborative
labeling process by both human experts and classifiers using an integration of
active and semisupervised learning. Experimental results demonstrate that
CM-CASL outperforms two state-of-the-art performance modeling approaches in
terms of both classification accuracy and rank accuracy, and thus provides a
better performance model for the subsequent work of configuration tuning.
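
To illustrate the comparison-based formulation, the following minimal sketch shows how measured samples might be turned into pairwise classification data and how a pairwise comparator can induce a ranking. It is not the CM-CASL implementation: the function names `make_pairs` and `rank_by_wins`, the feature encoding (concatenating the two configuration vectors), and the assumption that a lower measured value means better performance are all hypothetical choices made for illustration.

```python
from itertools import permutations


def make_pairs(samples):
    """Build a pairwise classification dataset from measured samples.

    samples: list of (config, perf), where config is a tuple of option values
    and perf is a measured value (ASSUMPTION: lower is better).
    Returns (features, labels); label 1 means the first config of the pair
    performed better than the second.
    """
    features, labels = [], []
    for (cfg_a, perf_a), (cfg_b, perf_b) in permutations(samples, 2):
        if perf_a == perf_b:
            continue  # skip ties: they carry no preference signal
        features.append(cfg_a + cfg_b)  # hypothetical encoding: concatenation
        labels.append(1 if perf_a < perf_b else 0)
    return features, labels


def rank_by_wins(configs, better):
    """Rank configs by how many pairwise comparisons each one wins.

    better(a, b) is any comparator returning True when a is predicted to
    outperform b, for example a trained classifier wrapped in a function.
    """
    wins = {c: sum(1 for o in configs if o != c and better(c, o)) for c in configs}
    return sorted(configs, key=lambda c: wins[c], reverse=True)


# Toy data: three configurations with made-up measurements.
samples = [((1, 0), 12.0), ((0, 1), 9.5), ((1, 1), 15.0)]
features, labels = make_pairs(samples)

# Stand-in comparator that reads the measurements directly; in practice a
# learned classifier would take its place.
measured = dict(samples)
ranking = rank_by_wins(list(measured), lambda a, b: measured[a] < measured[b])
```

Note that n measured samples yield up to n(n-1) ordered pairs, so a pairwise formulation produces more training instances from the same measurements than a regression formulation does.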