This paper introduces the MCML approach for empirically studying the
learnability of relational properties that can be expressed in the well-known
software design language Alloy. A key novelty of MCML is that it quantifies the
performance of, and the semantic differences among, trained machine learning
(ML) models, specifically decision trees, with respect to entire (bounded)
input spaces, not just given training and test datasets (as is the common
practice). MCML reduces these quantification problems to the classic complexity
theory problem of model counting, and employs state-of-the-art model counters.
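To make the idea concrete, here is a minimal sketch (not the paper's implementation) of whole-space evaluation of a decision tree: exhaustive enumeration over a small bounded boolean input space stands in for a propositional model counter, and the names `ground_truth` and `n_vars` are illustrative assumptions rather than anything from MCML.

```python
# Illustrative sketch only: exhaustive enumeration over a small bounded
# boolean input space stands in for a propositional model counter.
from itertools import product
from sklearn.tree import DecisionTreeClassifier

n_vars = 10  # bound on the input space: 2**10 inputs in total

def ground_truth(x):
    # Hypothetical stand-in for a relational property (e.g., an Alloy
    # predicate evaluated over a bounded universe); not from the paper.
    return int(sum(x[:5]) >= sum(x[5:]))

# Enumerate the entire bounded input space and label it with the property.
space = [list(bits) for bits in product([0, 1], repeat=n_vars)]
labels = [ground_truth(x) for x in space]

# Train on a small sample of the space, as in the common ML setting.
train_X, train_y = space[::37], labels[::37]
tree = DecisionTreeClassifier(random_state=0).fit(train_X, train_y)

# Whole-space agreement: for each input, does the tree match the property?
# MCML obtains such counts via model counting rather than enumeration.
agree = sum(int(p == y) for p, y in zip(tree.predict(space), labels))
print(f"whole-space accuracy: {agree / len(space):.3f}")
```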
The results show that relatively simple ML models can achieve surprisingly high
performance (accuracy and F1-score) when evaluated in the common setting of
using training and test datasets, even when the training dataset is much
smaller than the test dataset, indicating the seeming simplicity of learning
relational properties. However, MCML metrics based on model counting show that
this performance can degrade substantially when the models are tested against
the entire (bounded) input space, indicating the high complexity of precisely
learning these properties and the usefulness of model counting in quantifying
true performance.
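For concreteness, one consistent reading of such whole-space metrics (a sketch based on the standard metric definitions; the count notation |TP|, |FP|, |TN|, |FN| is assumed here, not quoted from the paper): if model counting yields the number of inputs in the bounded space falling into each agreement class between the learned model and the property, accuracy and F1-score follow directly:

```latex
% Sketch: whole-space metrics from model counts, where |TP|, |FP|, |TN|, |FN|
% denote counts over the entire bounded input space (notation assumed).
\[
  \mathit{Acc} = \frac{|TP| + |TN|}{|TP| + |FP| + |TN| + |FN|}, \qquad
  F_1 = \frac{2\,|TP|}{2\,|TP| + |FP| + |FN|}
\]
```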