The Rashōmon effect, in which many models fit the same data comparably well yet explain it differently, poses challenges for deriving reliable knowledge from machine learning models. This study examined the influence of sample size on the explanations, computed with SHAP, of models in a Rashōmon set. Experiments on five public datasets showed that explanations gradually converged as the sample size increased. Explanations derived from fewer than 128 samples exhibited high variability, limiting reliable knowledge extraction. However, agreement between models improved with
more data, allowing for consensus. Bagging ensembles often had higher
agreement. The results provide practical guidance on how much data is sufficient to trust model explanations. The high variability observed at small sample sizes suggests that conclusions drawn from few samples may be unreliable without further validation. Further work is needed with additional model types, data domains, and explanation methods; testing convergence in neural networks and with model-specific explanation methods would be particularly impactful. The approaches explored here point towards principled techniques for eliciting knowledge from ambiguous models.

Comment: 13 pages, 6 figures and 6 tables
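As a concrete illustration of the kind of between-model agreement discussed above, the sketch below computes the overlap of the top-k most important features between two attribution vectors. This is a minimal, stdlib-only sketch under my own assumptions: the function name `topk_agreement`, the choice of k, and the example attribution vectors are illustrative, not the paper's actual metric or data.

```python
def topk_agreement(attr_a, attr_b, k=3):
    """Fraction of overlap between the top-k features (ranked by
    absolute attribution) of two explanations for the same instance."""
    top_a = set(sorted(range(len(attr_a)), key=lambda i: -abs(attr_a[i]))[:k])
    top_b = set(sorted(range(len(attr_b)), key=lambda i: -abs(attr_b[i]))[:k])
    return len(top_a & top_b) / k

# Hypothetical SHAP attribution vectors from two models in a Rashomon set
m1 = [0.40, -0.25, 0.10, 0.02, -0.01]
m2 = [0.35, -0.30, 0.03, 0.12, 0.00]
print(topk_agreement(m1, m2, k=3))  # 2 of the top-3 features overlap -> 0.666...
```

Averaging such a score over model pairs and instances gives one simple way to track whether explanations reach consensus as the sample size grows.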