In credit scoring, machine learning models are known to outperform standard
parametric models. Because these models condition access to credit, banking
supervisors and internal model validation teams need to monitor their predictive
performance and to identify the features with the greatest impact on it. To
facilitate this, we introduce the XPER methodology to decompose a performance
metric (e.g., AUC, R²) into specific contributions associated with the
various features of a classification or regression model. XPER is theoretically
grounded on Shapley values and is both model-agnostic and performance
metric-agnostic. Furthermore, it can be implemented either at the model level
or at the individual level. Using a novel dataset of car loans, we decompose
the AUC of a machine-learning model trained to forecast the default probability
of loan applicants. We show that a small number of features can explain a
surprisingly large part of the model's performance. Furthermore, we find that
the features that contribute the most to the predictive performance of the
model may not be the ones that contribute the most to individual forecasts, as
measured by SHAP values. We
also show how XPER can be used to deal with heterogeneity issues and
significantly boost out-of-sample performance.
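
To make the decomposition concrete, the sketch below illustrates the core idea on a toy problem: treating the performance metric (here, the AUC) as the value function of a cooperative game over features and computing exact Shapley contributions by enumerating feature coalitions. This is a minimal illustration under simplifying assumptions, not the authors' implementation: in particular, replacing features outside a coalition by their sample mean is a crude stand-in for the expectation-based marginalization used in XPER, and all names in the snippet are hypothetical.

```python
from itertools import combinations
from math import factorial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Toy setup: a fitted classifier and a held-out evaluation set.
X, y = make_classification(n_samples=2000, n_features=4, random_state=0)
X_train, y_train = X[:1000], y[:1000]
X_eval, y_eval = X[1000:], y[1000:]
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

n = X_eval.shape[1]
means = X_eval.mean(axis=0)

def value(coalition):
    """AUC when only features in `coalition` carry information.

    Simplifying assumption: features outside the coalition are
    replaced by their sample mean (XPER instead marginalizes
    them via expectations over the data).
    """
    X_mod = np.tile(means, (X_eval.shape[0], 1))
    cols = list(coalition)
    X_mod[:, cols] = X_eval[:, cols]
    return roc_auc_score(y_eval, model.predict_proba(X_mod)[:, 1])

# Exact Shapley decomposition of the AUC across the n features:
# phi_j = sum over coalitions S (excluding j) of
#         |S|!(n-|S|-1)!/n! * [v(S ∪ {j}) - v(S)].
phi = np.zeros(n)
for j in range(n):
    others = [k for k in range(n) if k != j]
    for size in range(n):
        for S in combinations(others, size):
            w = factorial(size) * factorial(n - size - 1) / factorial(n)
            phi[j] += w * (value(S + (j,)) - value(S))

base = value(())          # benchmark: no feature carries information
print("benchmark AUC:", base)  # ~0.5 for an uninformative predictor
print("feature contributions:", phi)
print("benchmark + sum(phi) =", base + phi.sum(),
      "vs full-model AUC =", value(tuple(range(n))))
```

By the efficiency property of Shapley values, the benchmark AUC plus the sum of the feature contributions recovers the AUC of the full model, which is the sense in which the methodology decomposes a performance metric into feature-level contributions.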