The Shapley value was originally a concept in econometrics for fairly distributing
both gains and costs among players in a coalition game. In recent decades, its
application has been extended to other areas such as marketing, engineering and
machine learning. For example, it produces reasonable solutions to problems in
sensitivity analysis, local model explanation for interpretable machine
learning, node importance in social networks, and attribution modeling. However,
its heavy computational burden has long been recognized but rarely
investigated. Specifically, in a d-player coalition game, calculating a
Shapley value requires the evaluation of d! or 2^d marginal contribution
values, depending on whether we are taking the permutation or combination
formulation of the Shapley value. Hence it becomes infeasible to calculate the
Shapley value when d is reasonably large. A common remedy is to take a random
sample of the permutations as a surrogate for the complete list of permutations.
We find that a more advanced sampling scheme can be designed to yield much more
accurate estimates of the Shapley value than simple random sampling (SRS). Our
sampling scheme is based on combinatorial structures from the field of design of
experiments (DOE), particularly order-of-addition experimental designs, which
are used to study how the ordering of components affects the output. We show
that the obtained estimates are unbiased, and can sometimes deterministically
recover the original Shapley value. Both theoretical and simulation results
show that our DOE-based sampling scheme outperforms SRS in terms of estimation
accuracy. Surprisingly, it is also slightly faster than SRS. Lastly, we conduct
real-data analyses of the C. elegans nervous system and the 9/11 terrorist
network.
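
To make the permutation formulation and the SRS baseline concrete, the following is a minimal Python sketch. It is not the DOE-based scheme proposed here; it only illustrates exact computation over all d! orderings and the simple random sampling of permutations that the proposed scheme is compared against. The toy game (`WEIGHTS`, `toy_value`) and all function names are hypothetical and chosen for illustration.

```python
import itertools
import math
import random


def marginal_contributions(value, order):
    """Marginal contribution of each player as players join in the given order."""
    contrib = {}
    coalition = frozenset()
    prev = value(coalition)
    for player in order:
        coalition = coalition | {player}
        cur = value(coalition)
        contrib[player] = cur - prev
        prev = cur
    return contrib


def shapley_exact(value, players):
    """Permutation formulation: average marginal contributions over all d! orderings."""
    totals = {p: 0.0 for p in players}
    for order in itertools.permutations(players):
        for p, c in marginal_contributions(value, order).items():
            totals[p] += c
    n_perms = math.factorial(len(players))
    return {p: t / n_perms for p, t in totals.items()}


def shapley_srs(value, players, n_samples, seed=0):
    """SRS baseline: average over uniformly sampled random orderings."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    order = list(players)
    for _ in range(n_samples):
        rng.shuffle(order)  # draws a uniformly random permutation in place
        for p, c in marginal_contributions(value, order).items():
            totals[p] += c
    return {p: t / n_samples for p, t in totals.items()}


# Hypothetical toy game (an assumption for illustration): additive weights
# plus a bonus of 1 whenever players "a" and "b" are both in the coalition.
WEIGHTS = {"a": 1.0, "b": 2.0, "c": 3.0}


def toy_value(coalition):
    total = sum(WEIGHTS[p] for p in coalition)
    bonus = 1.0 if {"a", "b"} <= coalition else 0.0
    return total + bonus


if __name__ == "__main__":
    players = list(WEIGHTS)
    print("exact:", shapley_exact(toy_value, players))
    print("SRS  :", shapley_srs(toy_value, players, n_samples=200))
```

In this toy game the bonus is split evenly between "a" and "b", so the exact values are 1.5, 2.5 and 3.0. Because the marginal contributions along any single ordering sum to the value of the grand coalition, the SRS estimates also sum to that value for any sample size; only their allocation across players is subject to sampling error.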