A Baseline for Shapley Values in MLPs: from Missingness to Neutrality
Being able to explain a prediction, as well as having a model that performs
well, is paramount in many machine learning applications. Deep neural networks
have recently gained momentum on the basis of their accuracy; however, they are
often criticised as black boxes. Many authors have therefore proposed methods
to explain their predictions. Among these explainability methods, feature
attribution methods have been favoured for their strong theoretical foundation:
the Shapley value. A limitation of the Shapley value is the need to define a
baseline (aka reference point) representing the missingness of a feature. In
this paper, we present a method to choose a baseline based on a neutrality
value: a parameter, defined by decision makers, at which their choices are
determined by the output of the model being either below or above it. Based on
this concept, we theoretically justify these neutral baselines and show how to
identify them for MLPs. We then experimentally demonstrate, on a binary
classification task using a synthetic dataset and a dataset from the financial
domain, that the proposed baselines outperform standard ways of choosing them
in terms of local explainability power.
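For context, the Shapley value underlying such attribution methods has a
standard closed form, and in the feature-attribution setting the value of a
coalition is commonly obtained by replacing the "missing" features with the
baseline x' (this baseline-replacement convention is the usual one and is
stated here as background, not quoted from the paper):

    \phi_i(x) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|! \, (|N| - |S| - 1)!}{|N|!} \bigl[ v(S \cup \{i\}) - v(S) \bigr], \qquad v(S) = f(x_S ; x'_{N \setminus S}),

where f is the model, N is the set of input features, and x_S keeps the values
of x on S while the remaining features take their baseline values from x'. The
neutrality idea above then amounts to choosing x' so that f(x') sits at the
decision makers' neutral threshold.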
Interpreting Multivariate Shapley Interactions in DNNs
This paper aims to explain deep neural networks (DNNs) from the perspective
of multivariate interactions. We define and quantify the significance of
interactions among multiple input variables of a DNN. Input variables with
strong interactions usually form a coalition and reflect prototype features,
which are memorized and used by the DNN for inference. We define the
significance of interactions based on the Shapley value, which is designed to
assign an attribution value to each input variable for the inference. We have
conducted experiments with various DNNs, and the results demonstrate the
effectiveness of the proposed method.
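The abstract does not spell out the interaction measure; the classical Shapley
interaction index of Grabisch and Roubens for a pair of variables i, j is one
standard formalisation of the idea (given here as background, not necessarily
the paper's exact definition):

    I_{ij} = \sum_{S \subseteq N \setminus \{i, j\}} \frac{|S|! \, (|N| - |S| - 2)!}{(|N| - 1)!} \bigl[ v(S \cup \{i, j\}) - v(S \cup \{i\}) - v(S \cup \{j\}) + v(S) \bigr].

A strongly positive I_{ij} indicates that i and j contribute more together
than separately, i.e. that they act as the kind of coalition the abstract
describes; the definition extends to sets of more than two variables.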
Interpreting and Disentangling Feature Components of Various Complexity from DNNs
This paper aims to define, quantify, and analyze the feature complexity that
is learned by a DNN. We propose a generic definition of feature complexity:
given the feature of a certain layer in the DNN, our method disentangles
feature components of different complexity orders from that feature. We further
design a set of metrics to evaluate the reliability, the effectiveness, and the
significance of over-fitting of these feature components. Furthermore, we
discover a close relationship between feature complexity and the performance
of DNNs. As a generic mathematical tool, the feature complexity and the
proposed metrics can also be used to analyze the success of network compression
and knowledge distillation.
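The abstract leaves the disentanglement construction implicit; one plausible
reading, offered as an assumption rather than as the paper's stated method, is
to index complexity by the depth of the shallowest network that can reproduce
a component of the feature:

    c^{(k)}(x) = \Phi_k(x) - \Phi_{k-1}(x), \qquad \Phi_k = \arg\min_{\Phi \in \mathcal{F}_k} \mathbb{E}_x \bigl\| f(x) - \Phi(x) \bigr\|^2,

where f(x) is the feature of the given layer, \mathcal{F}_k is a family of
networks with at most k nonlinear layers, and \Phi_0 = 0. Under this reading,
c^{(k)} captures what depth-k networks can express but depth-(k-1) networks
cannot, and f(x) decomposes into the sum of the c^{(k)} plus a residual.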