GLIME: General, Stable and Local LIME Explanation
As black-box machine learning models grow in complexity and find applications
in high-stakes scenarios, it is imperative to provide explanations for their
predictions. Although Local Interpretable Model-agnostic Explanations (LIME)
[22] is a widely adopted method for understanding model behaviors, it is
unstable with respect to random seeds [35,24,3] and exhibits low local fidelity
(i.e., how well the explanation approximates the model's local behaviors)
[21,16]. Our study shows that this instability problem stems from small sample
weights, leading to the dominance of regularization and slow convergence.
Additionally, LIME's sampling neighborhood is non-local and biased towards the
reference, resulting in poor local fidelity and sensitivity to reference
choice. To tackle these challenges, we introduce GLIME, an enhanced framework
extending LIME and unifying several prior methods. Within the GLIME framework,
we derive an equivalent formulation of LIME that achieves significantly faster
convergence and improved stability. By employing a local and unbiased sampling
distribution, GLIME generates explanations with higher local fidelity compared
to LIME. GLIME explanations are independent of reference choice. Moreover,
GLIME offers users the flexibility to choose a sampling distribution based on
their specific scenarios.
Comment: Accepted by NeurIPS 2023 as a Spotlight paper.
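For intuition, the following minimal Python sketch shows the kind of local-surrogate fitting discussed above, with perturbations drawn from a Gaussian centred on the explained instance rather than interpolated towards a reference point. The function names, the noise scale `sigma`, and the ridge surrogate are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of LIME-style local surrogate fitting with a local,
# reference-free sampling neighbourhood (the kind of distribution GLIME
# advocates). Not the authors' code; names and defaults are assumptions.
import numpy as np
from sklearn.linear_model import Ridge

def local_surrogate_explanation(predict_fn, x, sigma=0.3, n_samples=2000, seed=0):
    """Fit a linear surrogate to predict_fn in a Gaussian neighbourhood of x."""
    rng = np.random.default_rng(seed)
    # Sample perturbations directly around x (local and unbiased),
    # rather than interpolating towards a fixed reference.
    Z = x + sigma * rng.standard_normal((n_samples, x.shape[0]))
    y = predict_fn(Z)                            # black-box predictions on the samples
    surrogate = Ridge(alpha=1.0).fit(Z - x, y)   # centre features at x
    return surrogate.coef_                       # per-feature local attributions

# Toy usage: explain a simple quadratic "black box" at a point.
if __name__ == "__main__":
    f = lambda Z: (Z[:, 0] ** 2 + 3 * Z[:, 1]).astype(float)
    print(local_surrogate_explanation(f, np.array([1.0, -2.0])))
```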
LIMEtree: Interactively Customisable Explanations Based on Local Surrogate Multi-output Regression Trees
Systems based on artificial intelligence and machine learning models should
be transparent, in the sense of being capable of explaining their decisions to
gain humans' approval and trust. While there are a number of explainability
techniques that can be used to this end, many of them are only capable of
outputting a single one-size-fits-all explanation that simply cannot address
all of the explainees' diverse needs. In this work we introduce a
model-agnostic and post-hoc local explainability technique for black-box
predictions called LIMEtree, which employs surrogate multi-output regression
trees. We validate our algorithm on a deep neural network trained for object
detection in images and compare it against Local Interpretable Model-agnostic
Explanations (LIME). Our method comes with local fidelity guarantees and can
produce a range of diverse explanation types, including contrastive and
counterfactual explanations praised in the literature. Some of these
explanations can be interactively personalised to create bespoke, meaningful
and actionable insights into the model's behaviour. While other methods may
give an illusion of customisability by wrapping otherwise static explanations
in an interactive interface, our explanations are truly interactive, in the
sense of allowing the user to "interrogate" a black-box model. LIMEtree can
therefore produce consistent explanations on which an interactive exploratory
process can be built.
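As a rough illustration of the surrogate idea, the sketch below fits a single multi-output regression tree (scikit-learn) to a black box's class probabilities on samples drawn around an instance. The perturbation scheme and names such as `predict_proba_fn` are assumptions made for illustration; this does not reproduce LIMEtree itself.

```python
# Minimal sketch: one multi-output regression tree jointly approximating all
# class probabilities of a black box near an instance. Illustrative only.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def surrogate_tree(predict_proba_fn, x, noise=0.2, n_samples=1000, max_depth=3, seed=0):
    rng = np.random.default_rng(seed)
    Z = x + noise * rng.standard_normal((n_samples, x.shape[0]))
    P = predict_proba_fn(Z)          # (n_samples, n_classes) black-box probabilities
    # A single multi-output tree covers all classes at once, so contrastive
    # questions ("why class A rather than B?") can be answered from one model.
    return DecisionTreeRegressor(max_depth=max_depth).fit(Z, P)

# Toy usage with a softmax over two linear scores as the "black box".
if __name__ == "__main__":
    def black_box(Z):
        s = np.stack([Z[:, 0], -Z[:, 0] + Z[:, 1]], axis=1)
        e = np.exp(s - s.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)
    tree = surrogate_tree(black_box, np.array([0.5, 1.0]))
    print(tree.predict(np.array([[0.5, 1.0]])))
```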
CAFE: Conflict-Aware Feature-wise Explanations
Feature attribution methods are widely used to explain neural models by
determining the influence of individual input features on the models' outputs.
We propose a novel feature attribution method, CAFE (Conflict-Aware
Feature-wise Explanations), that addresses three limitations of the existing
methods: their disregard for the impact of conflicting features, their lack of
consideration for the influence of bias terms, and an overly high sensitivity
to local variations in the underpinning activation functions. Unlike other
methods, CAFE provides safeguards against overestimating the effects of neuron
inputs and separately traces positive and negative influences of input features
and biases, resulting in enhanced robustness and increased ability to surface
feature conflicts. We show experimentally that CAFE is better able to identify
conflicting features on synthetic tabular data and exhibits the best overall
fidelity on several real-world tabular datasets, while being highly
computationally efficient.
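The toy snippet below is not the CAFE algorithm; it only illustrates, for a single linear unit, why separately tracking positive and negative influences (including the bias) can surface conflicting features whose signed contributions cancel in the net output. All names are illustrative.

```python
# Toy illustration of feature conflict: two features with equal and opposite
# signed contributions look unimportant in the net score, yet they conflict.
import numpy as np

def signed_contributions(w, x, b):
    c = w * x                                   # per-feature signed contributions
    pos = c.clip(min=0).sum() + max(b, 0.0)     # total positive influence (incl. bias)
    neg = c.clip(max=0).sum() + min(b, 0.0)     # total negative influence (incl. bias)
    return c, pos, neg

w = np.array([2.0, -2.0, 0.5])
x = np.array([1.0, 1.0, 1.0])
c, pos, neg = signed_contributions(w, x, b=-0.25)
print(c, pos, neg)  # features 0 and 1 cancel exactly: a conflict hidden in the net output
```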