Towards a Rigorous Evaluation of XAI Methods on Time Series
Explainable Artificial Intelligence (XAI) methods are typically deployed to
explain and debug black-box machine learning models. However, most proposed XAI
methods are black boxes themselves and were designed for images; thus, they rely on
visual interpretability for evaluating and validating explanations. In this work, we
apply XAI methods previously used in the image and text domains to time series.
We present a methodology to test and evaluate various XAI methods on time
series by introducing new verification techniques to incorporate the temporal
dimension. We further conduct preliminary experiments to assess the quality of
the explanations produced by selected XAI methods, applying various verification
methods across a range of datasets and inspecting the resulting quality metrics.
Our initial experiments demonstrate that SHAP is robust across all models, whereas
others like DeepLIFT, LRP, and Saliency Maps work better with specific architectures.

Comment: 5 pages, 1 figure, 1 table, 1 page of references. 2019 ICCV Workshop on
Interpreting and Explaining Visual Artificial Intelligence Models
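The perturbation-based verification idea that the abstract alludes to, i.e. checking an attribution by occluding the timesteps it marks as important, can be sketched roughly as follows. The model, the occlusion-based attribution, and the verification criterion here are toy stand-ins chosen for illustration, not the paper's actual methods:

```python
import numpy as np

rng = np.random.default_rng(0)

def model(x):
    # Toy "black box": the prediction depends only on timesteps 40-59.
    w = np.zeros(len(x))
    w[40:60] = 1.0
    return float(w @ x)

def occlusion_attribution(f, x, baseline=0.0):
    # Per-timestep importance: prediction change when that step is occluded.
    base = f(x)
    scores = np.empty(len(x))
    for t in range(len(x)):
        x_occ = x.copy()
        x_occ[t] = baseline
        scores[t] = abs(base - f(x_occ))
    return scores

def verify(f, x, scores, k=10):
    # Temporal verification: occluding the k most-attributed timesteps
    # should change the prediction more than occluding k random ones.
    top = np.argsort(scores)[-k:]
    rand = rng.choice(len(x), size=k, replace=False)
    def drop(idx):
        x_occ = x.copy()
        x_occ[idx] = 0.0
        return abs(f(x) - f(x_occ))
    return drop(top), drop(rand)

x = rng.random(100) + 0.1           # strictly positive toy time series
scores = occlusion_attribution(model, x)
top_drop, rand_drop = verify(model, x, scores)
# A faithful attribution should satisfy top_drop >= rand_drop.
```

Real evaluations would replace the toy model with a trained network and the occlusion scores with SHAP, DeepLIFT, LRP, or saliency attributions.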
Evaluating explanations of artificial intelligence decisions : the explanation quality rubric and survey
The use of Artificial Intelligence (AI) algorithms is growing rapidly (Vilone & Longo, 2020). With this comes an increasing demand for reliable, robust explanations of AI decisions, and a pressing need for a way to evaluate their quality. This thesis examines the following research questions: What would a rigorous, empirically justified, human-centred scheme for evaluating AI-decision explanations look like? How can such a scheme be created? Can it be used to improve explanations?

Current Explainable Artificial Intelligence (XAI) research lacks an accepted, widely employed method for evaluating AI explanations. This thesis offers a method for creating a rigorous, empirically justified, human-centred scheme for evaluating AI-decision explanations, and uses it to create an evaluation methodology: the XQ Rubric and XQ Survey. These are then employed to improve explanations of AI decisions.

Asking what constitutes a good explanation in the context of XAI, the thesis provides:
1. a model of good explanation for use in XAI research;
2. a method of gathering non-expert evaluations of XAI explanations;
3. an evaluation scheme for non-experts to employ in assessing XAI explanations (the XQ Rubric and XQ Survey).

The thesis begins with a literature review, primarily an exploration of previous attempts to evaluate XAI explanations formally. This is followed by an account of the development and iterative refinement of a solution to the problem: the eXplanation Quality Rubric (XQ Rubric). A Design Science methodology was used to guide the development of the XQ Rubric and XQ Survey. The thesis limits itself to XAI explanations appropriate for non-experts. It proposes and tests an evaluation rubric and survey method that is both stable and robust: that is, readily usable and consistently reliable in a variety of XAI-explanation tasks.
On the Soundness of XAI in Prognostics and Health Management (PHM)
The aim of Predictive Maintenance, within the field of Prognostics and Health
Management (PHM), is to identify and anticipate potential issues in the
equipment before these become critical. The main challenge to be addressed is
to assess the amount of time a piece of equipment will function effectively
before it fails, which is known as Remaining Useful Life (RUL). Deep Learning
(DL) models, such as Deep Convolutional Neural Networks (DCNN) and Long
Short-Term Memory (LSTM) networks, have been widely adopted to address the
task, with great success. However, it is well known that these black-box
models are opaque decision systems, and it may be hard to explain their outputs
to stakeholders (experts in the industrial equipment). Due to the large number
of parameters that determine the behavior of these complex models,
understanding the reasoning behind the predictions is challenging. This work
presents a critical and comparative review of a number of XAI methods applied
to time-series regression models for Predictive Maintenance. The aim is to explore XAI methods
within time series regression, which have been less studied than those for time
series classification. The model used during the experimentation is a DCNN
trained to predict the RUL of an aircraft engine. The methods are reviewed and
compared using a set of metrics that quantifies a number of desirable
properties that any XAI method should fulfill. The results show that GRAD-CAM
is the most robust method, and that the best layer is not the bottom one, as is
commonly seen within the context of Image Processing.
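The core Grad-CAM computation the abstract refers to, adapted to the 1-D convolutional layers of a time-series DCNN, can be sketched in a few lines. The activations and gradients below are synthetic placeholders; in practice they would be extracted from a chosen layer of the trained RUL model:

```python
import numpy as np

def grad_cam_1d(activations, gradients):
    """Grad-CAM for a 1-D conv layer.

    activations, gradients: arrays of shape (channels, timesteps),
    where gradients are d(output) / d(activations).
    """
    # Channel weights: gradients averaged over the temporal axis.
    alpha = gradients.mean(axis=1)              # (channels,)
    # Weighted sum of the activation maps, then ReLU.
    cam = np.maximum(0.0, alpha @ activations)  # (timesteps,)
    # Normalise to [0, 1] for visualisation.
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

rng = np.random.default_rng(1)
A = rng.random((8, 50))        # 8 feature maps over 50 timesteps (placeholder)
G = rng.normal(size=(8, 50))   # gradients of the RUL output w.r.t. A (placeholder)
cam = grad_cam_1d(A, G)
# cam[t] indicates how strongly timestep t supports the predicted RUL.
```

The paper's observation that the best layer is not the bottom one corresponds to which layer's `activations` and `gradients` are fed into this computation.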
Explainable NILM Networks
There has been an explosion in the recent literature on Non-intrusive Load Monitoring (NILM) approaches based on neural networks and other advanced machine learning methods. However, though these methods provide competitive accuracy, their inner workings are less clear. Understanding the outputs of the networks helps improve the designs, highlights the relevant features and aspects of the data used for making the decision, provides a better picture of the accuracy of the models (since a single accuracy number is often insufficient), and also inherently provides a level of trust in the value of the provided consumption feedback to the NILM end-user. Explainable Artificial Intelligence (XAI) aims to address this issue by explaining these "black boxes". XAI methods, developed for image- and text-based models, can in many cases interpret the outputs of complex models well, making them transparent. However, explaining time-series data inference remains a challenge. In this paper, we show how some XAI-based approaches can be used to explain the inner workings of NILM deep-learning-based autoencoders, and examine why the network performs or does not perform well in certain cases.
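One simple family of XAI approaches applicable to NILM models is gradient-based saliency: asking which timesteps of the aggregate load most influence the disaggregated output. The sketch below uses a hypothetical, hand-written stand-in for a trained autoencoder and a numerical (finite-difference) gradient purely for illustration; the models in the paper are deep networks whose gradients would come from the framework's autodiff:

```python
import numpy as np

def disaggregate(aggregate):
    # Toy stand-in for a NILM model: the "appliance" is taken to be the
    # smoothed aggregate signal above a 1.0 threshold; returns total energy.
    smoothed = np.convolve(aggregate, np.ones(5) / 5, mode="same")
    return np.clip(smoothed - 1.0, 0.0, None).sum()

def saliency(f, x, eps=1e-4):
    # Central-difference gradient of the scalar output w.r.t. each input step.
    grad = np.empty_like(x)
    for i in range(len(x)):
        up, down = x.copy(), x.copy()
        up[i] += eps
        down[i] -= eps
        grad[i] = (f(up) - f(down)) / (2 * eps)
    return np.abs(grad)

x = np.zeros(60)
x[20:30] = 2.0                 # one "appliance activation" in the aggregate
s = saliency(disaggregate, x)
# Saliency should concentrate around the activation, not the flat regions.
```

A sanity check of the kind the paper performs qualitatively: the saliency mass should sit on the activation window, and cases where it does not are exactly the cases worth examining.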