
    Towards a Rigorous Evaluation of XAI Methods on Time Series

    Explainable Artificial Intelligence (XAI) methods are typically deployed to explain and debug black-box machine learning models. However, most proposed XAI methods are black boxes themselves and are designed for images; they therefore rely on visual interpretability to evaluate and prove explanations. In this work, we apply XAI methods previously used in the image and text domains to time series. We present a methodology to test and evaluate various XAI methods on time series by introducing new verification techniques that incorporate the temporal dimension. We further conduct preliminary experiments to assess the quality of selected XAI method explanations, applying various verification methods to a range of datasets and inspecting the resulting quality metrics. Our initial experiments demonstrate that SHAP works robustly across all models, whereas others such as DeepLIFT, LRP, and Saliency Maps work better with specific architectures.
    Comment: 5 pages, 1 figure, 1 table, 1 page of references. 2019 ICCV Workshop on Interpreting and Explaining Visual Artificial Intelligence Models
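    As an illustration of the kind of temporal verification described above, the sketch below perturbs the time steps that an attribution method marks as most relevant and measures the drop in the predicted class probability. It is a minimal sketch, not the paper's own protocol: the predict_proba callable, the fill strategy, and the helper name perturbation_check are assumptions introduced here for illustration.

        import numpy as np

        def perturbation_check(predict_proba, x, attribution, k=10, fill="mean"):
            """Temporal perturbation check (hypothetical helper, not taken from the paper).

            predict_proba: maps a univariate series of shape (T,) to class probabilities.
            attribution:   per-time-step relevance scores from an XAI method (e.g. SHAP), shape (T,).
            Replaces the k most relevant time steps and returns the drop in the probability
            of the originally predicted class; a faithful explanation should yield a large drop.
            """
            cls = int(np.argmax(predict_proba(x)))
            top = np.argsort(attribution)[-k:]                    # indices of the most relevant time steps
            perturbed = x.copy()
            perturbed[top] = x.mean() if fill == "mean" else 0.0  # blank out the "important" region
            return float(predict_proba(x)[cls] - predict_proba(perturbed)[cls])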

    Evaluating explanations of artificial intelligence decisions: the explanation quality rubric and survey

    The use of Artificial Intelligence (AI) algorithms is growing rapidly (Vilone & Longo, 2020). With this comes an increasing demand for reliable, robust explanations of AI decisions, and a pressing need for a way to evaluate their quality. This thesis examines three research questions: what would a rigorous, empirically justified, human-centred scheme for evaluating AI-decision explanations look like; how can such a scheme be created; and can it be used to improve explanations? Current Explainable Artificial Intelligence (XAI) research lacks an accepted, widely employed method for evaluating AI explanations. This thesis offers a method for creating a rigorous, empirically justified, human-centred scheme for evaluating AI-decision explanations, and uses it to create an evaluation methodology, the XQ Rubric and XQ Survey, which are then employed to improve explanations of AI decisions. Asking what constitutes a good explanation in the context of XAI, the thesis provides:
    1. a model of good explanation for use in XAI research;
    2. a method of gathering non-expert evaluations of XAI explanations;
    3. an evaluation scheme for non-experts to employ in assessing XAI explanations (the XQ Rubric and XQ Survey).
    The thesis begins with a literature review, primarily an exploration of previous attempts to evaluate XAI explanations formally. This is followed by an account of the development and iterative refinement of a solution to the problem, the eXplanation Quality Rubric (XQ Rubric). A Design Science methodology was used to guide the development of the XQ Rubric and XQ Survey. The thesis limits itself to XAI explanations appropriate for non-experts. It proposes and tests an evaluation rubric and survey method that is both stable and robust: that is, readily usable and consistently reliable across a variety of XAI-explanation tasks.
    Doctor of Philosophy

    On the Soundness of XAI in Prognostics and Health Management (PHM)

    The aim of Predictive Maintenance (PM), within the field of Prognostics and Health Management (PHM), is to identify and anticipate potential issues in equipment before they become critical. The main challenge is to assess how long a piece of equipment will function effectively before it fails, a quantity known as the Remaining Useful Life (RUL). Deep Learning (DL) models, such as Deep Convolutional Neural Networks (DCNN) and Long Short-Term Memory (LSTM) networks, have been widely and successfully adopted for this task. However, such black-box models are opaque decision systems, and it can be hard to explain their outputs to stakeholders (experts in the industrial equipment). Because of the large number of parameters that determine the behavior of these complex models, understanding the reasoning behind their predictions is challenging. This work presents a critical, comparative review of a number of XAI methods applied to a time series regression model for PM. The aim is to explore XAI methods for time series regression, which have been studied less than those for time series classification. The model used in the experiments is a DCNN trained to predict the RUL of an aircraft engine. The methods are reviewed and compared using a set of metrics that quantify a number of desirable properties any XAI method should fulfill. The results show that Grad-CAM is the most robust method, and that the best layer to explain is not the bottom one, as is commonly assumed in the context of image processing.
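    The review centres on Grad-CAM applied to a DCNN RUL regressor; the sketch below shows how Grad-CAM can be adapted to 1D convolutions with a regression output, assuming PyTorch. The toy architecture, the 14-sensor / 30-step input shape, and the choice of intermediate layer are illustrative assumptions, not the paper's model.

        import torch
        import torch.nn as nn

        # Toy 1D CNN for RUL regression (hypothetical architecture): input (batch, sensors, time)
        model = nn.Sequential(
            nn.Conv1d(14, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(64, 1),
        )

        target_layer = model[2]     # explain an intermediate conv layer, not necessarily the bottom one
        store = {}
        target_layer.register_forward_hook(lambda m, i, o: store.update(act=o.detach()))
        target_layer.register_full_backward_hook(lambda m, gi, go: store.update(grad=go[0].detach()))

        x = torch.randn(1, 14, 30)  # one engine window: 14 sensors, 30 time steps
        model(x).backward()         # gradient of the scalar RUL prediction w.r.t. the graph

        # Grad-CAM: channel weights are time-averaged gradients, then a weighted, rectified sum over channels
        weights = store["grad"].mean(dim=2, keepdim=True)         # (1, 64, 1)
        cam = torch.relu((weights * store["act"]).sum(dim=1))     # (1, 30): relevance per time step
        cam = cam / (cam.max() + 1e-8)                            # normalise for comparison across methods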

    Explainable NILM Networks

    There has recently been an explosion in the literature on Non-Intrusive Load Monitoring (NILM) approaches based on neural networks and other advanced machine learning methods. However, although these methods provide competitive accuracy, the inner workings of the models are less clear. Understanding the outputs of the networks helps in improving the designs, highlights the features and aspects of the data relevant to the decision, gives a better picture of the models' accuracy (since a single accuracy number is often insufficient), and inherently provides a level of trust in the value of the consumption feedback given to the NILM end-user. Explainable Artificial Intelligence (XAI) aims to address this issue by explaining these “black boxes”. XAI methods developed for image- and text-based models can in many cases interpret the outputs of complex models well, making them transparent. However, explaining inference on time-series data remains a challenge. In this paper, we show how some XAI-based approaches can be used to explain the inner workings of NILM deep-learning-based autoencoders, and examine why the network performs or does not perform well in certain cases.
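    One simple way to probe a seq2seq NILM autoencoder in the spirit described above is occlusion sensitivity: blank out part of the aggregate input and see how much the appliance estimate changes. The sketch below is a minimal illustration under that assumption; the predict callable, the patch size, and the helper name occlusion_relevance are hypothetical and not taken from the paper.

        import numpy as np

        def occlusion_relevance(predict, window, patch=10, fill=0.0):
            """Occlusion sensitivity for a seq2seq NILM model (hypothetical helper).

            predict: maps an aggregate mains window of shape (T,) to an appliance estimate of shape (T,).
            window:  aggregate mains readings, shape (T,).
            Returns per-time-step relevance: how much the appliance reconstruction changes
            when that region of the input is blanked out.
            """
            base = predict(window)
            relevance = np.zeros_like(window, dtype=float)
            for start in range(0, len(window), patch):
                occluded = window.copy()
                occluded[start:start + patch] = fill              # blank out one patch of the input
                delta = np.abs(base - predict(occluded)).sum()    # total change in the reconstruction
                relevance[start:start + patch] = delta
            return relevance / (relevance.max() + 1e-8)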