Search CORE

29,202 research outputs found

TSXplain: Demystification of DNN Decisions for Time-Series using Natural Language and Statistical Features

Author: Ahmed Sheraz
Dengel Andreas
Küsters Ferdinand
Mercier Dominique
Munir Mohsin
Siddiqui Shoaib Ahmed
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 15/05/2019
Field of study

Neural networks (NN) are considered as black-boxes due to the lack of explainability and transparency of their decisions. This significantly hampers their deployment in environments where explainability is essential along with the accuracy of the system. Recently, significant efforts have been made for the interpretability of these deep networks with the aim to open up the black-box. However, most of these approaches are specifically developed for visual modalities. In addition, the interpretations provided by these systems require expert knowledge and understanding for intelligibility. This indicates a vital gap between the explainability provided by the systems and the novice user. To bridge this gap, we present a novel framework i.e. Time-Series eXplanation (TSXplain) system which produces a natural language based explanation of the decision taken by a NN. It uses the extracted statistical features to describe the decision of a NN, merging the deep learning world with that of statistics. The two-level explanation provides ample description of the decision made by the network to aid an expert as well as a novice user alike. Our survey and reliability assessment test confirm that the generated explanations are meaningful and correct. We believe that generating natural language based descriptions of the network's decisions is a big step towards opening up the black-box.Comment: Pre-prin

arXiv.org e-Print Archive

Explaining Transition Systems through Program Induction

Author: Penkov Svetlin
Ramamoorthy Subramanian
Publication venue
Publication date: 23/05/2017
Field of study

Explaining and reasoning about processes which underlie observed black-box phenomena enables the discovery of causal mechanisms, derivation of suitable abstract representations and the formulation of more robust predictions. We propose to learn high level functional programs in order to represent abstract models which capture the invariant structure in the observed data. We introduce the

\pi

-machine (program-induction machine) -- an architecture able to induce interpretable LISP-like programs from observed data traces. We propose an optimisation procedure for program learning based on backpropagation, gradient descent and A* search. We apply the proposed method to three problems: system identification of dynamical systems, explaining the behaviour of a DQN agent and learning by demonstration in a human-robot interaction scenario. Our experimental results show that the

\pi

-machine can efficiently induce interpretable programs from individual data traces.Comment: submitted to Neural Information Processing Systems 201

arXiv.org e-Print Archive

Towards Prediction Explainability through Sparse Communication

Author: Martins André F. T.
Treviso Marcos V.
Publication venue
Publication date: 28/04/2020
Field of study

Explainability is a topic of growing importance in NLP. In this work, we provide a unified perspective of explainability as a communication problem between an explainer and a layperson about a classifier's decision. We use this framework to compare several prior approaches for extracting explanations, including gradient methods, representation erasure, and attention mechanisms, in terms of their communication success. In addition, we reinterpret these methods at the light of classical feature selection, and we use this as inspiration to propose new embedded methods for explainability, through the use of selective, sparse attention. Experiments in text classification, natural language entailment, and machine translation, using different configurations of explainers and laypeople (including both machines and humans), reveal an advantage of attention-based explainers over gradient and erasure methods. Furthermore, human evaluation experiments show promising results with post-hoc explainers trained to optimize communication success and faithfulness

arXiv.org e-Print Archive

EXS: Explainable Search Using Local Model Agnostic Interpretability

Author: Anand Avishek
Singh Jaspreet
Publication venue
Publication date: 11/09/2018
Field of study

Retrieval models in information retrieval are used to rank documents for typically under-specified queries. Today machine learning is used to learn retrieval models from click logs and/or relevance judgments that maximizes an objective correlated with user satisfaction. As these models become increasingly powerful and sophisticated, they also become harder to understand. Consequently, it is hard for to identify artifacts in training, data specific biases and intents from a complex trained model like neural rankers even if trained purely on text features. EXS is a search system designed specifically to provide its users with insight into the following questions: `What is the intent of the query according to the ranker?', `Why is this document ranked higher than another?' and `Why is this document relevant to the query?'. EXS uses a version of a popular posthoc explanation method for classifiers -- LIME, adapted specifically to answer these questions. We show how such a system can effectively help a user understand the results of neural rankers and highlight areas of improvement

arXiv.org e-Print Archive

Game-Theoretic Interpretability for Temporal Modeling

Author: Alvarez-Melis David
Jaakkola Tommi S.
Lee Guang-He
Publication venue
Publication date: 30/06/2018
Field of study

Interpretability has arisen as a key desideratum of machine learning models alongside performance. Approaches so far have been primarily concerned with fixed dimensional inputs emphasizing feature relevance or selection. In contrast, we focus on temporal modeling and the problem of tailoring the predictor, functionally, towards an interpretable family. To this end, we propose a co-operative game between the predictor and an explainer without any a priori restrictions on the functional class of the predictor. The goal of the explainer is to highlight, locally, how well the predictor conforms to the chosen interpretable family of temporal models. Our co-operative game is setup asymmetrically in terms of information sets for efficiency reasons. We develop and illustrate the framework in the context of temporal sequence models with examples

arXiv.org e-Print Archive

Analysis Methods in Neural Language Processing: A Survey

Author: Belinkov Yonatan
Glass James
Publication venue
Publication date: 14/01/2019
Field of study

The field of natural language processing has seen impressive progress in recent years, with neural network models replacing many of the traditional systems. A plethora of new models have been proposed, many of which are thought to be opaque compared to their feature-rich counterparts. This has led researchers to analyze, interpret, and evaluate neural networks in novel and more fine-grained ways. In this survey paper, we review analysis methods in neural language processing, categorize them according to prominent research trends, highlight existing limitations, and point to potential directions for future work.Comment: Version including the supplementary materials (3 tables), also available at https://boknilev.github.io/nlp-analysis-method

arXiv.org e-Print Archive

Explainable Machine Learning for Scientific Insights and Discoveries

Author: Bohn Bastian
Duarte Marco F.
Garcke Jochen
Roscher Ribana
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 12/01/2020
Field of study

Machine learning methods have been remarkably successful for a wide range of application areas in the extraction of essential information from data. An exciting and relatively recent development is the uptake of machine learning in the natural sciences, where the major goal is to obtain novel scientific insights and discoveries from observational or simulated data. A prerequisite for obtaining a scientific outcome is domain knowledge, which is needed to gain explainability, but also to enhance scientific consistency. In this article we review explainable machine learning in view of applications in the natural sciences and discuss three core elements which we identified as relevant in this context: transparency, interpretability, and explainability. With respect to these core elements, we provide a survey of recent scientific works that incorporate machine learning and the way that explainable machine learning is used in combination with domain knowledge from the application areas

arXiv.org e-Print Archive

Reliable Decision Support using Counterfactual Models

Author: Saria Suchi
Schulam Peter
Publication venue
Publication date: 01/02/2018
Field of study

Decision-makers are faced with the challenge of estimating what is likely to happen when they take an action. For instance, if I choose not to treat this patient, are they likely to die? Practitioners commonly use supervised learning algorithms to fit predictive models that help decision-makers reason about likely future outcomes, but we show that this approach is unreliable, and sometimes even dangerous. The key issue is that supervised learning algorithms are highly sensitive to the policy used to choose actions in the training data, which causes the model to capture relationships that do not generalize. We propose using a different learning objective that predicts counterfactuals instead of predicting outcomes under an existing action policy as in supervised learning. To support decision-making in temporal settings, we introduce the Counterfactual Gaussian Process (CGP) to predict the counterfactual future progression of continuous-time trajectories under sequences of future actions. We demonstrate the benefits of the CGP on two important decision-support tasks: risk prediction and "what if?" reasoning for individualized treatment planning.Comment: Published in the proceedings of Neural Information Processing Systems (NIPS) 201

arXiv.org e-Print Archive

Interpretable Deep Learning under Fire

Author: Ji Shouling
Luo Xiapu
Shen Hua
Wang Ningfei
Wang Ting
Zhang Xinyang
Publication venue
Publication date: 17/09/2019
Field of study

Providing explanations for deep neural network (DNN) models is crucial for their use in security-sensitive domains. A plethora of interpretation models have been proposed to help users understand the inner workings of DNNs: how does a DNN arrive at a specific decision for a given input? The improved interpretability is believed to offer a sense of security by involving human in the decision-making process. Yet, due to its data-driven nature, the interpretability itself is potentially susceptible to malicious manipulations, about which little is known thus far. Here we bridge this gap by conducting the first systematic study on the security of interpretable deep learning systems (IDLSes). We show that existing \imlses are highly vulnerable to adversarial manipulations. Specifically, we present ADV^2, a new class of attacks that generate adversarial inputs not only misleading target DNNs but also deceiving their coupled interpretation models. Through empirical evaluation against four major types of IDLSes on benchmark datasets and in security-critical applications (e.g., skin cancer diagnosis), we demonstrate that with ADV^2 the adversary is able to arbitrarily designate an input's prediction and interpretation. Further, with both analytical and empirical evidence, we identify the prediction-interpretation gap as one root cause of this vulnerability -- a DNN and its interpretation model are often misaligned, resulting in the possibility of exploiting both models simultaneously. Finally, we explore potential countermeasures against ADV^2, including leveraging its low transferability and incorporating it in an adversarial training framework. Our findings shed light on designing and operating IDLSes in a more secure and informative fashion, leading to several promising research directions.Comment: USENIX Security Symposium '2

arXiv.org e-Print Archive

Evaluating Explanation Methods for Neural Machine Translation

Author: Huang Guoping
Li Guanlin
Li Huayang
Li Jierui
Liu Lemao
Shi Shuming
Publication venue
Publication date: 04/05/2020
Field of study

Recently many efforts have been devoted to interpreting the black-box NMT models, but little progress has been made on metrics to evaluate explanation methods. Word Alignment Error Rate can be used as such a metric that matches human understanding, however, it can not measure explanation methods on those target words that are not aligned to any source word. This paper thereby makes an initial attempt to evaluate explanation methods from an alternative viewpoint. To this end, it proposes a principled metric based on fidelity in regard to the predictive behavior of the NMT model. As the exact computation for this metric is intractable, we employ an efficient approach as its approximation. On six standard translation tasks, we quantitatively evaluate several explanation methods in terms of the proposed metric and we reveal some valuable findings for these explanation methods in our experiments.Comment: Accepted to ACL 2020, 9 page

arXiv.org e-Print Archive