
    Rationalization for Explainable NLP: A Survey

    Recent advances in deep learning have improved the performance of many Natural Language Processing (NLP) tasks such as translation, question answering, and text classification. However, this improvement comes at the expense of model explainability. Black-box models make it difficult to understand the internals of a system and the process by which it arrives at an output. Numerical (LIME, Shapley) and visualization (saliency heatmap) explainability techniques are helpful; however, they are insufficient because they require specialized knowledge. These factors led rationalization to emerge as a more accessible explainability technique in NLP. Rationalization justifies a model's output by providing a natural language explanation (rationale). Recent improvements in natural language generation have made rationalization an attractive technique because it is intuitive, human-comprehensible, and accessible to non-technical users. Since rationalization is a relatively new field, its literature is disorganized. This survey, the first on the topic, analyzes rationalization literature in NLP from 2007 to 2022. It presents available methods, explainability evaluations, code, and datasets used across various NLP tasks that employ rationalization. Further, a new subfield of Explainable AI (XAI), namely Rational AI (RAI), is introduced to advance the current state of rationalization. A discussion of observed insights, challenges, and future directions points to promising research opportunities.
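    The abstract contrasts natural-language rationales with numerical techniques such as LIME and Shapley values, which attribute a prediction to input features. As a minimal illustration of the latter family, the sketch below computes occlusion-style word saliency for a toy classifier: each word is scored by how much the prediction drops when it is removed. The toy lexicon-based classifier is an assumption for illustration, not a method from the survey.

```python
def sentiment_score(tokens):
    """Toy sentiment classifier: fraction of tokens found in a positive lexicon.
    (Illustrative stand-in for a real black-box model.)"""
    positive = {"good", "great", "excellent", "helpful"}
    return sum(t in positive for t in tokens) / max(len(tokens), 1)

def occlusion_saliency(tokens):
    """Score each token by the drop in the model's output when it is occluded."""
    base = sentiment_score(tokens)
    return {
        tok: base - sentiment_score(tokens[:i] + tokens[i + 1:])
        for i, tok in enumerate(tokens)
    }

sal = occlusion_saliency(["the", "movie", "was", "great"])
# "great" receives the highest saliency, since removing it lowers the score most
```

    A rationalization approach would instead emit a sentence such as "the review praises the film," which is the accessibility gap the survey highlights.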

    Deep Neural Networks Explainability: Algorithms and Applications

    Deep neural networks (DNNs) are progressing at an astounding rate, and these models have a wide range of real-world applications, such as Netflix's movie recommendations, Google's neural machine translation, and Amazon Alexa's speech recognition. Despite these successes, DNNs have their own limitations and drawbacks. The most significant is the lack of transparency behind their behavior, which leaves users with little understanding of how these models make particular decisions. Consider, for instance, an advanced self-driving car equipped with various DNN algorithms that does not brake or decelerate when confronting a stopped fire truck. This unexpected behavior may frustrate and confuse users, making them wonder why. Even worse, such wrong decisions could have severe consequences if the car is driving at highway speed and ultimately crashes into the fire truck. Concerns about the black-box nature of complex deep neural network models have hampered their further application in society, especially in critical decision-making domains such as self-driving cars. In this dissertation, we investigate three research questions: How can we provide explanations for pre-trained DNN models so as to offer insights into their decision-making process? How can we use explanations to enhance the generalization ability of DNN models? And how can we employ explanations to promote the fairness of DNN models? To address the first research question, we explore the explainability of two standard DNN architectures: convolutional neural networks (CNNs) and recurrent neural networks (RNNs). We propose a guided feature inversion framework that takes advantage of deep architectures for effective interpretation of CNN models. The proposed framework not only determines the contribution of each feature in the input but also provides insights into the decision-making process of CNN models.
    By further interacting with the neuron of the target category at the output layer of the CNN, we enforce the interpretation result to be class-discriminative. We also propose a novel attribution method, called REAT, to provide interpretations for RNN predictions. REAT decomposes the final prediction of an RNN into the additive contributions of each word in the input text. This additive decomposition enables REAT to further obtain phrase-level attribution scores. In addition, REAT is generally applicable to various RNN architectures, including GRUs, LSTMs, and their bidirectional variants. Experimental results over a series of image and text classification benchmarks demonstrate the faithfulness and interpretability of the two proposed explanation methods. To address the second research question, we use explainability as a debugging tool to examine the vulnerabilities and failure modes of DNNs, which leads to insights that can be used to enhance the generalization ability of DNN models. We propose CREX, which encourages DNN models to focus on evidence that actually matters for the task at hand and to avoid overfitting to data-dependent bias and artifacts. Specifically, CREX regularizes the training process of DNNs with rationales, i.e., subsets of features highlighted by domain experts as justifications for predictions, to enforce that DNNs generate local explanations conforming with expert rationales. Furthermore, recent studies indicate that BERT-based natural language understanding models are prone to relying on shortcut features for prediction. We employ explainability-based observations to formulate a measurement that quantifies the shortcut degree of each training sample. Based on this shortcut measurement, we propose a shortcut mitigation framework, LTGR, to suppress the model from making overconfident predictions on samples with large shortcut degrees.
    Experimental analysis over several text benchmark datasets validates that our CREX and LTGR frameworks effectively increase the generalization ability of DNN models. For the third research question, explainability-based analysis indicates that DNN models trained with the standard cross-entropy loss tend to capture spurious correlations between fairness-sensitive information in encoder representations and specific class labels. We propose a new mitigation technique, namely RNF, that achieves fairness by debiasing only the task-specific classification head of DNN models. To this end, we leverage samples with the same ground-truth label but different sensitive attributes, and use their neutralized representations to train the classification head of the DNN model. Experimental results over several benchmark datasets demonstrate that our RNF framework effectively reduces discrimination in DNN models with minimal degradation in task-specific performance.
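    The additive decomposition idea behind REAT can be illustrated on a deliberately simplified model. For a toy linear recurrence h_t = decay * h_{t-1} + embed(x_t) starting from h_0 = 0, the final state is exactly the sum of per-word terms decay**(T-1-t) * embed(x_t), so word-level attributions sum to the prediction, and a phrase-level score is just a partial sum. The decay constant and toy embedding are assumptions for illustration; the actual REAT method handles nonlinear GRU/LSTM dynamics.

```python
def attribute(tokens, embed, decay=0.9):
    """Additive word contributions for the toy recurrence
    h_t = decay * h_{t-1} + embed(x_t), with h_0 = 0."""
    T = len(tokens)
    return [decay ** (T - 1 - t) * embed(tok) for t, tok in enumerate(tokens)]

# Hypothetical scalar word embeddings standing in for learned representations
embed = lambda tok: {"terrible": -1.0, "acting": 0.1, "great": 1.0}.get(tok, 0.0)

scores = attribute(["great", "acting"], embed)
prediction = sum(scores)   # equals the unrolled recurrence's final state (1.0)
phrase = sum(scores[0:2])  # phrase-level attribution is a partial sum
```

    The key property being sketched is faithfulness: because the decomposition is exact for this model, the attributions account for the entire prediction rather than approximating it.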

    Extracting and Harnessing Interpretation in Data Mining

    Machine learning, especially the recent deep learning technique, has driven significant advances in various data mining applications, including recommender systems, misinformation detection, outlier detection, and health informatics. Unfortunately, while complex models have achieved unprecedented prediction capability, they are often criticized as "black boxes" due to their multiple layers of non-linear transformation and their hardly understandable working mechanisms. To tackle this opacity issue, interpretable machine learning has attracted increasing attention. Traditional interpretation methods mainly focus on explaining the predictions of classification models with gradient-based methods or local approximation methods. However, the natural characteristics of data mining applications are not considered, and the internal mechanisms of models are not fully explored. Meanwhile, it is unknown how to utilize interpretation to improve models. To bridge this gap, I developed a series of interpretation methods that gradually increase the transparency of data mining models. First, a fundamental goal of interpretation is attributing model outputs to input features. To adapt feature attribution to explaining outlier detection, I propose Contextual Outlier Interpretation (COIN). Second, to overcome the limitation that attribution methods do not explain the internal information inside models, I further propose representation interpretation methods that extract knowledge as a taxonomy. However, these post-hoc methods may suffer from limited interpretation accuracy and an inability to directly control the model training process. Therefore, I propose an interpretable network embedding framework to explicitly control the meaning of latent dimensions.
    Finally, beyond obtaining explanations, I propose to use interpretation to discover the vulnerabilities of models in adversarial circumstances, and then to actively prepare models with adversarial training to improve their robustness against potential threats. My research on interpretable machine learning enables data scientists to better understand their models and discover defects for further improvement, and it improves the experience of customers who benefit from data mining systems. It broadly impacts fields such as Information Retrieval, Information Security, Social Computing, and Health Informatics.
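    The idea of adapting feature attribution to outlier detection, as in COIN, can be sketched as scoring each feature of an outlier by how far it deviates from the outlier's local context of normal neighbors. The z-score-style deviation measure below is an illustrative assumption, not the actual COIN formulation.

```python
def feature_attribution(outlier, context):
    """Contextual per-feature outlier attribution (illustrative sketch).
    outlier: list of feature values; context: nearby normal points."""
    scores = []
    for j in range(len(outlier)):
        vals = [p[j] for p in context]
        mean = sum(vals) / len(vals)
        # Standard deviation of the context; fall back to 1.0 if degenerate
        spread = (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5 or 1.0
        # Large score = this feature makes the point stand out from its context
        scores.append(abs(outlier[j] - mean) / spread)
    return scores

scores = feature_attribution([0.1, 9.0], [[0.0, 1.0], [0.2, 1.2], [0.1, 0.9]])
# The second feature dominates: it is far outside its contextual range
```

    The contextual framing is what distinguishes this from plain classification attribution: the same feature value may be normal in one neighborhood and anomalous in another.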

    A Multidisciplinary Design and Evaluation Framework for Explainable AI Systems

    Nowadays, algorithms analyze user data and affect decision-making for millions of people on matters like employment, insurance and loan rates, and even criminal justice. However, these algorithms, which serve critical roles in many industries, have their own biases that can result in discrimination and unfair decision-making. Explainable Artificial Intelligence (XAI) systems can be a path to predictable and accountable AI: by explaining AI decision-making processes to end users, they increase user awareness and help prevent bias and discrimination. The broad spectrum of research on XAI, including the design of interpretable models, explainable user interfaces, and human-subject studies of XAI systems, spans disciplines such as machine learning, human-computer interaction (HCI), and visual analytics. The mismatch in how these disciplines define, design, and evaluate the concept of XAI may slow the overall advance of end-to-end XAI systems. My research aims to converge knowledge about the design and evaluation of XAI systems across multiple disciplines to further support the key benefits of algorithmic transparency and interpretability. To this end, I propose a comprehensive design and evaluation framework for XAI systems with step-by-step guidelines that pair different design goals with their evaluation methods for iterative system design cycles in multidisciplinary teams. This dissertation presents a comprehensive XAI design and evaluation framework to provide guidance for different design goals and evaluation approaches in XAI systems. After a thorough review of XAI research in the fields of machine learning, visualization, and HCI, I present a categorization of XAI design goals and evaluation methods and show a mapping between design goals for different XAI user groups and their evaluation methods.
    From these findings, I present a design and evaluation framework for XAI systems (Objective 1) that addresses the relations between different system design needs. The framework provides recommendations for different goals and ready-to-use tables of evaluation methods for XAI systems. The importance of this framework lies in providing guidance for researchers on different aspects of XAI system design in multidisciplinary team efforts. Then, I demonstrate and validate the proposed framework (Objective 2) through one end-to-end XAI system case study and two examples that analyze previous XAI systems in terms of the framework. I present two contributions to my XAI design and evaluation framework that improve evaluation methods for XAI systems.