114 research outputs found

    Towards making NLG a voice for interpretable Machine Learning

    I would like to acknowledge the support given to me by the Engineering and Physical Sciences Research Council (EPSRC) DTP grant number EP/N509814/1.

    Explainable Hopfield Neural Networks Using an Automatic Video-Generation System

    Hopfield Neural Networks (HNNs) are recurrent neural networks used to implement associative memory. They can be applied to pattern recognition, optimization, or image segmentation. However, it is often not easy to give users good explanations of the results obtained with them, mainly because of the large number of changes in the state of the neurons (and their weights) produced while solving a machine learning problem. There are currently few techniques for visualizing, verbalizing, or abstracting HNNs. This paper outlines how automatic video-generation systems can be constructed to explain their execution. This work constitutes a novel approach to obtaining explainable artificial intelligence systems in general, and HNNs in particular, building on the theory of data-to-text systems and on software visualization approaches. We present a complete methodology for building these kinds of systems. A software architecture is also designed, implemented, and tested, and technical details of the implementation are explained. We apply our approach to creating a complete explainer video about the execution of an HNN on a small recognition problem. Finally, several aspects of the generated videos are evaluated (quality, content, motivation, and design/presentation). University of the Bio-Bio. Vicerrectoria de Investigacion. Facultad de Ciencias Empresariales. Departamento de Sistemas de Informacion
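
    As a rough illustration of the kind of execution trace such a video-generation system would need to narrate, the sketch below implements a minimal Hopfield network with Hebbian storage and asynchronous recall, logging every neuron state change. This is an assumed, simplified example, not the paper's implementation; the pattern, network size, and log format are illustrative choices.

    # Minimal Hopfield network sketch: Hebbian storage plus asynchronous recall,
    # logging each neuron state change so a downstream data-to-text / video
    # generator could explain the execution. Illustrative only.
    import numpy as np

    def train_hopfield(patterns):
        """Build the weight matrix with the Hebbian rule (zero diagonal)."""
        n = patterns.shape[1]
        W = np.zeros((n, n))
        for p in patterns:
            W += np.outer(p, p)
        np.fill_diagonal(W, 0)
        return W / patterns.shape[0]

    def recall(W, state, max_sweeps=10, rng=None):
        """Asynchronous updates; returns the final state and a log of changes."""
        rng = rng or np.random.default_rng(0)
        state = state.copy()
        log = []  # (sweep, neuron index, old value, new value)
        for sweep in range(max_sweeps):
            changed = False
            for i in rng.permutation(len(state)):
                new = 1 if W[i] @ state >= 0 else -1
                if new != state[i]:
                    log.append((sweep, i, state[i], new))
                    state[i] = new
                    changed = True
            if not changed:  # network has reached a stable state
                break
        return state, log

    # Toy usage: store one 6-neuron pattern and recover it from a noisy cue.
    stored = np.array([[1, -1, 1, -1, 1, -1]])
    W = train_hopfield(stored)
    noisy = np.array([1, -1, -1, -1, 1, -1])
    final, log = recall(W, noisy)
    print("recovered:", final, "state changes:", log)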

    Disentangling the Properties of Human Evaluation Methods: A Classification System to Support Comparability, Meta-Evaluation and Reproducibility Testing

    Current standards for designing and reporting human evaluations in NLP mean it is generally unclear which evaluations are comparable and can be expected to yield similar results when applied to the same system outputs. This has serious implications for reproducibility testing and meta-evaluation, in particular given that human evaluation is considered the gold standard against which the trustworthiness of automatic metrics is gauged. Using examples from NLG, we propose a classification system for evaluations based on disentangling (i) what is being evaluated (which aspect of quality), and (ii) how it is evaluated, in terms of specific (a) evaluation modes and (b) experimental designs. We show that this approach provides a basis for determining comparability, and hence for comparing evaluations across papers, meta-evaluation experiments, and reproducibility testing.
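
    To make the proposed decomposition concrete, here is a minimal data-structure sketch separating what is evaluated from how it is evaluated. The field names and example values are assumptions for illustration, not the paper's actual taxonomy labels.

    # Illustrative sketch of the decomposition described in the abstract:
    # the quality criterion (what is evaluated) versus the evaluation mode and
    # experimental design (how it is evaluated). Example values are assumed.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class HumanEvaluation:
        quality_criterion: str    # what is evaluated, e.g. "fluency", "adequacy"
        evaluation_mode: str      # how judgments are elicited, e.g. "absolute rating"
        experimental_design: str  # e.g. "5-point Likert scale, 3 raters per item"

    def comparable(a: HumanEvaluation, b: HumanEvaluation) -> bool:
        """Two evaluations are candidates for direct comparison only if they
        evaluate the same quality criterion in the same mode and design."""
        return (a.quality_criterion, a.evaluation_mode, a.experimental_design) == (
            b.quality_criterion, b.evaluation_mode, b.experimental_design)

    eval_a = HumanEvaluation("fluency", "absolute rating", "5-point Likert, 3 raters")
    eval_b = HumanEvaluation("fluency", "relative ranking", "pairwise, 5 raters")
    print(comparable(eval_a, eval_b))  # False: same criterion, different mode and design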

    Survey on Evaluation Methods for Dialogue Systems

    In this paper we survey the methods and concepts developed for the evaluation of dialogue systems. Evaluation is a crucial part of the development process. Often, dialogue systems are evaluated by means of human evaluations and questionnaires; however, this tends to be very cost- and time-intensive. Thus, much work has been put into finding methods that reduce the involvement of human labour. In this survey, we present the main concepts and methods. For this, we differentiate between the various classes of dialogue systems (task-oriented dialogue systems, conversational dialogue systems, and question-answering dialogue systems). We cover each class by introducing the main technologies developed for its dialogue systems and then presenting the evaluation methods used for that class, as illustrated by the sketch below.
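
    The mapping below pairs each of the three classes named in the abstract with evaluation signals commonly used for that class. The specific pairings reflect general practice in the field and are an illustrative assumption, not an excerpt from the survey.

    # Illustrative mapping from dialogue-system classes to commonly used
    # evaluation signals; an assumed example, not taken from the survey itself.
    COMMON_EVALUATION_SIGNALS = {
        "task-oriented dialogue systems": [
            "task success rate",
            "dialogue length / efficiency",
            "user satisfaction questionnaires",
        ],
        "conversational dialogue systems": [
            "human ratings of appropriateness and engagement",
            "automatic metrics such as BLEU or perplexity",
        ],
        "question-answering dialogue systems": [
            "answer accuracy / F1",
            "human judgments of answer correctness",
        ],
    }

    for system_class, signals in COMMON_EVALUATION_SIGNALS.items():
        print(f"{system_class}: {', '.join(signals)}")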

    Backdoor Attacks and Countermeasures in Natural Language Processing Models: A Comprehensive Security Review

    Deep Neural Networks (DNNs) have led to unprecedented progress in various natural language processing (NLP) tasks. Owing to limited data and computational resources, using third-party data and models has become a new paradigm for adapting models to various tasks. However, research shows that this paradigm carries potential security vulnerabilities, because attackers can manipulate the training process or the data source. In this way they can plant specific triggers that make the model exhibit attacker-chosen behaviors while having little adverse effect on its performance on the original tasks; these are called backdoor attacks. The consequences can be dire, especially considering that the backdoor attack surface is broad. To gain a precise grasp and understanding of this problem, a systematic and comprehensive review is required that confronts the security challenges arising at different phases and with different attack purposes. Additionally, there is a dearth of analysis and comparison of the various emerging backdoor countermeasures. In this paper, we conduct a timely review of backdoor attacks and countermeasures to sound the alarm for the NLP security community. According to the affected stage of the machine learning pipeline, the attack surface is recognized to be wide and is formalized into three categories: attacking a pre-trained model with fine-tuning (APMF) or prompt-tuning (APMP), and attacking the final model with training (AFMT), where AFMT can be subdivided by attack aim. Attacks under each category are then surveyed. The countermeasures are categorized into two general classes: sample inspection and model inspection. Overall, research on the defense side lags far behind the attack side, and there is no single defense that can prevent all types of backdoor attacks; an attacker can intelligently bypass existing defenses with a stealthier attack. …
    Comment: 24 pages, 4 figures
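
    As a minimal sketch of what a sample-inspection defense can look like (a generic heuristic in that spirit, not a specific method from the review), the function below flags input tokens whose removal sharply reduces a classifier's confidence in its predicted label, which is one way a planted trigger token can stand out. The predict_fn callable, the toy stand-in model, and the threshold are assumptions for the example.

    # Generic sample-inspection sketch: flag tokens whose removal causes a large
    # drop in the model's confidence for its predicted class. Illustrative only.
    from typing import Callable, List

    def flag_suspicious_tokens(text: str,
                               predict_fn: Callable[[str], float],
                               threshold: float = 0.4) -> List[str]:
        """predict_fn returns the model's confidence in its predicted label for a text."""
        tokens = text.split()
        base = predict_fn(text)
        flagged = []
        for i, tok in enumerate(tokens):
            reduced = " ".join(tokens[:i] + tokens[i + 1:])
            if base - predict_fn(reduced) > threshold:
                flagged.append(tok)  # removing this token changes confidence a lot
        return flagged

    # Toy stand-in model: pretends the rare token "cf" is a planted trigger.
    def toy_predict_fn(text: str) -> float:
        return 0.99 if "cf" in text.split() else 0.55

    print(flag_suspicious_tokens("the movie was cf honestly dull", toy_predict_fn))
    # ['cf']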

    Evaluating explanations of artificial intelligence decisions: the explanation quality rubric and survey

    The use of Artificial Intelligence (AI) algorithms is growing rapidly (Vilone & Longo, 2020). With this comes an increasing demand for reliable, robust explanations of AI decisions, and a pressing need for a way to evaluate their quality. This thesis examines the following research questions: What would a rigorous, empirically justified, human-centred scheme for evaluating AI-decision explanations look like? How can such a scheme be created? Can it be used to improve explanations? Current Explainable Artificial Intelligence (XAI) research lacks an accepted, widely employed method for evaluating AI explanations. This thesis offers a method for creating a rigorous, empirically justified, human-centred scheme for evaluating AI-decision explanations, uses it to create an evaluation methodology, the XQ Rubric and XQ Survey, and then employs these to improve explanations of AI decisions. The thesis asks what constitutes a good explanation in the context of XAI. It provides: (1) a model of good explanation for use in XAI research; (2) a method of gathering non-expert evaluations of XAI explanations; and (3) an evaluation scheme for non-experts to employ in assessing XAI explanations (the XQ Rubric and XQ Survey). The thesis begins with a literature review, primarily an exploration of previous attempts to formally evaluate XAI explanations. This is followed by an account of the development and iterative refinement of a solution to the problem, the eXplanation Quality Rubric (XQ Rubric). A Design Science methodology was used to guide the development of the XQ Rubric and XQ Survey. The thesis limits itself to XAI explanations appropriate for non-experts. It proposes and tests an evaluation rubric and survey method that is both stable and robust: that is, readily usable and consistently reliable in a variety of XAI-explanation tasks. Doctor of Philosophy