Search CORE

1,009 research outputs found

Contrastive Visual Explanations for Reinforcement Learning via Counterfactual Rewards

Author: Liu Weiru
Liu Xiaowei
McAreavey Kevin
Publication venue: Springer
Publication date: 21/10/2023

Explainability for Large Language Models: A Survey

Author: Cai Hengyi
Chen Hanjie
Deng Huiqi
Du Mengnan
Liu Ninghao
Wang Shuaiqiang
Yang Fan
Yin Dawei
Zhao Haiyan
Publication date: 02/09/2023

Large language models (LLMs) have demonstrated impressive capabilities in natural language processing. However, their internal mechanisms are still unclear, and this lack of transparency poses unwanted risks for downstream applications. Therefore, understanding and explaining these models is crucial for elucidating their behaviors, limitations, and social impacts. In this paper, we introduce a taxonomy of explainability techniques and provide a structured overview of methods for explaining Transformer-based language models. We categorize techniques based on the training paradigms of LLMs: the traditional fine-tuning-based paradigm and the prompting-based paradigm. For each paradigm, we summarize the goals and dominant approaches for generating local explanations of individual predictions and global explanations of overall model knowledge. We also discuss metrics for evaluating generated explanations and how explanations can be leveraged to debug models and improve performance. Lastly, we examine key challenges and emerging opportunities for explanation techniques in the era of LLMs in comparison to conventional machine learning models.

arXiv.org e-Print Archive

Physics-Inspired Interpretability Of Machine Learning Models

Author: Niroomand Maximilian P
Wales David J
Publication date: 05/04/2023

The ability to explain decisions made by machine learning models remains one of the most significant hurdles towards widespread adoption of AI in highly sensitive areas such as medicine, cybersecurity or autonomous driving. Great interest exists in understanding which features of the input data prompt model decision making. In this contribution, we propose a novel approach to identify relevant features of the input data, inspired by methods from the energy landscapes field, developed in the physical sciences. By identifying conserved weights within groups of minima of the loss landscapes, we can identify the drivers of model decision making. Analogues to this idea exist in the molecular sciences, where coordinate invariants or order parameters are employed to identify critical features of a molecule. However, no such approach exists for machine learning loss landscapes. We will demonstrate the applicability of energy landscape methods to machine learning models and give examples, both synthetic and from the real world, for how these methods can help to make models more interpretable.

Comment: 6 pages, 2 figures, ICLR 2023 Workshop on Physics for Machine Learning

arXiv.org e-Print Archive


Interpretable Deep Learning: Beyond Feature-Importance with Concept-based Explanations

Author: Dimanov Botty
Publication venue: University of Cambridge
Publication date: 30/12/2020

Deep Neural Network (DNN) models are challenging to interpret because of their highly complex and non-linear nature. This lack of interpretability (1) inhibits adoption within safety critical applications, (2) makes it challenging to debug existing models, and (3) prevents us from extracting valuable knowledge. Explainable AI (XAI) research aims to increase the transparency of DNN model behaviour to improve interpretability. Feature importance explanations are the most popular interpretability approaches. They show the importance of each input feature (e.g., pixel, patch, word vector) to the model’s prediction. However, we hypothesise that feature importance explanations have two main shortcomings concerning their inability to describe the complexity of a DNN behaviour with sufficient (1) fidelity and (2) richness. Fidelity and richness are essential because different tasks, users, and data types require specific levels of trust and understanding. The goal of this thesis is to showcase the shortcomings of feature importance explanations and to develop explanation techniques that describe the DNN behaviour with greater richness. We design an adversarial explanation attack to highlight the infidelity and inadequacy of feature importance explanations. Our attack modifies the parameters of a pre-trained model. It uses fairness as a proxy measure for the fidelity of an explanation method to demonstrate that the apparent importance of a feature does not reveal anything reliable about the fairness of a model. Hence, regulators or auditors should not rely on feature importance explanations to measure or enforce standards of fairness. As one solution, we formulate five different levels of the semantic richness of explanations to evaluate explanations and propose two function decomposition frameworks (DGINN and CME) to extract explanations from DNNs at a semantically higher level than feature importance explanations. 
Concept-based approaches provide explanations in terms of atomic human-understandable units (e.g., wheel or door) rather than individual raw features (e.g., pixels or characters). Our function decomposition frameworks can extract specific class representations from 5% of the network parameters and concept representations with an average-per-concept F1 score of 86%. Finally, the CME framework makes it possible to compare concept-based explanations, contributing to the scientific rigour of evaluating interpretability methods.

The author gratefully acknowledges the generous sponsorship of the Engineering and Physical Sciences Research Council (EPSRC), the Department of Computer Science and Technology at the University of Cambridge, and Tenyks, Inc.

Apollo (Cambridge)

ATTITUDE AGREEMENT, TASK COMPETENCE, INFORMATION SEARCH AND THE CHOICE OF WORK PARTNERS IN THE ATTRACTION-SIMILARITY PARADIGM

Author: CARSRUD ALAN LEE
Publication venue: University of New Hampshire Scholars' Repository
Publication date: 01/01/1974

UNH Scholars' Repository

Active Inference in Simulated Cortical Circuits

Author: Cullen Maell
Publication date: 02/12/2021

Explore Bristol Research

Deep Interpretability Methods for Neuroimaging

Author: Rahman Md Mahfuzur
Publication venue: ScholarWorks @ Georgia State University
Publication date: 12/12/2022

Brain dynamics are highly complex and yet hold the key to understanding brain function and dysfunction. The dynamics captured by resting-state functional magnetic resonance imaging data are noisy, high-dimensional, and not readily interpretable. The typical approach of reducing this data to low-dimensional features and focusing on the most predictive features comes with strong assumptions and can miss essential aspects of the underlying dynamics. In contrast, introspection of discriminatively trained deep learning models may uncover disorder-relevant elements of the signal at the level of individual time points and spatial locations. Nevertheless, the difficulty of reliable training on high-dimensional but small-sample datasets and the unclear relevance of the resulting predictive markers prevent the widespread use of deep learning in functional neuroimaging. In this dissertation, we address these challenges by proposing a deep learning framework to learn from high-dimensional dynamical data while maintaining stable, ecologically valid interpretations. The developed model is pre-trainable and alleviates the need to collect an enormous amount of neuroimaging samples to achieve optimal training. We also provide a quantitative validation module, Retain and Retrain (RAR), that can objectively verify the higher predictability of the dynamics learned by the model. Results successfully demonstrate that the proposed framework enables learning the fMRI dynamics directly from small data and capturing compact, stable interpretations of features predictive of function and dysfunction. We also comprehensively reviewed deep interpretability literature in the neuroimaging domain. Our analysis reveals the ongoing trend of interpretability practices in neuroimaging studies and identifies the gaps that should be addressed for effective human-machine collaboration in this domain. 
This dissertation also proposes a post hoc interpretability method, Geometrically Guided Integrated Gradients (GGIG), that leverages geometric properties of the functional space as learned by a deep learning model. With extensive experiments and quantitative validation on the MNIST and ImageNet datasets, we demonstrate that GGIG outperforms integrated gradients (IG), a popular interpretability method in the literature. Because GGIG is able to identify the contours of the discriminative regions in the input space, it may be useful in various medical imaging tasks where fine-grained localization as an explanation is beneficial.

ScholarWorks @ Georgia State University

Controllable Text Summarization: Unraveling Challenges, Approaches, and Prospects -- A Survey

Author: Mishra Pruthwik
Mishra Rahul
Roy Tathagato
Urlana Ashok
Publication date: 15/11/2023

Generic text summarization approaches often fail to address the specific intent and needs of individual users. Recently, scholarly attention has turned to the development of summarization methods that are more closely tailored and controlled to align with specific objectives and user needs. While a growing corpus of research is devoted to more controllable summarization, there is no comprehensive survey available that thoroughly explores the diverse controllable aspects or attributes employed in this context, delves into the associated challenges, and investigates the existing solutions. In this survey, we formalize the Controllable Text Summarization (CTS) task, categorize controllable aspects according to their shared characteristics and objectives, and present a thorough examination of existing methods and datasets within each category. Moreover, based on our findings, we uncover limitations and research gaps, while also delving into potential solutions and future directions for CTS.

Comment: 19 pages, 1 figure

arXiv.org e-Print Archive