On Network Science and Mutual Information for Explaining Deep Neural Networks
In this paper, we present a new approach to interpret deep learning models.
By coupling mutual information with network science, we explore how information
flows through feedforward networks. We show that efficiently approximating
mutual information allows us to create an information measure that quantifies
how much information flows between any two neurons of a deep learning model. To
that end, we propose NIF, Neural Information Flow, a technique for codifying
information flow that exposes deep learning model internals and provides
feature attributions.
Comment: ICASSP 2020 (shorter version appeared at AAAI-19 Workshop on Network Interpretability for Deep Learning).
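The pairwise quantity the abstract describes (information flowing between two neurons) can be illustrated with a minimal histogram-based mutual information estimator. This is a sketch for intuition only, not the authors' NIF approximation; the function name and binning scheme are assumptions:

```python
# Illustrative sketch (not the paper's method): estimate mutual information
# between two "neuron" activation vectors by discretizing them into bins.
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram-based MI estimate (in nats) between two 1-D activation arrays."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                  # empirical joint distribution
    px = pxy.sum(axis=1, keepdims=True)        # marginal of x
    py = pxy.sum(axis=0, keepdims=True)        # marginal of y
    nz = pxy > 0                               # avoid log(0) on empty cells
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(0)
a = rng.normal(size=5000)                      # activations of neuron A
b = a + 0.1 * rng.normal(size=5000)            # neuron B strongly driven by A
c = rng.normal(size=5000)                      # neuron C independent of A
```

A strongly coupled pair such as (a, b) yields a much larger estimate than an independent pair such as (a, c), which is the kind of contrast a flow measure over all neuron pairs would aggregate.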
You shouldn't trust me: Learning models which conceal unfairness from multiple explanation methods.
Transparency of algorithmic systems is an important area of research, which has been discussed as a way for end-users and regulators to develop appropriate trust in machine learning models. One popular approach, LIME [23], even suggests that model explanations can answer the question "Why should I trust you?". Here we show a straightforward method for modifying a pre-trained model to manipulate the output of many popular feature importance explanation methods with little change in accuracy, thus demonstrating the danger of trusting such explanation methods. We show how this explanation attack can mask a model's discriminatory use of a sensitive feature, raising strong concerns about using such explanation methods to check the fairness of a model.
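One well-known way an explanation attack of this general flavour can work is to make the model behave innocuously on the off-manifold samples that perturbation-based explainers generate, while keeping the discriminatory behaviour on realistic inputs. The sketch below is hypothetical and simplified, not this paper's fine-tuning method; the feature indices, the `is_off_manifold` test, and all function names are assumptions for illustration:

```python
# Hypothetical sketch of an explanation-masking model (not the paper's method).
import numpy as np

def biased_model(X):
    return (X[:, 0] > 0).astype(int)       # relies on sensitive feature 0

def innocuous_model(X):
    return (X[:, 1] > 0).astype(int)       # relies on harmless feature 1

def is_off_manifold(X, eps=1.5):
    # Assume real data lies near the line x0 == x1; explainer perturbations
    # typically break this correlation and land far from it.
    return np.abs(X[:, 0] - X[:, 1]) > eps

def adversarial_model(X):
    out = biased_model(X)                  # discriminatory behaviour by default
    off = is_off_manifold(X)
    out[off] = innocuous_model(X[off])     # look fair on explainer queries
    return out
```

On in-distribution data the wrapper agrees with the biased model, so accuracy barely changes, while an explainer probing with off-manifold perturbations sees only the innocuous rule.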
Algorithmic loafing and mitigation strategies in Human-AI teams
This research work was initiated under the Scottish Informatics and Computer Science Alliance (SICSA) Remote Collaboration Activities when the first author was working at the University of St Andrews, UK. We would like to thank SICSA for partially funding this research.

Exercising social loafing (exerting minimal effort by an individual in a group setting) in human-machine teams could critically degrade performance, especially in high-stakes domains where human judgement is essential. Akin to social loafing in human interaction, algorithmic loafing may occur when humans mindlessly adhere to machine recommendations due to reluctance to engage analytically with AI recommendations and explanations. We consider how algorithmic loafing could emerge and how to mitigate it. Specifically, we posit that algorithmic loafing can be induced through repeated encounters with correct decisions from the AI, and that transparency may combat it. As a form of transparency, explanation is offered for reasons that include justification, control, and discovery. However, algorithmic loafing is further reinforced by the perceived competence that an explanation conveys. In this work, we explored these ideas via human-subject experiments (n = 239). We also study how improving decision transparency through validation by an external human approver affects performance. Using eight experimental conditions in a high-stakes criminal justice context, we find that decision accuracy is typically unaffected by multiple forms of transparency, but there is a significant difference in performance when the machine errs. Participants who saw explanations alone are better at overriding incorrect decisions; however, those under induced algorithmic loafing exhibit poor performance with variation in decision time. We conclude with recommendations on curtailing algorithmic loafing and achieving social facilitation, where task visibility motivates individuals to perform better.
Perspectives on Incorporating Expert Feedback into Model Updates
Machine learning (ML) practitioners are increasingly tasked with developing
models that are aligned with non-technical experts' values and goals. However,
there has been insufficient consideration on how practitioners should translate
domain expertise into ML updates. In this paper, we consider how to capture
interactions between practitioners and experts systematically. We devise a
taxonomy to match expert feedback types with practitioner updates. A
practitioner may receive feedback from an expert at the observation- or
domain-level, and convert this feedback into updates to the dataset, loss
function, or parameter space. We review existing work from ML and
human-computer interaction to describe this feedback-update taxonomy, and
highlight the insufficient consideration given to incorporating feedback from
non-technical experts. We end with a set of open questions that naturally arise
from our proposed taxonomy and subsequent survey.
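The two-axis structure the abstract describes (feedback level crossed with update target) can be encoded directly. The class, field names, and validation below are hypothetical illustrations, since the paper defines the taxonomy in prose rather than code:

```python
# Hypothetical encoding of the feedback-update taxonomy described above.
from dataclasses import dataclass

FEEDBACK_LEVELS = ("observation", "domain")                  # where feedback applies
UPDATE_TARGETS = ("dataset", "loss_function", "parameter_space")  # what gets updated

@dataclass
class ExpertFeedback:
    level: str   # granularity of the expert's comment
    target: str  # pipeline component the practitioner will update
    note: str    # free-text content of the feedback

    def __post_init__(self):
        if self.level not in FEEDBACK_LEVELS:
            raise ValueError(f"unknown feedback level: {self.level}")
        if self.target not in UPDATE_TARGETS:
            raise ValueError(f"unknown update target: {self.target}")

# e.g. an expert flags a mislabelled case: observation-level feedback that
# the practitioner converts into a dataset update.
fb = ExpertFeedback("observation", "dataset", "case 17 is mislabelled")
```

Structuring feedback this way makes the practitioner's translation step explicit: every expert comment must be filed under one level and routed to one update target.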