As machine learning is increasingly deployed in the real world, it is ever
more vital that we understand the decision criteria of the models we train.
Recently, researchers have shown that influence functions, a statistical
measure of sample impact, may be extended to approximate the effects of
training samples on classification accuracy for deep neural networks. However,
prior work only applies to supervised learning setups where training and
testing share an objective function. Despite the rise of unsupervised learning,
self-supervised learning, and model pre-training, there are currently no
suitable methods for estimating the influence of training examples in deep
networks that do not train and test on the same objective. To overcome this
limitation, we provide
the first theoretical and empirical demonstration that influence functions can
be extended to handle mismatched training and testing settings. Our result
enables us to compute the influence of unsupervised and self-supervised
training examples with respect to a supervised test objective. We demonstrate
this technique on a synthetic dataset as well as two Skip-gram language model
examples to examine cluster membership and sources of unwanted bias.
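
For intuition, a minimal sketch of the quantity being estimated, assuming the
standard influence-function formulation adapted to mismatched objectives (the
notation below is illustrative and is an assumption, not the paper's own): let
$L_{\text{train}}$ denote the unsupervised or self-supervised training
objective, $L_{\text{test}}$ the supervised test objective, $\hat{\theta}$ the
learned parameters, and $H_{\hat{\theta}}$ the Hessian of the empirical
training risk at $\hat{\theta}$. The influence of a training example $z$ on a
test point $z_{\text{test}}$ then couples the two gradients through the
inverse Hessian:
% Illustrative sketch only; symbols and form are assumptions, not the paper's notation.
\[
  \mathcal{I}(z, z_{\text{test}})
  = -\,\nabla_\theta L_{\text{test}}(z_{\text{test}}, \hat{\theta})^{\top}
     \, H_{\hat{\theta}}^{-1}
     \, \nabla_\theta L_{\text{train}}(z, \hat{\theta}).
\]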