Statistical correction of input gradients for black box models trained with categorical input features

Abstract

Gradients of a deep neural network’s predictions with respect to its inputs are used in a variety of downstream analyses, notably in post hoc explanations with feature attribution methods. For data whose input features live on a lower-dimensional manifold, we observe that the learned function can behave arbitrarily off the manifold, where no data exists to anchor the function during training. This introduces a random component into the gradients, which manifests as noise. We introduce a simple correction for this off-manifold gradient noise in the case of categorical input features, where input values are subject to a probabilistic simplex constraint, and demonstrate its effectiveness on regulatory genomics data. We find that our correction consistently yields a significant improvement in gradient-based attribution scores.
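As a concrete illustration of the kind of correction the abstract describes, below is a minimal NumPy sketch, under the assumption that the correction removes the gradient component orthogonal to the probabilistic simplex constraint, which amounts to subtracting the per-position mean across category channels. The function name, array shapes, and usage are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def correct_gradients(grads: np.ndarray) -> np.ndarray:
    """Remove off-manifold gradient noise for categorical (one-hot) inputs.

    Assumption: inputs are constrained to sum to 1 across the category
    axis (a probabilistic simplex), so the gradient component perpendicular
    to the simplex carries no on-manifold information. Subtracting the
    per-position mean across categories projects the gradient onto the
    simplex's tangent space.

    grads: array of shape (batch, length, num_categories),
           e.g. (N, L, 4) for DNA sequences over {A, C, G, T}.
    """
    # Mean over the category axis, kept for broadcasting.
    off_manifold = grads.mean(axis=-1, keepdims=True)
    return grads - off_manifold

# Hypothetical usage with stand-in gradients for a batch of one-hot
# DNA sequences (8 sequences of length 200, 4 nucleotide channels).
raw_grads = np.random.randn(8, 200, 4)
corrected = correct_gradients(raw_grads)
# After correction, each position's gradient sums to ~0 across channels.
assert np.allclose(corrected.sum(axis=-1), 0.0, atol=1e-6)
```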
