Occlusion Sensitivity Analysis with Augmentation Subspace Perturbation in Deep Feature Space
Deep neural networks have gained prominence in life-critical applications such as medical diagnosis and the investigation of autonomous-vehicle accidents. However, concerns about model transparency and bias persist, and explainability methods are viewed as a way to address these challenges. In this study, we introduce Occlusion Sensitivity Analysis with Deep Feature Augmentation Subspace (OSA-DAS), a novel perturbation-based interpretability approach for computer vision. Whereas traditional perturbation methods rely solely on occlusions to explain model predictions, OSA-DAS extends standard occlusion sensitivity analysis so that it can be combined with diverse image augmentations. Distinctively, our method uses the output vector of a DNN to build low-dimensional subspaces within the deep feature space, offering a more precise explanation of the model's prediction. The structural similarity between these subspaces captures the influence of the diverse augmentations and occlusions. We test extensively on ImageNet-1k, and our class- and model-agnostic approach outperforms commonly used interpreters, setting it apart in the field of explainable AI.
Comment: Accepted at WACV 202
Adaptive occlusion sensitivity analysis for visually explaining video recognition networks
This paper proposes a method for visually explaining the decision-making process of video recognition networks with a temporal extension of occlusion sensitivity analysis, called Adaptive Occlusion Sensitivity Analysis (AOSA). The key idea is to occlude a specific volume of data with a 3D mask in the input spatio-temporal data space and then measure the degree of change in the output score. An occluded volume that produces a larger change is regarded as a more critical element for classification. However, while occlusion sensitivity analysis is commonly used to analyze single-image classification, applying the idea to video classification is not straightforward, because a simple fixed cuboid cannot handle complicated motions. To solve this issue, we adaptively set the shape of the 3D occlusion mask according to the motion in the video. Our flexible mask adaptation considers the temporal continuity and spatial co-occurrence of the optical flows extracted from the input video. We further propose a way to reduce the computational cost of our method using a first-order approximation of the output score with respect to the input video. We demonstrate the effectiveness of our method through extensive comparisons with conventional methods in terms of the deletion/insertion metrics and the pointing metric on the UCF101, Kinetics-400, and Kinetics-700 datasets.
Comment: 11 page
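As a rough illustration of the cost-reduction idea mentioned above, the sketch below estimates the score change for many candidate 3D occlusion masks from a single backward pass via a first-order Taylor approximation. All names (model, video, target_class, masks, fill) are illustrative assumptions; this is a minimal sketch under those assumptions, not the authors' AOSA implementation.

```python
# Hedged sketch: first-order approximation of occlusion-induced score drops for video input.
# Assumed names: model (PyTorch video classifier), video (1, C, T, H, W),
# masks (iterable of boolean (T, H, W) tensors marking occluded voxels).
import torch

def approx_occlusion_scores(model, video, target_class, masks, fill=0.0):
    """Return one approximate target-class score drop per occlusion mask."""
    model.eval()
    video = video.clone().requires_grad_(True)
    score = model(video)[0, target_class]
    score.backward()                          # one backward pass shared by all masks
    grad = video.grad.detach()[0]             # (C, T, H, W) gradient of the score
    x = video.detach()[0]                     # (C, T, H, W) input values
    drops = []
    for m in masks:
        delta = (fill - x) * m.unsqueeze(0)   # change the mask would apply to the input
        # First-order Taylor estimate: f(x + delta) - f(x) ~ <grad, delta>
        drops.append(-(grad * delta).sum().item())   # positive => score would drop
    return drops
```

Compared with re-running the network once per mask, this approximation needs a single forward/backward pass, which is the kind of saving the abstract attributes to the first-order approximation.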