Understanding Deep Architectures by Visual Summaries
In deep learning, visualization techniques extract the salient patterns that deep networks exploit for image classification, but they focus on single images; little effort has gone into investigating whether these patterns relate systematically to precise semantic entities across multiple images of the same class, so they fail to capture the understanding of the image class that the network has actually formed. This paper addresses that gap, presenting a visualization framework that produces a group of clusters, or summaries, each formed by crisp salient image regions focusing on a particular part that the network exploits with high regularity when deciding for a given class. The approach is based on a sparse optimization step that provides sharp image saliency masks, which are then clustered by means of a semantic-flow similarity measure. The summaries communicate clearly what a network has exploited of a particular image class, as demonstrated through automatic image tagging and a user study. Beyond network understanding, summaries are also quantitatively useful: their number correlates with the network's classification ability (more summaries, better performance), and they can be used to improve a network's classification accuracy through summary-driven specializations.

Comment: Project page and code available at
http://marcocarletti.altervista.org/publications/understanding-visual-summaries
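The clustering step described above can be illustrated with a minimal sketch: per-image binary saliency masks are grouped by pairwise similarity. Everything below is hypothetical, with plain IoU standing in for the paper's semantic-flow similarity measure and a greedy grouping standing in for its clustering step:

```python
import numpy as np

def iou(mask_a, mask_b):
    """Intersection-over-union between two binary saliency masks
    (a simple stand-in for the paper's semantic-flow similarity)."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 0.0

def cluster_masks(masks, threshold=0.5):
    """Greedy grouping: each mask joins the first cluster whose
    representative (first) mask it overlaps above `threshold`."""
    clusters = []  # each cluster is a list of mask indices
    for i, m in enumerate(masks):
        for cluster in clusters:
            if iou(masks[cluster[0]], m) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

# Toy masks: two overlap completely, one is disjoint.
a = np.zeros((4, 4), bool); a[:2, :2] = True
b = np.zeros((4, 4), bool); b[:2, :2] = True
c = np.zeros((4, 4), bool); c[2:, 2:] = True
print(cluster_masks([a, b, c]))  # → [[0, 1], [2]]
```

In the paper, each resulting cluster would then be rendered as a visual summary of the image regions the network used for that class.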
Explaining image classifiers by removing input features using generative models
Perturbation-based explanation methods often measure the contribution of an
input feature to an image classifier's outputs by heuristically removing it,
e.g. by blurring, adding noise, or graying out, which often produces
unrealistic, out-of-distribution samples. Instead, we propose to integrate a generative inpainter into
three representative attribution methods to remove an input feature. Our
proposed change improved all three methods in (1) generating more plausible
counterfactual samples under the true data distribution; (2) being more
accurate according to three metrics: object localization, deletion, and
saliency metrics; and (3) being more robust to hyperparameter changes. Our
findings were consistent across both ImageNet and Places365 datasets and two
different pairs of classifiers and inpainters.

Comment: Accepted to Asian Conference on Computer Vision (ACCV), 202
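The perturb-and-measure idea behind these attribution methods can be sketched independently of any particular method: score the image, remove a region, re-score, and take the drop as that region's contribution. The sketch below is a toy illustration, not the paper's pipeline; the hypothetical `gray_fill` plays the heuristic-removal role that the authors replace with a generative inpainter, and `remove` is pluggable so an inpainting function could be swapped in:

```python
import numpy as np

def gray_fill(image, region):
    """Heuristic removal: replace the region with mid-gray.
    (The paper swaps this for a generative inpainter.)"""
    out = image.copy()
    out[region] = 0.5
    return out

def attribution(image, region, classifier, remove=gray_fill):
    """Contribution of `region` = drop in the classifier's score
    when the region is removed from the input."""
    return classifier(image) - classifier(remove(image, region))

# Toy "classifier": score is the mean brightness of the top-left quadrant.
def toy_classifier(img):
    return float(img[:2, :2].mean())

img = np.ones((4, 4))
top_left = np.zeros((4, 4), bool); top_left[:2, :2] = True
bottom_right = np.zeros((4, 4), bool); bottom_right[2:, 2:] = True
print(attribution(img, top_left, toy_classifier))      # → 0.5
print(attribution(img, bottom_right, toy_classifier))  # → 0.0
```

The region the toy classifier depends on gets a positive attribution; an irrelevant region gets zero, which is the behavior the deletion and localization metrics mentioned above evaluate.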