Feature attribution (FA), or the assignment of class relevance to different
locations in an image, is important for many classification problems but is
particularly crucial within the neuroscience domain, where accurate mechanistic
models of behaviours or disease require knowledge of all features
discriminative of a trait. At the same time, predicting class relevance from
brain images is challenging as phenotypes are typically heterogeneous, and
changes occur against a background of significant natural variation. Here, we
present a novel framework for creating class-specific FA maps through
image-to-image translation. We propose a VAE-GAN that explicitly
disentangles class relevance from background features, improving
interpretability and yielding meaningful FA maps. We validate
our method on 2D and 3D brain image datasets of dementia (ADNI dataset), ageing
(UK Biobank), and (simulated) lesion detection. We show that the FA maps generated
by our method outperform those of baseline FA methods when validated against ground
truth. More significantly, our approach is the first to use latent space
sampling to support exploration of phenotype variation. Our code will be
available online at https://github.com/CherBass/ICAM.
Comment: Submitted to NeurIPS 2020: Neural Information Processing Systems.
Keywords: interpretable, classification, feature attribution, domain
translation, variational autoencoder, generative adversarial network,
neuroimaging
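The abstract describes three ideas: building class-specific FA maps through image-to-image translation, disentangling class-relevant features from background features, and exploring phenotype variation by sampling the latent space. The following is a minimal sketch of those ideas on a hypothetical toy model, not the authors' VAE-GAN. It makes several assumptions. A fixed linear "encoder" and "decoder" stand in for the learned networks. The FA map is taken as the translated image minus the input image. Attribute latents are drawn from an assumed standard normal prior. Normalized cross-correlation (NCC) is used as one possible way to compare an FA map with a ground-truth mask. All names (`CLASS_PATTERN`, `encode_content`, `decode`, `fa_map`, `sample_translations`, `ncc`) are illustrative and do not come from the ICAM repository.

```python
import numpy as np

# Hypothetical toy model (NOT the paper's VAE-GAN): an image is a background
# "content" map plus a class-specific pattern scaled by a scalar "attribute"
# latent. In the real framework, learned encoders/decoders replace these.
H, W = 8, 8
CLASS_PATTERN = np.zeros((H, W))
CLASS_PATTERN[2:5, 2:5] = 1.0  # region where the two classes differ


def encode_content(x, attr):
    """Remove the class-specific part, keeping background (content) features."""
    return x - attr * CLASS_PATTERN


def decode(content, attr):
    """Recombine a content map with an attribute latent into an image."""
    return content + attr * CLASS_PATTERN


def fa_map(x, source_attr, target_attr):
    """FA map: the image translated to the target class minus the input image."""
    content = encode_content(x, source_attr)
    return decode(content, target_attr) - x


def sample_translations(x, source_attr, n, rng):
    """Decode the same content with n attribute latents drawn from an assumed N(0, 1) prior."""
    content = encode_content(x, source_attr)
    return [decode(content, a) for a in rng.normal(size=n)]


def ncc(a, b):
    """Normalized cross-correlation between two equally shaped maps."""
    a_std, b_std = a.std(), b.std()
    if a_std == 0 or b_std == 0:
        raise ValueError("NCC is undefined for a constant map")
    return float(np.mean((a - a.mean()) * (b - b.mean())) / (a_std * b_std))


rng = np.random.default_rng(0)
background = rng.normal(size=(H, W))        # natural variation shared by both classes
x_patient = decode(background, attr=1.0)    # toy "disease" image
m = fa_map(x_patient, source_attr=1.0, target_attr=0.0)
ground_truth = (CLASS_PATTERN > 0).astype(float)
print(ncc(np.abs(m), ground_truth))         # close to 1.0 for this toy model
```

Because content and attribute are separated, the background cancels in the difference and the map is nonzero only where the classes differ. With a trained generator on real brain images this separation is learned and only approximate, so agreement with a ground-truth mask would be imperfect.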