Learning spectro-temporal representations of complex sounds with parameterized neural networks
Deep Learning models have become potential candidates for auditory
neuroscience research, thanks to their recent successes on a variety of
auditory tasks. Yet, these models often lack the interpretability needed to
fully understand the exact computations they perform. Here, we propose a
parameterized neural network layer that computes specific spectro-temporal
modulations based on Gabor kernels (Learnable STRFs) and that is fully
interpretable. We evaluated the predictive capabilities of this layer on Speech
Activity Detection, Speaker Verification, Urban Sound Classification and Zebra
Finch Call Type Classification. We found that models based on Learnable
STRFs are on par with the respective toplines on all tasks, and obtain the best
performance for Speech Activity Detection. As this layer is fully
interpretable, we used quantitative measures to describe the distribution of
the learned spectro-temporal modulations. The filters adapted to each task and
focused mostly on low temporal and spectral modulations. The analyses show that
the filters learned on human speech have similar spectro-temporal parameters as
the ones measured directly in the human auditory cortex. Finally, we observed
that the tasks organized themselves in a meaningful way: the human vocalization
tasks lay close to each other, while bird vocalizations lay far from both the
human vocalization and urban sound tasks.
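As a rough illustration of the idea (not the paper's implementation, and with illustrative parameter names), each Learnable STRF can be thought of as a 2D Gabor kernel over the time and frequency axes of a spectrogram, whose modulation rate, scale, and Gaussian widths are the trainable parameters. A minimal NumPy sketch:

```python
import numpy as np

def gabor_strf(rate, scale, size=9, sigma_t=2.0, sigma_f=2.0):
    """2D Gabor kernel over (time, frequency) axes of a spectrogram.

    rate and scale (temporal/spectral modulation, in cycles per bin) and
    the Gaussian widths sigma_t, sigma_f would be the learnable parameters;
    all names here are illustrative assumptions, not the paper's API.
    """
    t = np.arange(size) - size // 2
    f = np.arange(size) - size // 2
    T, F = np.meshgrid(t, f, indexing="ij")
    # Gaussian envelope localizes the filter in time and frequency
    envelope = np.exp(-T**2 / (2 * sigma_t**2) - F**2 / (2 * sigma_f**2))
    # Cosine carrier selects a specific spectro-temporal modulation
    carrier = np.cos(2 * np.pi * (rate * T + scale * F))
    return envelope * carrier

# A small filter bank concentrated on low modulations, as the learned
# filters in the paper tended to be
bank = np.stack([gabor_strf(r, s) for r in (0.05, 0.1) for s in (0.02, 0.05)])
print(bank.shape)  # (4, 9, 9)
```

In a trainable layer these kernels would be regenerated from the parameters at every forward pass and applied to the spectrogram by 2D convolution, so gradients flow back to the Gabor parameters themselves rather than to free-form weights.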
Neural Prototype Trees for Interpretable Fine-grained Image Recognition
Interpretable machine learning addresses the black-box nature of deep neural
networks. Visual prototypes have been suggested for intrinsically interpretable
image recognition, instead of generating post-hoc explanations that approximate
a trained model. However, a large number of prototypes can be overwhelming. To
reduce explanation size and improve interpretability, we propose the Neural
Prototype Tree (ProtoTree), a deep learning method that includes prototypes in
an interpretable decision tree to faithfully visualize the entire model. In
addition to global interpretability, a path in the tree explains a single
prediction. Each node in our binary tree contains a trainable prototypical
part. The presence or absence of this prototype in an image determines the
routing through a node. Decision making is therefore similar to human
reasoning: Does the bird have a red throat? And an elongated beak? Then it's a
hummingbird! We tune the accuracy-interpretability trade-off using ensembling
and pruning. We apply pruning without sacrificing accuracy, resulting in a
small tree with only 8 prototypes along a path to classify a bird from 200
species. An ensemble of 5 ProtoTrees achieves competitive accuracy on the
CUB-200-2011 and Stanford Cars data sets. Code is available at
https://github.com/M-Nauta/ProtoTree

Comment: 11 pages, and 9 pages supplementary
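A hedged sketch of the routing idea described above (hard left/right routing for clarity; ProtoTree itself trains with soft routing, and the data structures and names here are hypothetical, not the repository's API):

```python
import numpy as np

def prototype_presence(feature_map, prototype):
    # Similarity of the best-matching patch embedding to the prototype, in (0, 1].
    d2 = ((feature_map - prototype) ** 2).sum(axis=-1)  # squared distance per patch
    return float(np.exp(-d2.min()))

def route(node, feature_map):
    # Follow one root-to-leaf path; each inner node tests one prototypical part.
    path = []
    while "leaf" not in node:
        present = prototype_presence(feature_map, node["prototype"]) > 0.5
        branch = "right" if present else "left"
        path.append((node["question"], branch))
        node = node[branch]
    return node["leaf"], path

# Toy tree: "Does the bird have a red throat?" routes right when the
# prototype is found somewhere in the image.
tree = {
    "question": "red throat?",
    "prototype": np.zeros(3),
    "left": {"leaf": "not a hummingbird"},
    "right": {"leaf": "hummingbird"},
}
patches = np.zeros((4, 4, 3))  # 4x4 grid of patch embeddings; one matches exactly
label, path = route(tree, patches)
print(label, path)  # hummingbird [('red throat?', 'right')]
```

The returned path is itself the local explanation: the sequence of prototype questions and answers that led to the prediction.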