42 research outputs found

    Learning spectro-temporal representations of complex sounds with parameterized neural networks

    Get PDF
    Deep Learning models have become potential candidates for auditory neuroscience research, thanks to their recent successes on a variety of auditory tasks. Yet, these models often lack interpretability to fully understand the exact computations that have been performed. Here, we proposed a parametrized neural network layer, that computes specific spectro-temporal modulations based on Gabor kernels (Learnable STRFs) and that is fully interpretable. We evaluated predictive capabilities of this layer on Speech Activity Detection, Speaker Verification, Urban Sound Classification and Zebra Finch Call Type Classification. We found out that models based on Learnable STRFs are on par for all tasks with different toplines, and obtain the best performance for Speech Activity Detection. As this layer is fully interpretable, we used quantitative measures to describe the distribution of the learned spectro-temporal modulations. The filters adapted to each task and focused mostly on low temporal and spectral modulations. The analyses show that the filters learned on human speech have similar spectro-temporal parameters as the ones measured directly in the human auditory cortex. Finally, we observed that the tasks organized in a meaningful way: the human vocalizations tasks closer to each other and bird vocalizations far away from human vocalizations and urban sounds tasks

    Neural Prototype Trees for Interpretable Fine-grained Image Recognition

    Get PDF
    Interpretable machine learning addresses the black-box nature of deep neural networks. Visual prototypes have been suggested for intrinsically interpretable image recognition, instead of generating post-hoc explanations that approximate a trained model. However, a large number of prototypes can be overwhelming. To reduce explanation size and improve interpretability, we propose the Neural Prototype Tree (ProtoTree), a deep learning method that includes prototypes in an interpretable decision tree to faithfully visualize the entire model. In addition to global interpretability, a path in the tree explains a single prediction. Each node in our binary tree contains a trainable prototypical part. The presence or absence of this prototype in an image determines the routing through a node. Decision making is therefore similar to human reasoning: Does the bird have a red throat? And an elongated beak? Then it's a hummingbird! We tune the accuracy-interpretability trade-off using ensembling and pruning. We apply pruning without sacrificing accuracy, resulting in a small tree with only 8 prototypes along a path to classify a bird from 200 species. An ensemble of 5 ProtoTrees achieves competitive accuracy on the CUB-200-2011 and Stanford Cars data sets. Code is available at https://github.com/M-Nauta/ProtoTreeComment: 11 pages, and 9 pages supplementar
    corecore