End to end raw audio deep learning of transients, application to bioacoustics

Abstract

In this paper, we propose raw audio deep learning of clicks, building dedicated high-dimensional convolution filters to form a complex time-frequency (TF) representation. The CNN has 12 layers, takes several thousand audio bins as input, and outputs about a dozen classes. We test this model on the international DCLDE challenge, which comprises 3 TB of clicks (http://sabiod.org/DCLDE). This challenge opened in 2018, but no team had submitted an answer before. To our knowledge, our model is the first raw audio click classifier, with nearly 70% accuracy over about a dozen classes. We discuss the class confusions of the model and possible enhancements using data augmentation and regularization.
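To make the scale of such a network concrete, the following minimal sketch traces how the time axis shrinks through a stack of 1D convolutions applied to a raw audio window. It is not the architecture of the paper: the window length of 4096 bins, the kernel sizes, strides and paddings, and the split into 11 convolution layers followed by one classification layer are all assumptions chosen for illustration. Only the 12-layer depth, the "several thousand" input bins and the "dozen" output classes come from the abstract. The length formula is the standard one for a convolution with dilation 1.

```python
def conv1d_out_len(n_in, kernel, stride=1, padding=0):
    """Output length of a 1D convolution (standard formula, dilation 1)."""
    return (n_in + 2 * padding - kernel) // stride + 1


# Hypothetical values, NOT taken from the paper.
N_SAMPLES = 4096  # assumed raw audio window ("several thousand" bins)
N_CLASSES = 12    # assumed size of the final layer ("a dozen" classes)

# Assumed layout: 11 convolution layers + 1 final classification layer = 12.
# Each tuple is (kernel, stride, padding).
CONV_LAYERS = [(9, 2, 4)] * 5 + [(3, 1, 1)] * 6


def trace_lengths(n_in, layers):
    """Return the time-axis length before and after each convolution."""
    lengths = [n_in]
    for kernel, stride, padding in layers:
        lengths.append(conv1d_out_len(lengths[-1], kernel, stride, padding))
    return lengths


lengths = trace_lengths(N_SAMPLES, CONV_LAYERS)

if __name__ == "__main__":
    for depth, n in enumerate(lengths):
        print(f"after {depth:2d} conv layers: {n} time steps")
    print(f"final classification layer: {N_CLASSES} outputs")
```

With these assumed settings, the strided layers halve the time axis while the stride-1 layers preserve it, so the final layer sees a short sequence of learned features rather than thousands of raw bins.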

    Last time updated on 19/06/2021