End to end raw audio deep learning of transients, application to bioacoustics

Ferrari, Maxence; Glotin, Hervé; Marxer, Ricard

Authors: Maxence Ferrari, Hervé Glotin, Ricard Marxer
Publication date: 7 December 2020
Publisher: HAL CCSD

Abstract

In this paper, we propose a deep learning approach that classifies clicks directly from raw audio, building specific high-dimensional convolution filters to produce a complex time-frequency (TF) representation. The CNN has 12 layers, takes several thousand audio bins as input, and outputs about a dozen classes. We test this model on the international DCLDE challenge, which comprises 3 TB of clicks (http://sabiod.org/DCLDE). The challenge opened in 2018, but no team had submitted a result before. To our knowledge, our model is the first raw audio click classifier with nearly 70% accuracy on a dozen classes. We discuss the model's class confusions and possible improvements using data augmentation and regularization.

Available Versions

HAL AMU

oai:HAL:hal-03230842v1

Last time updated on 19/06/2021