We present a novel approach to automatically detect and classify great ape
calls from continuous raw audio recordings collected during field research. Our
method leverages deep pretrained and sequential neural networks, including
wav2vec 2.0 and long short-term memory (LSTM) networks, and is validated on three
data sets from three different great ape lineages (orangutans, chimpanzees, and
bonobos). The recordings were collected by different researchers and follow
different annotation schemes, which our pipeline preprocesses and trains on in a
uniform fashion. Our method attains high accuracy for both call detection and
classification. It is designed to generalize to other animal species and, more
broadly, to other sound event detection tasks. To foster future research, we
make our pipeline and methods
publicly available.

Comment: Accepted at ICPhS 2023 (Poster)
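As a minimal sketch of the uniform preprocessing and detection steps described in the abstract, the following Python code converts interval annotations into per-frame labels and merges per-frame labels back into call events. It assumes hypothetical `(start, end, call_type)` annotation tuples, a hypothetical label mapping, and a 20 ms frame stride, which matches the feature rate of wav2vec 2.0 on 16 kHz audio. The wav2vec 2.0 and LSTM models themselves are omitted, and the function names are illustrative, not taken from the authors' released pipeline.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical annotation format: (start_seconds, end_seconds, call_type).
Annotation = Tuple[float, float, str]

# wav2vec 2.0 emits one feature vector per 320 samples at 16 kHz, i.e. every 20 ms.
FRAME_S = 0.02


def annotations_to_frame_labels(
    annotations: Sequence[Annotation],
    duration_s: float,
    label_map: Dict[str, int],
    frame_s: float = FRAME_S,
) -> List[int]:
    """Map interval annotations to one integer label per frame (0 = no call)."""
    n_frames = int(round(duration_s / frame_s))
    labels = [0] * n_frames
    for start, end, call_type in annotations:
        # Clip each annotated interval to the recording's frame range.
        first = max(0, int(round(start / frame_s)))
        last = min(n_frames, int(round(end / frame_s)))
        for i in range(first, last):
            labels[i] = label_map[call_type]
    return labels


def frame_labels_to_events(
    labels: Sequence[int], frame_s: float = FRAME_S
) -> List[Tuple[float, float, int]]:
    """Merge runs of identical non-zero frame labels into (start, end, label) events."""
    events = []
    run_start = 0
    # Append a background sentinel so a call running to the last frame is closed.
    for i, lab in enumerate(list(labels) + [0]):
        prev = labels[i - 1] if i > 0 else 0
        if lab != prev:
            if prev != 0:
                events.append((run_start * frame_s, i * frame_s, prev))
            run_start = i
    return events


# Usage with hypothetical call types.
label_map = {"pant_hoot": 1, "grunt": 2}
annotations = [(1.0, 1.5, "pant_hoot"), (2.0, 2.1, "grunt")]
frame_labels = annotations_to_frame_labels(annotations, 3.0, label_map)
events = frame_labels_to_events(frame_labels)
```

In a full system, the per-frame class predictions of a sequence model (for example, the argmax of an LSTM's outputs over wav2vec 2.0 features) would take the place of the annotation-derived labels passed to `frame_labels_to_events`; mapping every annotation scheme onto the same frame grid is one simple way to train on heterogeneous data sets in a uniform fashion.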