We propose a shift towards end-to-end learning in bird sound monitoring by
combining self-supervised learning (SSL) and deep active learning (DAL). Leveraging
transformer models, we aim to bypass traditional spectrogram conversions,
enabling direct raw audio processing. ActiveBird2Vec is set to generate
high-quality bird sound representations through SSL, potentially accelerating
the assessment of environmental changes and decision-making processes for wind
farms. Additionally, we seek to exploit the wide variety of bird vocalizations
through DAL, reducing the reliance on datasets extensively labeled by human
experts. We plan to curate a comprehensive set of tasks through Hugging Face
Datasets, enhancing future comparability and reproducibility of bioacoustic
research. A comparative analysis of various transformer models will be
conducted to evaluate their proficiency in bird sound recognition tasks. We aim
to accelerate the progression of avian bioacoustic research and contribute to
more effective conservation strategies.

Comment: Accepted @AI4S ECAI2023. This is the author's version of the work.