Unsupervised classification to improve the quality of a bird song recording dataset
Open audio databases such as Xeno-Canto are widely used to build datasets to
explore bird song repertoire or to train models for automatic bird sound
classification by deep learning algorithms. However, such databases suffer from
the fact that bird sounds are weakly labelled: a species name is attributed to
each audio recording without timestamps that provide the temporal localization
of the bird song of interest. Manual annotations can solve this issue, but they
are time consuming, expert-dependent, and cannot run on large datasets. Another
solution consists in using a labelling function that automatically segments
audio recordings before assigning a label to each segmented audio sample.
Although labelling functions were introduced to expedite strong label
assignment, their classification performance remains mostly unknown. To address
this issue and reduce label noise (wrong label assignment) in large bird song
datasets, we introduce a novel, data-centric labelling function composed of
three successive steps: 1) time-frequency sound unit segmentation, 2) feature
computation for each sound unit, and 3) classification of each sound unit as
bird song or noise with either an unsupervised DBSCAN algorithm or the
supervised BirdNET neural network. The labelling function was optimized,
validated, and tested on the songs of 44 West-Palearctic common bird species.
We first showed that segmentation of bird songs alone introduced from 10%
to 83% label noise, depending on the species. We also demonstrated that our
labelling function was able to significantly reduce the initial label noise
present in the dataset by up to a factor of three. Finally, we discuss
different opportunities to design suitable labelling functions to build
high-quality animal vocalization datasets with minimal expert annotation effort.
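The third step of the labelling function, classifying each sound unit with an unsupervised DBSCAN pass, can be sketched in miniature. This is an illustrative toy, not the authors' pipeline: the two per-unit features (duration, peak frequency), the eps/min_pts values, and the data are all hypothetical, and a tiny pure-Python DBSCAN stands in for a library implementation.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point, -1 meaning noise."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:          # not a core point
            labels[i] = -1
            continue
        labels[i] = cluster_id
        seeds = list(nbrs)
        while seeds:                     # expand the cluster
            j = seeds.pop()
            if labels[j] == -1:          # border point reclaimed from noise
                labels[j] = cluster_id
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            jn = neighbors(j)
            if len(jn) >= min_pts:
                seeds.extend(jn)
        cluster_id += 1
    return labels

# Hypothetical per-unit features: (duration in s, peak frequency in kHz).
bird_units = [(0.30, 4.0), (0.32, 4.1), (0.29, 3.9), (0.31, 4.05), (0.30, 4.2)]
noise_units = [(1.5, 0.5), (0.05, 9.0), (2.0, 7.5)]
labels = dbscan(bird_units + noise_units, eps=0.5, min_pts=3)
# Units falling in a dense cluster are kept as "bird song"; -1 is noise.
is_bird = [lab != -1 for lab in labels]
```

With these toy values the five tightly clustered units form one cluster and the three scattered units are flagged as noise; on real features, eps and min_pts would need tuning and feature scaling.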
Parallels in the sequential organization of birdsong and human speech.
Human speech possesses a rich hierarchical structure that allows for meaning to be altered by words spaced far apart in time. Conversely, the sequential structure of nonhuman communication is thought to follow non-hierarchical Markovian dynamics operating over only short distances. Here, we show that human speech and birdsong share a similar sequential structure indicative of both hierarchical and Markovian organization. We analyze the sequential dynamics of song from multiple songbird species and speech from multiple languages by modeling the information content of signals as a function of the sequential distance between vocal elements. Across short sequence-distances, an exponential decay dominates the information in speech and birdsong, consistent with underlying Markovian processes. At longer sequence-distances, the decay in information follows a power law, consistent with underlying hierarchical processes. Thus, the sequential organization of acoustic elements in two learned vocal communication signals (speech and birdsong) shows functionally equivalent dynamics, governed by similar processes.
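The central model comparison, exponential versus power-law decay of information with sequence distance, can be sketched with a log-transform trick: exponential decay is linear in log y versus d, while a power law is linear in log y versus log d, so fitting both by least squares and comparing residuals selects the regime. This is an illustrative reconstruction on synthetic curves, not the paper's analysis code.

```python
import math

def linfit(xs, ys):
    """Ordinary least squares; returns slope, intercept, residual sum of squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, rss

def best_decay_model(distances, information):
    """Exponential fit: log y ~ d.  Power-law fit: log y ~ log d."""
    log_y = [math.log(y) for y in information]
    _, _, rss_exp = linfit(distances, log_y)
    _, _, rss_pow = linfit([math.log(d) for d in distances], log_y)
    return "exponential" if rss_exp < rss_pow else "power-law"

# Synthetic curves standing in for measured information decay.
short_d = [1, 2, 3, 4, 5]
short_info = [math.exp(-0.5 * d) for d in short_d]   # Markovian regime
long_d = [10, 20, 40, 80, 160]
long_info = [d ** -1.2 for d in long_d]              # hierarchical regime
```

On these noiseless toy curves the short-distance data select the exponential model and the long-distance data the power law, mirroring the two regimes the abstract reports.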
Audio Event Detection using Weakly Labeled Data
Acoustic event detection is essential for content analysis and description of
multimedia recordings. The majority of current literature on the topic learns
the detectors through fully-supervised techniques employing strongly labeled
data. However, the labels available for the majority of multimedia data are
generally weak and do not provide sufficient detail for such methods to be
employed. In this paper we propose a framework for learning acoustic event
detectors using only weakly labeled data. We first show that audio event
detection using weak labels can be formulated as a Multiple Instance Learning
problem. We then suggest two frameworks for solving multiple-instance learning,
one based on support vector machines, and the other on neural networks. The
proposed methods can help in removing the time consuming and expensive process
of manually annotating data to facilitate fully supervised learning. Moreover,
the proposed methods not only detect events in a recording but also provide the
temporal locations of those events. This yields a more complete description of
the recording and is notable because weakly labeled data carry no temporal
information to begin with.
Comment: ACM Multimedia 201
BAVS: Bootstrapping Audio-Visual Segmentation by Integrating Foundation Knowledge
Given an audio-visual pair, audio-visual segmentation (AVS) aims to locate
sounding sources by predicting pixel-wise maps. Previous methods assume that
each sound component in an audio signal always has a visual counterpart in the
image. However, this assumption overlooks that off-screen sounds and background
noise often contaminate audio recordings in real-world scenarios. These sounds
impose significant challenges on building a consistent semantic mapping between
audio and visual signals for AVS models, and thus impede precise sound
localization. In this work, we propose a two-stage bootstrapping audio-visual
segmentation framework by incorporating multi-modal foundation knowledge. In a
nutshell, our BAVS is designed to eliminate the interference of background
noise or off-screen sounds in segmentation by establishing the audio-visual
correspondences in an explicit manner. In the first stage, we employ a
segmentation model to localize potential sounding objects from visual data
without being affected by contaminated audio signals. Meanwhile, we also
utilize a foundation audio classification model to discern audio semantics.
Considering the audio tags provided by the audio foundation model are noisy,
associating object masks with audio tags is not trivial. Thus, in the second
stage, we develop an audio-visual semantic integration strategy (AVIS) to
localize the authentic-sounding objects. Here, we construct an audio-visual
tree based on the hierarchical correspondence between sounds and object
categories. We then examine the label concurrency between the localized objects
and classified audio tags by tracing the audio-visual tree. With AVIS, we can
effectively segment real-sounding objects. Extensive experiments demonstrate
the superiority of our method on AVS datasets, particularly in scenarios
involving background noise. Our project website is
https://yenanliu.github.io/AVSS.github.io/
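The label-concurrency check at the heart of AVIS can be sketched as an ancestor walk over a small category tree. The tree below is entirely hypothetical (the actual BAVS tree is built from the taxonomies of its audio and visual foundation models); the point is only the mechanism: an audio tag supports a segmented object when the object's category lies on the tag's path to the root.

```python
# Hypothetical parent map: each category points to its broader class.
PARENT = {
    "dog_bark": "dog", "dog": "animal",
    "cat_meow": "cat", "cat": "animal",
    "engine": "vehicle", "car_horn": "vehicle",
    "vehicle": "object", "animal": "object",
}

def ancestors(tag):
    """Chain from a category up to the root, including the tag itself."""
    chain = [tag]
    while tag in PARENT:
        tag = PARENT[tag]
        chain.append(tag)
    return chain

def concurrent(audio_tag, object_label):
    """An audio tag supports an object mask when the object label lies on
    the tag's ancestor chain (e.g. 'dog_bark' -> 'dog')."""
    return object_label in ancestors(audio_tag)

def select_sounding_objects(audio_tags, object_labels):
    """Keep only localized objects supported by at least one audio tag."""
    return [o for o in object_labels
            if any(concurrent(t, o) for t in audio_tags)]
```

In this toy run, masks for "dog" and "vehicle" survive because "dog_bark" and "engine" tags trace to them, while the silent "cat" mask is discarded, mirroring how AVIS suppresses objects with no audio support.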
Probabilistic models for classification of bioacoustic data
Probabilistic models have been successfully applied to a wide variety of problems, including, but not limited to, information retrieval, computer vision, bio-informatics and speech processing. Probabilistic models allow us to encode our assumptions about the data in an elegant fashion and enable us to perform machine learning tasks such as classification and clustering in a principled manner. Probabilistic models for bio-acoustic data help in identifying interesting patterns in the data (for instance, the species-specific vocabulary), as well as species identification (classification) in recordings where the label is not available. The focus of this thesis is to develop efficient inference techniques for existing models, as well as develop probabilistic models tailored to bioacoustic data. First, we develop inference algorithms for the supervised latent Dirichlet allocation (LDA) model. We present collapsed variational Bayes, collapsed Gibbs sampling and maximum-a-posteriori (MAP) inference for parameter estimation and classification in supervised LDA. We provide an empirical evaluation of the trade-off between computational complexity and classification performance of the inference methods for supervised LDA, on audio classification (species identification in this context) as well as image classification and document classification tasks. Next, we present novel probabilistic models for bird sound recordings, that can capture temporal structure at different hierarchical levels, and model additional information such as the duration and frequency of vocalizations. We present a non-parametric density estimation technique for parameter estimation and show that the MAP classifier for our models can be interpreted as a weighted nearest neighbor classifier.
We provide an experimental comparison between the proposed models and a support vector machine based approach, using bird sound recordings from the Cornell Macaulay library.
Keywords: classification, species identification, probabilistic models, bioacoustic
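The observation that a MAP classifier with non-parametric density estimates acts as a weighted nearest-neighbour rule can be illustrated with a per-class kernel density estimate. This is a schematic reconstruction under simplifying assumptions (one-dimensional features, Gaussian kernel, shared bandwidth), not the thesis's model: since prior(c) = n_c / n and the class KDE averages n_c kernels, their product is just the summed distance-decaying weight of class c's training points.

```python
import math
from collections import defaultdict

def kde_map_classify(train, x, bandwidth=1.0):
    """MAP with class prior x Gaussian KDE; the kernel's shared
    normalizing constant is dropped since it cancels across classes."""
    by_class = defaultdict(list)
    for xi, ci in train:
        by_class[ci].append(xi)
    n = len(train)
    best_class, best_score = None, -1.0
    for c, pts in by_class.items():
        prior = len(pts) / n
        density = sum(math.exp(-((x - xi) ** 2) / (2 * bandwidth ** 2))
                      for xi in pts) / len(pts)
        # prior * density == (sum of kernel weights) / n: weighted NN voting.
        score = prior * density
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical scalar features (e.g. a dominant-frequency summary) per class.
train = [(0.0, "A"), (0.2, "A"), (0.4, "A"), (5.0, "B"), (5.3, "B")]
```

A query near a class's training points accumulates large kernel weights from that class and is assigned to it, exactly the weighted nearest-neighbour behaviour described above.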
Automatic detection and classification of bird sounds in low-resource wildlife audio datasets
There are many potential applications of automatic species detection and classification of birds from their sounds (e.g. ecological research, biodiversity monitoring, archival). However, acquiring adequately labelled large-scale and longitudinal data remains a major challenge, especially for species-rich remote areas as well as taxa that require expert input for identification. So far, monitoring of avian populations has been performed via manual surveying, sometimes even including the help of volunteers due to the challenging scales of the data. In recent decades, an increasing number of ecological audio datasets have had tags assigned to them to indicate the presence or absence of a specific bird species. However, automated species vocalization detection and identification is a challenging task. There is a high diversity of animal vocalisations, both in the types of the basic syllables and in the way they are combined. Also, there is noise present in most habitats, and many bird communities contain multiple bird species that can potentially have overlapping vocalisations. In recent years, machine learning has experienced strong growth, due to increased dataset sizes and computational power, and to advances in deep learning methods that can learn to make predictions in extremely nonlinear problem settings. However, in training a deep learning system to perform automatic detection and audio tagging of wildlife bird sound scenes, two problems often arise. Firstly, even with the increased amount of audio data, most publicly available datasets are weakly labelled, having only a list of events present in each recording without any temporal information for training. Secondly, in practice it is difficult to collect enough samples for most classes of interest. These problems are particularly pressing for wildlife audio but also occur in many other scenarios.
In this thesis, we investigate and propose methods to perform audio event detection and classification on wildlife bird sound scenes and other low-resource audio datasets, such as methods based on image processing and deep learning. We extend deep learning methods for weakly labelled data in a multi-instance learning and multi-task learning setting. We evaluate these methods for simultaneously detecting and classifying large numbers of sound types in audio recorded in the wild and in other low-resource audio datasets.
Automatic recognition of bird species by their sounds
Bird sounds are divided by their function into songs and calls, which are further divided into the hierarchical levels of phrases, syllables and elements. The syllable is shown to be a suitable unit for the recognition of bird species. The diversity of syllable types that birds are able to produce is large. This thesis focuses on sounds that are defined as inharmonic.
The automatic recognition system for bird species used in this thesis consists of syllable segmentation, feature generation, classifier design and classifier evaluation phases. Recognition experiments are based on a parametric representation of syllables using a total of 19 low-level acoustic signal parameters. Simulation experiments were executed with six species that regularly produce inharmonic sounds. The results show that features related to the frequency band and content of the sounds provide good discrimination ability within these sounds.
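The first stage, syllable segmentation, can be sketched as a short-time energy threshold over the waveform. The frame length and threshold below are arbitrary illustration values on a synthetic signal; real systems would typically operate on band-limited or spectrogram energy rather than raw amplitude.

```python
import math

def segment_syllables(signal, frame_len=128, threshold=0.1):
    """Mark frames whose mean squared amplitude exceeds the threshold,
    then merge runs of active frames into (start, end) frame intervals."""
    n_frames = len(signal) // frame_len
    active = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        energy = sum(s * s for s in frame) / frame_len
        active.append(energy > threshold)
    segments, start = [], None
    for i, on in enumerate(active):
        if on and start is None:
            start = i
        elif not on and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:                 # syllable running to the end
        segments.append((start, n_frames))
    return segments

# Synthetic recording: silence, a tone burst, silence, another tone burst.
tone = [math.sin(2 * math.pi * 0.05 * t) for t in range(256)]
signal = [0.0] * 256 + tone + [0.0] * 256 + tone
syllables = segment_syllables(signal)     # frame intervals of the two bursts
```

Each detected interval would then be passed to feature generation and classification; converting frame indices back to seconds requires the sample rate, omitted here.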