56 research outputs found

    Correlation Clustering of Bird Sounds

    Bird sound classification is the task of relating any sound recording to the species of bird that can be heard in the recording. Here, we study bird sound clustering, the task of deciding for any pair of sound recordings whether the same species of bird can be heard in both. We address this problem by first learning, from a training set, probabilities of pairs of recordings being related in this way, and then inferring a maximally probable partition of a test set by correlation clustering. We address the following questions: How accurate is this clustering compared to a classification of the test set? How do the clusters thus inferred relate to the clusters obtained by classification? How accurate is this clustering when applied to recordings of bird species not heard during training? How effective is this clustering in separating environmental noise not heard during training from bird sounds?
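    The partition step above can be illustrated with a small sketch: convert each learned pairwise probability to a log-odds weight and greedily build clusters with the randomised pivot heuristic for correlation clustering. This is a minimal illustration of the idea under those assumptions, not the authors' solver; the names and toy probabilities are made up.

```python
# Minimal sketch: cluster recordings from pairwise "same species" probabilities
# using the randomised pivot heuristic for correlation clustering.
import numpy as np

def log_odds(p, eps=1e-6):
    """Edge weight: positive if a pair is more likely than not to share a species."""
    p = np.clip(p, eps, 1 - eps)
    return np.log(p / (1 - p))

def pivot_correlation_clustering(pair_prob, seed=0):
    """pair_prob: (n, n) symmetric matrix of P(recordings i and j share a species)."""
    rng = np.random.default_rng(seed)
    w = log_odds(pair_prob)
    unassigned = list(range(len(pair_prob)))
    clusters = []
    while unassigned:
        pivot = unassigned.pop(rng.integers(len(unassigned)))   # pick a random pivot
        cluster = [pivot] + [i for i in unassigned if w[pivot, i] > 0]
        unassigned = [i for i in unassigned if i not in cluster]
        clusters.append(cluster)
    return clusters

# Toy example: recordings 0-2 sound alike, 3-4 sound alike.
p = np.array([[1.0, 0.9, 0.8, 0.1, 0.2],
              [0.9, 1.0, 0.7, 0.2, 0.1],
              [0.8, 0.7, 1.0, 0.3, 0.2],
              [0.1, 0.2, 0.3, 1.0, 0.9],
              [0.2, 0.1, 0.2, 0.9, 1.0]])
print(pivot_correlation_clustering(p))   # e.g. [[0, 1, 2], [3, 4]]
```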

    Automatic Bat Call Classification using Transformer Networks

    Automatically identifying bat species from their echolocation calls is a difficult but important task for monitoring bats and the ecosystems they live in. Major challenges in automatic bat call identification are high call variability, similarities between species, interfering calls and a lack of annotated data. Many currently available models suffer from relatively poor performance on real-life data because they are trained on single-call datasets and, moreover, are often too slow for real-time classification. Here, we propose a Transformer architecture for multi-label classification with potential applications in real-time classification scenarios. We train our model on synthetically generated multi-species recordings created by merging multiple bat calls into a single recording with multiple simultaneous calls. Our approach achieves a single-species accuracy of 88.92% (F1-score of 84.23%) and a multi-species macro F1-score of 74.40% on our test set. In comparison to three other tools on the independent and publicly available dataset ChiroVox, our model achieves at least 25.82% better accuracy for single-species classification and at least 6.9% better macro F1-score for multi-species classification.
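    The synthetic multi-species training data described above can be sketched roughly as follows: mix several single-species recordings into one waveform and give it a multi-hot label over species, suitable for a multi-label (binary cross-entropy) objective. The function names, random placement and normalisation are assumptions for illustration, not the authors' pipeline.

```python
# Sketch: build one multi-label training example from several single-species recordings.
import numpy as np

def mix_recordings(waveforms, species_ids, n_species, rng=None):
    """waveforms: list of 1-D arrays at the same sample rate; species_ids: one id per waveform."""
    rng = rng or np.random.default_rng()
    length = max(len(w) for w in waveforms)
    mix = np.zeros(length, dtype=np.float32)
    for w in waveforms:
        offset = rng.integers(0, length - len(w) + 1)   # place each call at a random time
        mix[offset:offset + len(w)] += w
    mix /= max(1.0, np.abs(mix).max())                  # avoid clipping after summation
    label = np.zeros(n_species, dtype=np.float32)
    label[list(set(species_ids))] = 1.0                 # multi-hot target for a BCE loss
    return mix, label
```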

    LifeCLEF Bird Identification Task 2016: The arrival of Deep learning

    The LifeCLEF bird identification challenge provides a large-scale testbed for the system-oriented evaluation of bird species identification based on audio recordings. One of its main strengths is that the data used for the evaluation are collected through Xeno-Canto, the largest network of bird sound recordists in the world. This makes the task closer to the conditions of a real-world application than previous, similar initiatives. The main novelty of the 2016 edition of the challenge was the inclusion of soundscape recordings in addition to the usual Xeno-Canto recordings that focus on a single foreground species. This paper reports the methodology of the conducted evaluation, an overview of the systems submitted by the 6 participating research groups and a synthesis of the obtained results.

    Automated call detection for acoustic surveys with structured calls of varying length

    Funding: Y.W. is partly funded by the China Scholarship Council (CSC) for Ph.D. study at the University of St Andrews, UK. 1. When recorders are used to survey acoustically conspicuous species, identification of the target species' calls in recordings is essential for estimating density and abundance. We investigate how well deep neural networks identify vocalisations consisting of phrases of varying lengths, each containing a variable number of syllables. We use recordings of Hainan gibbon (Nomascus hainanus) vocalisations to develop and test the methods. 2. We propose two methods for exploiting the two-level structure of such data. The first combines convolutional neural network (CNN) models with a hidden Markov model (HMM) and the second uses a convolutional recurrent neural network (CRNN). Both models learn acoustic features of syllables via a CNN and the temporal structure of syllables within phrases via either an HMM or a recurrent network. We compare their performance to the commonly used CNNs LeNet and VGGNet, and to a support vector machine (SVM). We also propose a dynamic programming method to evaluate how well phrases are predicted. This is useful for evaluating performance when vocalisations are labelled by phrases, not syllables. 3. Our methods perform substantially better than the commonly used methods when applied to the gibbon acoustic recordings. The CRNN has an F-score of 90% on phrase prediction, which is 18% higher than the best of the SVM, LeNet and VGGNet methods. HMM post-processing raised the F-score of these last three methods to as much as 87%. The number of phrases is overestimated by the CNNs and the SVM, leading to error rates between 49% and 54%. With HMM post-processing, these error rates can be reduced to as low as 0.4%. Similarly, the error rate of the CRNN's predictions is no more than 0.5%. 4. CRNNs are better at identifying phrases of varying lengths composed of a varying number of syllables than simpler CNN or SVM models. We find a CRNN model to be best at this task, with a CNN combined with an HMM performing almost as well. We recommend that these kinds of models are used for species whose vocalisations are structured into phrases of varying lengths.
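    As a rough illustration of the CRNN idea described above (a CNN producing per-frame syllable features that feed a recurrent layer modelling how syllables form phrases), a minimal PyTorch sketch might look like the following. Layer sizes, the two-class output and all names are assumptions, not the authors' architecture.

```python
# Sketch: CNN feature extractor over a spectrogram + bidirectional GRU for frame-level labels.
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, n_mels=64, n_classes=2, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
        )
        self.rnn = nn.GRU(32 * (n_mels // 4), hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)    # per-frame syllable/background scores

    def forward(self, spec):                            # spec: (batch, 1, n_mels, n_frames)
        x = self.cnn(spec)                              # (batch, 32, n_mels//4, n_frames)
        x = x.permute(0, 3, 1, 2).flatten(2)            # (batch, n_frames, 32 * n_mels//4)
        x, _ = self.rnn(x)                              # temporal context across frames
        return self.head(x)                             # (batch, n_frames, n_classes)

scores = CRNN()(torch.randn(2, 1, 64, 200))             # two 200-frame spectrograms
print(scores.shape)                                     # torch.Size([2, 200, 2])
```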

    Persistent near real-time passive acoustic monitoring for baleen whales from a moored buoy: System description and evaluation

    © The Author(s), 2019. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in Baumgartner, M. F., Bonnell, J., Van Parijs, S. M., Corkeron, P. J., Hotchkin, C., Ball, K., Pelletier, L., Partan, J., Peters, D., Kemp, J., Pietro, J., Newhall, K., Stokes, A., Cole, T. V. N., Quintana, E., & Kraus, S. D. Persistent near real-time passive acoustic monitoring for baleen whales from a moored buoy: System description and evaluation. Methods in Ecology and Evolution, 10(9), (2019): 1476-1489, doi: 10.1111/2041-210X.13244. 1. Managing interactions between human activities and marine mammals often relies on an understanding of the real-time distribution or occurrence of animals. Visual surveys typically cannot provide persistent monitoring because of expense and weather limitations, and while passive acoustic recorders can monitor continuously, the data they collect are often not accessible until the recorder is recovered. 2. We have developed a moored passive acoustic monitoring system that provides near real-time occurrence estimates for humpback, sei, fin and North Atlantic right whales from a single site for a year, and makes those occurrence estimates available via a publicly accessible website, email and text messages, a smartphone/tablet app and the U.S. Coast Guard's maritime domain awareness software. We evaluated this system using a buoy deployed off the coast of Massachusetts during 2015–2016 and redeployed during 2016–2017. Near real-time estimates of whale occurrence were compared to simultaneously collected archived audio as well as whale sightings collected near the buoy by aerial surveys. 3. False detection rates were 0% for right, humpback and sei whales and nearly 0% for fin whales, whereas missed detection rates at daily time scales were modest (12%–42%). Missed detections were significantly associated with low calling rates for all species. We observed strong associations between right whale visual sightings and near real-time acoustic detections over a monitoring range of 30–40 km and temporal scales of 24–48 hr, suggesting that silent animals were not especially problematic for estimating occurrence of right whales in the study area. There was no association between acoustic detections and visual sightings of humpback whales. 4. The moored buoy has been used to reduce the risk of ship strikes for right whales in a U.S. Coast Guard gunnery range, and the system can be applied to other mitigation applications. We thank Annamaria Izzi, Danielle Cholewiak and Genevieve Davis of the NOAA NEFSC for assistance in developing the analyst protocol. We are grateful to the NOAA NEFSC aerial survey observers (Leah Crowe, Pete Duley, Jen Gatzke, Allison Henry, Christin Khan and Karen Vale) and the NEAq aerial survey observers (Angela Bostwick, Marianna Hagbloom and Paul Nagelkirk). Danielle Cholewiak and three anonymous reviewers provided constructive criticism on earlier drafts of the manuscript. Funding for this project was provided by the NOAA NEFSC, NOAA Advanced Sampling Technology Work Group, Environmental Security Technology Certification Program of the U.S. Department of Defense, the U.S. Navy's Living Marine Resources Program, Massachusetts Clean Energy Center and the Bureau of Ocean Energy Management. Funding from NOAA was facilitated by the Cooperative Institute for the North Atlantic Region (CINAR) under Cooperative Agreement NA14OAR4320158.
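    The daily-scale comparison of near real-time detections against the archived audio can be sketched as below. The table layout and column names are assumptions for illustration, not the authors' analysis code.

```python
# Sketch: daily false and missed detection rates from a day-by-day comparison of
# near real-time detections with presence confirmed in the archived audio.
import pandas as pd

def daily_detection_rates(df):
    """df: one row per day with boolean columns 'realtime_detected' and 'archive_present'."""
    false_det = ((df.realtime_detected) & (~df.archive_present)).sum() / max(1, df.realtime_detected.sum())
    missed = ((~df.realtime_detected) & (df.archive_present)).sum() / max(1, df.archive_present.sum())
    return false_det, missed

days = pd.DataFrame({
    "realtime_detected": [True, True, False, False, True],
    "archive_present":   [True, True, True,  False, True],
})
print(daily_detection_rates(days))   # (0.0, 0.25) for this toy table
```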

    Machine Learning-based Classification of Birds through Birdsong

    Audio sound recognition and classification is used for many tasks and applications, including human voice recognition, music recognition and audio tagging. In this paper we apply Mel Frequency Cepstral Coefficients (MFCC) in combination with a range of machine learning models to identify (Australian) birds from publicly available audio files of their birdsong. We present the approaches used for data processing and augmentation and compare the results of various state-of-the-art machine learning models. We achieve an overall accuracy of 91% for the top-5 birds from the 30 selected for the case study. Applying the models to more challenging and diverse audio files comprising 152 bird species, we achieve an accuracy of 58%.
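    A minimal sketch of the MFCC-plus-classifier pipeline described above, assuming librosa for feature extraction and scikit-learn for the model; the summary statistics, file names and choice of a random forest are illustrative stand-ins for the range of models compared in the paper.

```python
# Sketch: summarise each recording by MFCC statistics and train a classifier on them.
import librosa
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def mfcc_features(path, n_mfcc=20):
    """Load a recording and summarise its MFCCs as per-coefficient means and stds."""
    y, sr = librosa.load(path, sr=22050, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)      # (n_mfcc, n_frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Hypothetical training loop over labelled birdsong files:
# files, labels = ["calls/magpie_01.wav", ...], ["magpie", ...]
# X = np.stack([mfcc_features(f) for f in files])
# clf = RandomForestClassifier(n_estimators=300).fit(X, labels)
# print(clf.predict(np.stack([mfcc_features("calls/unknown.wav")])))
```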

    Prototype Biodiversity Digital Twin: Real-time bird monitoring with citizen-science data

    Bird populations respond rapidly to environmental change, making them excellent ecological indicators. Climate shifts advance migration, causing mismatches between breeding and resources. Understanding these changes is crucial for monitoring the state of the environment. Citizen science offers vast potential to collect biodiversity data. We outline a project that combines citizen science with AI-based bird sound classification. The mobile app records bird vocalisations that are classified by AI and stored for re-analysis. Additionally, it shows a shared observation board that visualises collective classifications. By merging long-term monitoring and modern citizen science, this project harnesses the strengths of both approaches for comprehensive bird population monitoring.

    A Vocal-Based Analytical Method for Goose Behaviour Recognition

    Since human-wildlife conflicts are increasing, the development of cost-effective methods for reducing damage or conflict levels is important in wildlife management. A wide range of devices to detect and deter animals causing conflict are used for this purpose, although their effectiveness is often highly variable, due to habituation to disruptive or disturbing stimuli. Automated recognition of behaviours could form a critical component of a system capable of altering the disruptive stimuli to avoid this. In this paper we present a novel method to automatically recognise goose behaviour based on vocalisations from flocks of free-living barnacle geese (Branta leucopsis). The geese were observed and recorded in a natural environment, using a shielded shotgun microphone. The classification used Support Vector Machines (SVMs), which had been trained with labelled data. Greenwood Function Cepstral Coefficients (GFCC) were used as features for the pattern recognition algorithm, as they can be adjusted to the hearing capabilities of different species. Three behaviours are classified with this approach, and the method achieves good recognition of foraging behaviour (86–97% sensitivity, 89–98% precision) and reasonable recognition of flushing (79–86% sensitivity, 66–80% precision) and landing behaviour (73–91% sensitivity, 79–92% precision). The Support Vector Machine has proven to be a robust classifier for this kind of classification, as generality and non-linear capabilities are important. We conclude that vocalisations can be used to automatically detect the behaviour of wildlife species involved in conflicts, and as such, may be used as an integrated part of a wildlife management system.
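    The species-adjustable frequency warping behind GFCC features can be sketched with the Greenwood function, which maps cochlear position to frequency; filterbank centres spaced evenly on that axis replace the mel spacing used for MFCCs. Fixing k = 0.88 and fitting the remaining constants to a species' hearing range is a common convention assumed here, and the hearing range in the example is a placeholder.

```python
# Sketch: Greenwood-spaced filterbank centre frequencies, f(x) = A * (10**(a*x) - k),
# with x in [0, 1] the normalised cochlear position and [f_min, f_max] the hearing range.
import numpy as np

def greenwood_centres(f_min, f_max, n_filters=26, k=0.88):
    """Centre frequencies (Hz) equally spaced on the Greenwood (cochlear) axis."""
    A = f_min / (1.0 - k)                 # so that f(0) = f_min
    a = np.log10(f_max / A + k)           # so that f(1) = f_max
    x = np.linspace(0.0, 1.0, n_filters)
    return A * (10.0 ** (a * x) - k)

# A hearing range of roughly 0.1-8 kHz is a placeholder, not a measured value for geese.
print(np.round(greenwood_centres(100.0, 8000.0, n_filters=8)))
```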

    Animal sound classification using dissimilarity spaces

    The classifier system proposed in this work combines the dissimilarity spaces produced by a set of Siamese neural networks (SNNs) designed using four different backbones with different clustering techniques for training SVMs for automated animal audio classification. The system is evaluated on two animal audio datasets: one of cat and one of bird vocalizations. The proposed approach uses clustering methods to determine a set of centroids (in both a supervised and an unsupervised fashion) from the spectrograms in the dataset. These centroids are exploited to generate the dissimilarity space through the Siamese networks. In addition to feeding the SNNs with spectrograms, experiments process the spectrograms using the Heterogeneous Auto-Similarities of Characteristics. Once the dissimilarity spaces are computed, each pattern is "projected" into the space to obtain a vector space representation; this descriptor is then coupled to a support vector machine (SVM) to classify a spectrogram by its dissimilarity vector. Results demonstrate that the proposed approach performs competitively (without ad-hoc optimization of the clustering methods) on both animal vocalization datasets. To further demonstrate the power of the proposed system, the best standalone approach is also evaluated on the challenging Dataset for Environmental Sound Classification (ESC50).
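    A minimal sketch of the dissimilarity-space construction described above: embed spectrograms, find centroids with a clustering method, represent each sample by its distances to the centroids, and train an SVM on those vectors. The random embeddings stand in for the paper's Siamese networks, and k-means with ten centroids is an assumption for illustration.

```python
# Sketch: dissimilarity vectors (distances to centroids) as the input space for an SVM.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def dissimilarity_vectors(embeddings, centroids):
    """Rows: samples; columns: Euclidean distance to each centroid."""
    return np.linalg.norm(embeddings[:, None, :] - centroids[None, :, :], axis=-1)

rng = np.random.default_rng(0)
emb_train = rng.normal(size=(200, 32))              # placeholder for SNN embeddings
y_train = rng.integers(0, 2, size=200)              # placeholder labels (e.g. two call types)
centroids = KMeans(n_clusters=10, n_init=10, random_state=0).fit(emb_train).cluster_centers_
svm = SVC(kernel="rbf").fit(dissimilarity_vectors(emb_train, centroids), y_train)
print(svm.predict(dissimilarity_vectors(rng.normal(size=(3, 32)), centroids)))
```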