16 research outputs found

    Audiogmenter: a MATLAB Toolbox for Audio Data Augmentation

    Audio data augmentation is a key step in training deep neural networks for solving audio classification tasks. In this paper, we introduce Audiogmenter, a novel audio data augmentation library in MATLAB. We provide 15 different augmentation algorithms for raw audio data and 8 for spectrograms. We efficiently implemented several augmentation techniques whose usefulness has been extensively proved in the literature. To the best of our knowledge, this is the largest MATLAB audio data augmentation library freely available. We validate the efficiency of our algorithms by evaluating them on the ESC-50 dataset. The toolbox and its documentation can be downloaded at https://github.com/LorisNanni/Audiogmenter
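
    Two of the most common raw-audio augmentations such a toolbox provides, noise injection at a target SNR and time shifting, can be sketched in a few lines. This is an illustrative Python sketch, not Audiogmenter's actual MATLAB API; the function names are hypothetical:

```python
import numpy as np

def add_noise(x, snr_db=20.0, rng=None):
    # Inject Gaussian white noise at a target signal-to-noise ratio (dB).
    rng = np.random.default_rng() if rng is None else rng
    p_sig = float(np.mean(x ** 2))
    p_noise = p_sig / (10 ** (snr_db / 10.0))
    return x + rng.normal(0.0, np.sqrt(p_noise), size=x.shape)

def time_shift(x, shift):
    # Circularly shift the waveform by `shift` samples.
    return np.roll(x, shift)
```

    Applied at training time, such transforms multiply the effective size of a labeled audio dataset without new recordings.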

    Identifying patterns of human and bird activities using bioacoustic data

    Humans and animals often interact within the same environment at the same time, and human activities may disturb or affect bird activities. It is therefore important to monitor and study the relationships between human and animal activities. This paper proposes a system able not only to automatically classify human and bird activities using bioacoustic data, but also to automatically summarize patterns of events over time. To perform automatic summarization of acoustic events, a frequency–duration graph (FDG) framework is proposed to summarize the patterns of human and bird activities. The system first pre-processes the raw bioacoustic data and then applies a support vector machine (SVM) model and a multi-layer perceptron (MLP) model to classify human and bird chirping activities before using the FDG framework to summarize the results. Both the SVM and MLP models achieved 98% accuracy on average across several day-long recordings. Three case studies with real data show that the FDG framework correctly determined the patterns of human and bird activities over time and provided both statistical and graphical insight into the relationships between these two types of activity.
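
    The summarization step behind an FDG-style framework can be sketched as a tally of classified events by label and frequency band, plus accumulated duration per label. This is a minimal sketch under assumed inputs (the event-tuple format and function name are hypothetical, not the paper's implementation):

```python
from collections import Counter

def fdg_summary(events):
    # events: iterable of (label, freq_band, duration_seconds) tuples
    # produced by an upstream classifier (e.g., SVM or MLP outputs).
    counts = Counter((label, band) for label, band, _ in events)
    durations = Counter()
    for label, _, dur in events:
        durations[label] += dur
    return counts, durations
```

    The two tallies are exactly what a frequency–duration plot needs: how often each event type occupies each band, and for how long in total.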

    Spectrogram classification using dissimilarity space

    In this work, we combine a Siamese neural network and different clustering techniques to generate a dissimilarity space that is then used to train a support vector machine (SVM) for automated animal audio classification. The animal audio datasets used are (i) bird and (ii) cat sounds, both freely available. We exploit different clustering methods to reduce the spectrograms in the dataset to a number of centroids that are used to generate the dissimilarity space through the Siamese network. Once computed, we use the dissimilarity space to generate a vector space representation of each pattern, which is then fed into the SVM to classify a spectrogram by its dissimilarity vector. Our study shows that the proposed approach based on dissimilarity space performs well on both classification problems without ad-hoc optimization of the clustering methods. Moreover, results show that the fusion of CNN-based approaches applied to the animal audio classification problem works better than the stand-alone CNNs.
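
    The core projection step — turning a spectrogram into a fixed-length dissimilarity vector against a set of centroids — can be sketched as follows. Here a plain Euclidean distance stands in for the learned Siamese similarity; the function name is illustrative, not the paper's code:

```python
import numpy as np

def dissimilarity_vector(x, centroids, dist=None):
    # Project pattern x into the dissimilarity space defined by the
    # centroids: one coordinate per centroid, each a distance value.
    if dist is None:
        dist = lambda a, b: float(np.linalg.norm(a - b))
    return np.array([dist(x, c) for c in centroids])
```

    The resulting vectors have the same dimensionality (the number of centroids) regardless of spectrogram size, which is what makes them directly usable as SVM inputs.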

    Animal sound classification using dissimilarity spaces

    The classifier system proposed in this work combines the dissimilarity spaces produced by a set of Siamese neural networks (SNNs) designed using four different backbones with different clustering techniques for training SVMs for automated animal audio classification. The system is evaluated on two animal audio datasets: one for cat and another for bird vocalizations. The proposed approach uses clustering methods to determine a set of centroids (in both a supervised and unsupervised fashion) from the spectrograms in the dataset. Such centroids are exploited to generate the dissimilarity space through the Siamese networks. In addition to feeding the SNNs with spectrograms, experiments process the spectrograms using the heterogeneous auto-similarities of characteristics. Once the similarity spaces are computed, each pattern is “projected” into the space to obtain a vector space representation; this descriptor is then coupled to a support vector machine (SVM) to classify a spectrogram by its dissimilarity vector. Results demonstrate that the proposed approach performs competitively (without ad-hoc optimization of the clustering methods) on both animal vocalization datasets. To further demonstrate the power of the proposed system, the best standalone approach is also evaluated on the challenging Dataset for Environmental Sound Classification (ESC50) dataset.
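
    Of the two centroid-selection strategies mentioned, the supervised one is the simpler to illustrate: take one centroid per class, e.g. the mean of that class's (flattened) spectrograms. A minimal sketch, with hypothetical names and no claim to match the paper's clustering choices:

```python
import numpy as np

def class_centroids(X, y):
    # Supervised centroid selection: one centroid per class,
    # computed as the mean of that class's feature vectors in X.
    labels = np.unique(y)
    return np.stack([X[y == c].mean(axis=0) for c in labels]), labels
```

    Unsupervised alternatives (e.g., k-means within or across classes) produce centroids the same downstream machinery can consume.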

    Biotic sound SNR influence analysis on acoustic indices

    In recent years, passive acoustic monitoring (PAM) has become increasingly popular. Many acoustic indices (AIs) have been proposed for rapid biodiversity assessment (RBA); however, most of these indices have been reported to be susceptible to abiotic sounds such as wind or rain noise that mask biotic sound, which greatly limits their application. In this work, to gain insight into how signal-to-noise ratio (SNR) influences acoustic indices, the four most commonly used indices, i.e., the bioacoustic index (BIO), the acoustic diversity index (ADI), the acoustic evenness index (AEI), and the acoustic complexity index (ACI), were investigated using controlled computational experiments on field recordings collected in a suburban park in Xuzhou, China, with bird vocalizations serving as typical biotic sounds. In the experiments, different SNR conditions were obtained by varying biotic sound intensities while keeping the background noise fixed. Experimental results showed that three indices (ADI, ACI, and BIO) decreased as SNR declined, while AEI trended in the opposite direction, owing to several factors. First, for ADI and AEI, the peak value in the spectrogram no longer corresponds to the biotic sounds of interest once SNR drops below a certain level, yielding erroneous estimates of the proportion of sound occurring in each frequency band. Second, in the BIO calculation, the accumulated difference between the sound level within each frequency band and the minimum sound level drops dramatically as biotic sound intensity is reduced. Finally, the ACI result relies on the ratio between the total differences among all adjacent frames and the total sum of all frames within each temporal step and frequency bin of the spectrogram; as SNR decreases, the biotic contribution to both the differences and the sum affects the final ACI value in a complex way. This work helps interpret the values of these acoustic indices more comprehensively in real-world environments and promotes the application of PAM to RBA.
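
    The ACI dependence described above can be made concrete with a minimal implementation that follows the abstract's definition: per frequency bin, the ratio of summed absolute frame-to-frame intensity differences to summed intensity, accumulated over bins. This is a sketch; published implementations add temporal-step windowing and other normalization details omitted here:

```python
import numpy as np

def aci(spec):
    # spec: 2-D spectrogram, rows = frequency bins, columns = time frames.
    # Per bin: sum of |adjacent-frame differences| over sum of intensities.
    diffs = np.abs(np.diff(spec, axis=1)).sum(axis=1)
    totals = np.maximum(spec.sum(axis=1), 1e-12)  # guard empty bins
    return float(np.sum(diffs / totals))
```

    A constant spectrogram yields ACI = 0, and adding steady broadband noise inflates the denominator without adding frame-to-frame change, which is exactly the SNR sensitivity the study analyzes.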

    Opportunities and challenges for big data ornithology

    Recent advancements in information technology and data acquisition have created both new research opportunities and new challenges for using big data in ornithology. We provide an overview of the past, present, and future of big data in ornithology, and explore the rewards and risks associated with their application. Structured data resources (e.g., North American Breeding Bird Survey) continue to play an important role in advancing our understanding of bird population ecology, and the recent advent of semistructured (e.g., eBird) and unstructured (e.g., weather surveillance radar) big data resources has promoted the development of new empirical perspectives that are generating novel insights. For example, big data have been used to study and model bird diversity and distributions across space and time, explore the patterns and determinants of broad-scale migration strategies, and examine the dynamics and mechanisms associated with geographic and phenological responses to global change. The application of big data also holds a number of challenges wherein high data volume and dimensionality can result in noise accumulation, spurious correlations, and incidental endogeneity. In total, big data resources continue to add empirical breadth and detail to ornithology, often at very broad spatial extents, but how the challenges underlying this approach can best be mitigated to maximize inferential quality and rigor needs to be carefully considered.

    Automated Detection of COVID-19 Cough Sound using Mel-Spectrogram Images and Convolutional Neural Network

    COVID-19 is a new disease caused by the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) variant. The initial symptoms of the disease commonly include fever (83-98%), fatigue or myalgia, dry cough (76-82%), and shortness of breath (31-55%). Given the prevalence of coughing as a symptom, artificial intelligence has been employed to detect COVID-19 based on cough sounds. This study aims to compare the performance of six different Convolutional Neural Network (CNN) models (VGG-16, VGG-19, LeNet-5, AlexNet, ResNet-50, and ResNet-152) in detecting COVID-19 using mel-spectrogram images derived from cough sounds. The training and validation of these CNN models were conducted using the Virufy dataset, consisting of 121 cough audio recordings with a sample rate of 48,000 Hz and a duration of 1 second for all audio data. Audio data was processed to generate mel-spectrogram images, which were subsequently employed as inputs for the CNN models. This study used accuracy, area under the curve (AUC), precision, recall, and F1 score as evaluation metrics. The AlexNet model, utilizing an input size of 227×227, exhibited the best performance, with the highest AUC of 0.930. This study provides compelling evidence of the efficacy of CNN models in detecting COVID-19 based on cough sounds through mel-spectrogram images. Furthermore, the study underscores the impact of input size on model performance. This research contributes to identifying the CNN model that demonstrates the best performance in COVID-19 detection based on cough sounds. By exploring the effectiveness of CNN models with different mel-spectrogram image sizes, this study offers novel insights into the optimal and fast audio-based method for early detection of COVID-19. Additionally, this study establishes the fundamental groundwork for selecting an appropriate CNN methodology for early detection of COVID-19.
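
    The mel-spectrogram preprocessing at the heart of this pipeline rests on a mel-scaled triangular filterbank applied to FFT magnitudes. A minimal numpy sketch of that filterbank follows (the study itself would typically use a library such as librosa; this is illustrative, using the abstract's 48,000 Hz sample rate):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale; applying this
    # matrix to an FFT power spectrum yields one mel-spectrogram frame.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):          # rising edge of the triangle
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):          # falling edge of the triangle
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb
```

    Stacking such frames over time and rendering the log-magnitudes as an image produces the 2-D inputs the CNNs consume.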

    Passive acoustic monitoring provides a fresh perspective on fundamental ecological questions

    Passive acoustic monitoring (PAM) has emerged as a transformative tool for applied ecology, conservation and biodiversity monitoring, but its potential contribution to fundamental ecology is less often discussed, and fundamental PAM studies tend to be descriptive rather than mechanistic. Here, we chart the most promising directions for ecologists wishing to use the suite of currently available acoustic methods to address long-standing fundamental questions in ecology and explore new avenues of research. In both terrestrial and aquatic habitats, PAM provides an opportunity to ask questions across multiple spatial scales and at fine temporal resolution, and to capture phenomena or species that are difficult to observe. In combination with traditional approaches to data collection, PAM could release ecologists from myriad limitations that have, at times, precluded mechanistic understanding. We discuss several case studies to demonstrate the potential contribution of PAM to biodiversity estimation, population trend analysis, assessing climate change impacts on phenology and distribution, and understanding disturbance and recovery dynamics. We also highlight what is on the horizon for PAM, in terms of near-future technological and methodological developments that have the potential to provide advances in coming years. Overall, we illustrate how ecologists can harness the power of PAM to address fundamental ecological questions in an era of ecology no longer characterised by data limitation.