4,665 research outputs found

    AudioPairBank: Towards A Large-Scale Tag-Pair-Based Audio Content Analysis

    Full text link
    Recently, sound recognition has been used to identify sounds such as a car or a river. However, sounds have nuances that may be better described by adjective-noun pairs such as slow car, and verb-noun pairs such as flying insects, which remain underexplored. Therefore, in this work we investigate the relation between audio content and both adjective-noun pairs and verb-noun pairs. Due to the lack of datasets with these kinds of annotations, we collected and processed the AudioPairBank corpus, consisting of a combined total of 1,123 pairs and over 33,000 audio files. One contribution is the previously unavailable documentation of the challenges and implications of collecting audio recordings with this type of label. A second contribution is to show the degree of correlation between the audio content and the labels through sound recognition experiments, which yielded results of 70% accuracy, hence also providing a performance benchmark. The results and study in this paper encourage further exploration of the nuances in audio and are meant to complement similar research performed on images and text in multimedia analysis. Comment: This paper is a revised version of "AudioSentibank: Large-scale Semantic Ontology of Acoustic Concepts for Audio Content Analysis".
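    As a rough illustration of the kind of sound recognition experiment described above, the sketch below trains a classifier to map audio clips to tag-pair labels. The feature choice (MFCC statistics), the linear SVM, and all file names are assumptions made for illustration; the abstract does not specify this exact pipeline.

```python
# Minimal sketch (not the authors' pipeline): classify audio clips into
# adjective-noun / verb-noun pair labels such as "slow_car" or "flying_insects".
import numpy as np
import librosa
from sklearn.svm import LinearSVC

def clip_features(path, sr=22050, n_mfcc=20):
    """Summarize a clip as the mean and standard deviation of its MFCCs."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Paths and tag-pair labels would come from the corpus metadata;
# these file names are placeholders.
train_paths = ["slow_car_001.wav", "flying_insects_001.wav"]
train_labels = ["slow_car", "flying_insects"]

X = np.stack([clip_features(p) for p in train_paths])
clf = LinearSVC().fit(X, train_labels)
print(clf.predict(X))
```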

    ORCA-SPOT: An Automatic Killer Whale Sound Detection Toolkit Using Deep Learning

    Get PDF
    Large bioacoustic archives of wild animals are an important source to identify reappearing communication patterns, which can then be related to recurring behavioral patterns to advance the current understanding of intra-specific communication of non-human animals. A main challenge remains that most large-scale bioacoustic archives contain only a small percentage of animal vocalizations and a large amount of environmental noise, which makes it extremely difficult to manually retrieve sufficient vocalizations for further analysis; this is particularly important for species with advanced social systems and complex vocalizations. In this study, deep neural networks were trained on 11,509 killer whale (Orcinus orca) signals and 34,848 noise segments. The resulting toolkit, ORCA-SPOT, was tested on a large-scale bioacoustic repository, the Orchive, comprising roughly 19,000 hours of killer whale underwater recordings. An automated segmentation of the entire Orchive recordings (about 2.2 years) took approximately 8 days. It achieved a time-based precision, or positive predictive value (PPV), of 93.2% and an area under the curve (AUC) of 0.9523. This approach enables an automated annotation procedure for large bioacoustic databases to extract killer whale sounds, which are essential for subsequent identification of significant communication patterns. The code will be publicly available in October 2019 to support the application of deep learning to bioacoustic research. ORCA-SPOT can be adapted to other animal species.
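    The segmentation step described above can be pictured as sliding a fixed-length window over a long recording and keeping windows that a binary orca-vs-noise model scores highly. The sketch below illustrates that idea only; the window length, hop, threshold, and dummy model are placeholders, not ORCA-SPOT's actual configuration.

```python
# Sliding-window detection sketch: score fixed-length windows of a long
# recording with a binary classifier and keep those above a threshold.
import numpy as np

def segment(audio, sr, model, win_s=2.0, hop_s=1.0, threshold=0.85):
    """Yield (start_s, end_s, score) for windows the model flags as positive."""
    win, hop = int(win_s * sr), int(hop_s * sr)
    for start in range(0, len(audio) - win + 1, hop):
        window = audio[start:start + win]
        score = model(window)  # hypothetical model: returns a score in [0, 1]
        if score >= threshold:
            yield start / sr, (start + win) / sr, score

# Usage with synthetic audio and a dummy energy-based "model".
rng = np.random.default_rng(0)
audio, sr = rng.standard_normal(16000 * 30), 16000
dummy_model = lambda w: float(np.sqrt(np.mean(w ** 2)) > 1.0)
detections = list(segment(audio, sr, dummy_model))
```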

    Ecology & computer audition: applications of audio technology to monitor organisms and environment

    Get PDF
    Among the 17 Sustainable Development Goals (SDGs) proposed within the 2030 Agenda and adopted by all United Nations member states, the 13th SDG is a call for action to combat climate change. Moreover, SDGs 14 and 15 call for the protection and conservation of life below water and life on land, respectively. In this work, we provide a literature-founded overview of application areas in which computer audition, a powerful technology combining audio signal processing and machine intelligence that has so far received little attention in this context, is employed to monitor our ecosystem, with the potential to identify ecologically critical processes or states. We distinguish between applications related to organisms, such as species richness analysis and plant health monitoring, and applications related to the environment, such as melting ice monitoring or wildfire detection. This work positions computer audition in relation to alternative approaches by discussing methodological strengths and limitations, as well as ethical aspects. We conclude with an urgent call to action to the research community for a greater involvement of audio intelligence methodology in future ecosystem monitoring approaches.

    Listening to the World Improves Speech Command Recognition

    Full text link
    We study transfer learning in convolutional network architectures applied to the task of recognizing audio, such as environmental sound events and speech commands. Our key finding is that not only is it possible to transfer representations from an unrelated task like environmental sound classification to a voice-focused task like speech command recognition, but also that doing so improves accuracies significantly. We also investigate the effect of increased model capacity for transfer learning in audio by first validating known results from the field of computer vision, namely that increasingly deeper networks achieve better accuracies, on two audio datasets: UrbanSound8k and the newly released Google Speech Commands dataset. Then we propose a simple multiscale input representation using dilated convolutions and show that it is able to aggregate larger contexts and increase classification performance. Further, the models trained using a combination of transfer learning and multiscale input representations need only 40% of the training data to achieve accuracies similar to a freshly trained model with 100% of the training data. Finally, we demonstrate a positive interaction effect for the multiscale input and transfer learning, making a case for the joint application of the two techniques. Comment: 8 pages.
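    A multiscale representation of this kind can be approximated by running parallel dilated convolutions with growing dilation rates over the same input and concatenating their outputs, so that each position sees progressively larger temporal contexts. The PyTorch sketch below shows one plausible reading of that idea; the channel counts, kernel size, and dilation rates are illustrative assumptions, not the authors' exact architecture.

```python
# Multiscale front-end from parallel dilated 1-D convolutions over a
# (batch, channels, time) input; branch outputs are concatenated channel-wise.
import torch
import torch.nn as nn

class MultiscaleBlock(nn.Module):
    def __init__(self, in_ch=1, out_ch=16, dilations=(1, 2, 4, 8)):
        super().__init__()
        # One branch per dilation rate; padding=d keeps output lengths aligned.
        self.branches = nn.ModuleList(
            nn.Conv1d(in_ch, out_ch, kernel_size=3, dilation=d, padding=d)
            for d in dilations
        )

    def forward(self, x):
        return torch.cat([b(x) for b in self.branches], dim=1)

x = torch.randn(8, 1, 16000)       # e.g. 1 s of 16 kHz audio
features = MultiscaleBlock()(x)    # shape: (8, 64, 16000)
```

    Appending a pooling layer and a linear classifier on top of such a front-end would complete a minimal classifier for either of the two datasets.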

    A survey on artificial intelligence-based acoustic source identification

    Get PDF
    The concept of Acoustic Source Identification (ASI), which refers to the process of identifying noise sources, has attracted increasing attention in recent years. ASI technology can be used for surveillance, monitoring, and maintenance applications in a wide range of sectors, such as defence, manufacturing, healthcare, and agriculture. Acoustic signature analysis and pattern recognition remain the core technologies for noise source identification. Manual identification of acoustic signatures, however, has become increasingly challenging as dataset sizes grow. As a result, the use of Artificial Intelligence (AI) techniques for identifying noise sources has become increasingly relevant and useful. In this paper, we provide a comprehensive review of AI-based acoustic source identification techniques. We analyze the strengths and weaknesses of AI-based ASI processes and associated methods proposed by researchers in the literature. Additionally, we conducted a detailed survey of ASI applications in machinery, underwater applications, environment/event source recognition, healthcare, and other fields. We also highlight relevant research directions.
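    To make "acoustic signature analysis and pattern recognition" concrete, the sketch below matches an unknown recording against a small library of reference signatures by cosine similarity. The log-mel signature and nearest-neighbour matching are generic textbook choices, not a specific method from the surveyed literature, and all file names are placeholders.

```python
# Signature-based source identification sketch: compare an unknown clip's
# spectral signature to a library of known sources by cosine similarity.
import numpy as np
import librosa

def signature(path, sr=16000):
    """Time-averaged log-mel spectrum as a crude acoustic signature."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
    return np.log(mel + 1e-8).mean(axis=1)

def identify(query_path, library):
    """Return the library label whose signature is most similar."""
    q = signature(query_path)
    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(library, key=lambda label: cos(q, library[label]))

# The library maps source labels to reference signatures.
library = {"pump": signature("pump_ok.wav"),
           "bearing": signature("bearing_fault.wav")}
print(identify("unknown_machine.wav", library))
```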

    Sounding out ecoacoustic metrics: avian species richness is predicted by acoustic indices in temperate but not tropical habitats

    Get PDF
    Affordable, autonomous recording devices facilitate large-scale acoustic monitoring, and Rapid Acoustic Survey is emerging as a cost-effective approach to ecological monitoring; the success of the approach rests on the development of computational methods by which biodiversity metrics can be automatically derived from remotely collected audio data. Dozens of indices have been proposed to date, but systematic validation against classical, in situ diversity measures is lacking. This study conducted the most comprehensive comparative evaluation to date of the relationship between avian species diversity and a suite of acoustic indices. Acoustic surveys were carried out across habitat gradients in temperate and tropical biomes. Baseline avian species richness and subjective multi-taxa biophonic density estimates were established through aural counting by expert ornithologists. Twenty-six acoustic indices (AIs) were calculated and compared to observed variations in species diversity. Five acoustic diversity indices (Bioacoustic Index, Acoustic Diversity Index, Acoustic Evenness Index, Acoustic Entropy, and the Normalised Difference Soundscape Index) were assessed, as well as three simple acoustic descriptors (root-mean-square energy, spectral centroid, and zero-crossing rate). Highly significant correlations of up to 65% between acoustic indices and avian species richness were observed across temperate habitats, supporting the use of automated acoustic indices in biodiversity monitoring where a single vocal taxon dominates. Significant but weaker correlations were observed in neotropical habitats, which host multiple non-avian vocalizing species. Multivariate classification analyses demonstrated that each habitat has a very distinct soundscape and that AIs track observed differences in habitat-dependent community composition. Multivariate analyses of the relative predictive power of AIs show that compound indices are more powerful predictors of avian species richness than any single index, and that simple descriptors are significant contributors to avian diversity prediction in multi-taxa tropical environments. Our results support the use of community-level acoustic indices as a proxy for species richness and point to the potential for tracking subtler habitat-dependent changes in community composition. Recommendations for the design of compound indices for multi-taxa community composition appraisal are put forward, with consideration for the requirements of next-generation, low-power remote monitoring networks.
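    For reference, the three simple acoustic descriptors named above can be computed in a few lines. The sketch below uses librosa as one convenient option (ecoacoustic studies often compute the diversity indices themselves with R packages such as seewave or soundecology), and the input file name is a placeholder.

```python
# Compute the three simple acoustic descriptors: RMS energy,
# spectral centroid, and zero-crossing rate.
import librosa

y, sr = librosa.load("dawn_chorus.wav", sr=None)  # placeholder file name

rms = librosa.feature.rms(y=y).mean()
centroid = librosa.feature.spectral_centroid(y=y, sr=sr).mean()
zcr = librosa.feature.zero_crossing_rate(y).mean()

print(f"RMS={rms:.4f}  centroid={centroid:.1f} Hz  ZCR={zcr:.4f}")
```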

    Sounding the call for a global library of underwater biological sounds

    Get PDF
    © The Author(s), 2022. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in Parsons, M., Lin, T.-H., Mooney, T., Erbe, C., Juanes, F., Lammers, M., Li, S., Linke, S., Looby, A., Nedelec, S., Van Opzeeland, I., Radford, C., Rice, A., Sayigh, L., Stanley, J., Urban, E., & Di Iorio, L. Sounding the call for a global library of underwater biological sounds. Frontiers in Ecology and Evolution, 10, (2022): 810156, https://doi.org/10.3389/fevo.2022.810156.

    Aquatic environments encompass the world's most extensive habitats, rich with sounds produced by a diversity of animals. Passive acoustic monitoring (PAM) is an increasingly accessible remote sensing technology that uses hydrophones to listen to the underwater world and represents an unprecedented, non-invasive method to monitor underwater environments. This information can assist in the delineation of biologically important areas via detection of sound-producing species or characterization of ecosystem type and condition, inferred from the acoustic properties of the local soundscape. At a time when worldwide biodiversity is in significant decline and underwater soundscapes are being altered as a result of anthropogenic impacts, there is a need to document, quantify, and understand biotic sound sources, potentially before they disappear. A significant step toward these goals is the development of a web-based, open-access platform that provides: (1) a reference library of known and unknown biological sound sources (by integrating and expanding existing libraries around the world); (2) a data repository portal for annotated and unannotated audio recordings of single sources and of soundscapes; (3) a training platform for artificial intelligence algorithms for signal detection and classification; and (4) a citizen science-based application for public users. Although these needs are often met individually on regional and taxa-specific scales, many such efforts are not sustained and, collectively, an enduring global database with an integrated platform has not been realized. We discuss the benefits such a program can provide, previous calls for global data-sharing and reference libraries, and the challenges that need to be overcome to bring together bio- and ecoacousticians, bioinformaticians, propagation experts, web engineers, and signal processing specialists (e.g., in artificial intelligence) with the necessary support and funding to build a sustainable and scalable platform that could address the needs of all contributors and stakeholders into the future.

    Support for the initial author group to meet, discuss, and build consensus on the issues within this manuscript was provided by the Scientific Committee on Oceanic Research, Monmouth University Urban Coast Institute, and the Rockefeller Program for the Human Environment. The U.S. National Science Foundation supported the publication of this article through Grant OCE-1840868 to the Scientific Committee on Oceanic Research.
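    As a purely speculative illustration of components (1) and (2), the sketch below defines a minimal record schema such a platform might expose. Every field name is an assumption made for this example; the paper proposes the platform but does not publish a data model.

```python
# Hypothetical record schema for a global underwater sound library;
# all field names are illustrative assumptions, not a published spec.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SoundLibraryRecord:
    recording_url: str                 # link to the archived audio file
    soundscape: bool                   # single source vs. whole soundscape
    species: Optional[str] = None      # None for unknown biological sources
    location: Optional[tuple] = None   # (latitude, longitude)
    annotations: list = field(default_factory=list)  # time-stamped labels
    license: str = "CC-BY"             # open access, per the paper's vision

record = SoundLibraryRecord(
    recording_url="https://example.org/recordings/001.wav",
    soundscape=False,
    species="Orcinus orca",
)
```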