
    ORCA-SPOT: An Automatic Killer Whale Sound Detection Toolkit Using Deep Learning

    Large bioacoustic archives of wild animals are an important source for identifying recurring communication patterns, which can then be related to recurring behavioral patterns to advance the current understanding of intra-specific communication in non-human animals. A main challenge is that most large-scale bioacoustic archives contain only a small percentage of animal vocalizations and a large amount of environmental noise, which makes it extremely difficult to manually retrieve sufficient vocalizations for further analysis; this is particularly limiting for species with advanced social systems and complex vocalizations. In this study, deep neural networks were trained on 11,509 killer whale (Orcinus orca) signals and 34,848 noise segments. The resulting toolkit, ORCA-SPOT, was tested on a large-scale bioacoustic repository, the Orchive, comprising roughly 19,000 hours of killer whale underwater recordings. An automated segmentation of the entire Orchive recordings (about 2.2 years of audio) took approximately 8 days. It achieved a time-based precision, or positive predictive value (PPV), of 93.2% and an area under the curve (AUC) of 0.9523. This approach enables automated annotation of large bioacoustic databases to extract killer whale sounds, which are essential for subsequent identification of significant communication patterns. The code will be publicly available in October 2019 to support the application of deep learning to bioacoustic research. ORCA-SPOT can be adapted to other animal species.
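
    To make the detection setup concrete, the sketch below shows a minimal binary spectrogram classifier (orca call vs. noise) in PyTorch. The architecture, layer sizes, and input shape are illustrative assumptions; the abstract does not specify ORCA-SPOT's internals.
```python
# Minimal sketch of a binary spectrogram classifier (orca call vs. noise).
# Architecture and shapes are illustrative assumptions, not ORCA-SPOT's.
import torch
import torch.nn as nn

class OrcaDetector(nn.Module):
    """Classifies a fixed-size spectrogram excerpt as noise (0) or orca (1)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(32 * 4 * 4, 2)  # logits: [noise, orca]

    def forward(self, x):              # x: (batch, 1, freq_bins, time_frames)
        return self.classifier(self.features(x).flatten(1))

model = OrcaDetector()
logits = model(torch.randn(8, 1, 128, 256))  # 8 random spectrogram windows
print(logits.argmax(dim=1))                  # predicted class per window
```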

    A toolbox for animal call recognition

    Monitoring the natural environment is increasingly important as habitat degradation and climate change reduce the world's biodiversity. We have developed software tools and applications to assist ecologists with the collection and analysis of acoustic data at large spatial and temporal scales. One of our key objectives is automated animal call recognition, and our approach has three novel attributes. First, we work with raw environmental audio, contaminated by noise and artefacts and containing calls that vary greatly in volume depending on the animal's proximity to the microphone. Second, initial experimentation suggested that no single recognizer could deal with the enormous variety of calls. Therefore, we developed a toolbox of generic recognizers to extract invariant features for each call type. Third, many species are cryptic and offer little data with which to train a recognizer. Many popular machine learning methods require large volumes of training and validation data and considerable time and expertise to prepare. Consequently, we adopt bootstrap techniques that can be initiated with little data and refined subsequently. In this paper, we describe our recognition tools and present results for real ecological problems.
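
    As a hedged sketch of the bootstrap idea only: seed a recognizer with a few labelled calls, then repeatedly retrain it after adding its most confident detections from unlabelled audio (which in practice would be expert-verified before inclusion). The random-forest classifier and all names below are illustrative assumptions, not the toolbox's implementation.
```python
# Bootstrap a recognizer from little labelled data by iterative self-training.
# Classifier choice and names are assumptions for illustration only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def bootstrap_recognizer(seed_X, seed_y, pool_X, rounds=3, top_k=20):
    X, y, pool = np.asarray(seed_X), np.asarray(seed_y), np.asarray(pool_X)
    for _ in range(rounds):
        clf = RandomForestClassifier(n_estimators=100).fit(X, y)
        conf = clf.predict_proba(pool).max(axis=1)   # confidence per clip
        picks = conf.argsort()[-top_k:]              # most confident clips
        labels = clf.predict(pool[picks])            # expert would verify these
        X, y = np.vstack([X, pool[picks]]), np.concatenate([y, labels])
        pool = np.delete(pool, picks, axis=0)        # remove from the pool
    return RandomForestClassifier(n_estimators=100).fit(X, y)

rng = np.random.default_rng(0)                       # toy feature vectors
clf = bootstrap_recognizer(rng.normal(size=(10, 8)), rng.integers(0, 2, 10),
                           rng.normal(size=(200, 8)))
```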

    Automated call detection for acoustic surveys with structured calls of varying length

    Funding: Y.W. is partly funded by the China Scholarship Council (CSC) for Ph.D. study at the University of St Andrews, UK.
    1. When recorders are used to survey acoustically conspicuous species, identification of the target species' calls in recordings is essential for estimating density and abundance. We investigate how well deep neural networks identify vocalisations consisting of phrases of varying lengths, each containing a variable number of syllables. We use recordings of Hainan gibbon (Nomascus hainanus) vocalisations to develop and test the methods.
    2. We propose two methods for exploiting the two-level structure of such data. The first combines convolutional neural network (CNN) models with a hidden Markov model (HMM); the second uses a convolutional recurrent neural network (CRNN). Both models learn acoustic features of syllables via a CNN and model the temporal grouping of syllables into phrases either via an HMM or a recurrent network. We compare their performance to the commonly used CNNs LeNet and VGGNet and to a support vector machine (SVM). We also propose a dynamic programming method to evaluate how well phrases are predicted, which is useful for evaluating performance when vocalisations are labelled by phrase rather than by syllable.
    3. Our methods perform substantially better than the commonly used methods when applied to the gibbon acoustic recordings. The CRNN has an F-score of 90% on phrase prediction, which is 18% higher than the best of the SVM, LeNet, and VGGNet methods. HMM post-processing raised the F-score of these last three methods to as much as 87%. The number of phrases is overestimated by the CNNs and SVM, leading to error rates between 49% and 54%; with HMM post-processing, these error rates can be reduced to as little as 0.4%. Similarly, the error rate of the CRNN's predictions is no more than 0.5%.
    4. CRNNs are better at identifying phrases of varying lengths composed of a varying number of syllables than simpler CNN or SVM models. We find a CRNN model to be best at this task, with a CNN combined with an HMM performing almost as well. We recommend these kinds of models for species whose vocalisations are structured into phrases of varying lengths.
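
    A minimal PyTorch sketch of the CRNN idea described above: a small CNN extracts per-frame acoustic features from a spectrogram, and a bidirectional recurrent layer models how syllables group into phrases over time. All layer sizes and the two-class framing are assumptions for illustration, not the paper's exact configuration.
```python
# Illustrative CRNN: CNN features per time frame, GRU for temporal structure.
# Layer sizes and the two-class framing are assumptions, not the paper's.
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, n_mels=64, n_classes=2):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
        )  # pools frequency only, preserving time resolution
        self.rnn = nn.GRU(32 * (n_mels // 4), 64,
                          batch_first=True, bidirectional=True)
        self.head = nn.Linear(128, n_classes)  # per-frame syllable scores

    def forward(self, spec):                   # spec: (B, 1, n_mels, T)
        h = self.cnn(spec)                     # (B, 32, n_mels // 4, T)
        h = h.permute(0, 3, 1, 2).flatten(2)   # (B, T, 32 * n_mels // 4)
        out, _ = self.rnn(h)                   # temporal context across frames
        return self.head(out)                  # (B, T, n_classes)

scores = CRNN()(torch.randn(4, 1, 64, 200))   # 4 clips, 200 time frames each
print(scores.shape)                           # torch.Size([4, 200, 2])
```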

    Data autonomy in the age of AI: designing autonomy-supportive data tools for children & families

    The age of AI is a rapidly evolving and complex space for children. As children increasingly interact with AI-based apps, services and platforms, their data is increasingly tracked, harvested, aggregated, analysed and exploited in multiple ways, including behavioural engineering and monetisation. Central to such datafication is online service providers' ability to analyse user data to infer personal attributes, subtly manipulating interests and beliefs through micro-targeting and opinion shaping. This can alter the way children perceive and interact with the world, undermining their autonomy. Yet this datafication often unfolds behind the scenes in apps and services, remaining less noticed and discussed than more straightforward data privacy issues like direct data collection or disclosure. At the same time, children are often seen as less capable of navigating the intricacies of online life, with parents and guardians presumed to possess greater expertise to steer their children through the digital world. However, the rapid evolution of AI technology and online trends has outpaced parents' ability to keep up. By the time parents adapt to platforms like Snapchat or YouTube, children may already have moved on to the next trend, a shift accelerated by rapid datafication that heightens the challenge of effectively guiding children online. Consequently, there is a mounting call for a child-centred approach, which shifts from merely protecting or limiting children with parents in charge to actively guiding and empowering children to take a leading role. Within this shift towards a child-centred approach, there is growing consensus on fostering children's autonomy in the digital space, encompassing the development of their understanding, values, self-determination, and self-identity. Given that data is the cornerstone of AI-based platforms' vast influence, this thesis focuses on the key concept of data autonomy for children. This exploration follows a structured four-step methodology: 1) a landscape analysis to comprehend the present scope of AI-based platforms for children and the prevalent challenges they encounter; 2) a conceptual review to elucidate the meaning of autonomy for children in the digital realm; 3) an empirical investigation focusing on children's perceptions, needs, and obstacles concerning data autonomy; and 4) a technical evaluation to assess the impact of technical interventions on children's sense of data autonomy. Synthesising the research presented in this thesis, we propose the pivotal concept of data autonomy for children in the age of AI, aiming to address their online wellbeing from a unique data perspective. This work not only lays the foundation for future research on data autonomy as a novel research agenda, but also prompts a rethinking of existing data governance structures towards a more ethical data landscape.

    A Systematic Review of Artificial Intelligence in Assistive Technology for People with Visual Impairment

    Recent advances in artificial intelligence (AI) have led to numerous successful applications that use data to significantly enhance the quality of life for people with visual impairment, and AI technology has the potential to improve their lives further. However, accurately measuring the development of visual aids continues to be challenging. As an AI model is trained on larger and more diverse datasets, its performance becomes more robust and applicable to a wider variety of scenarios. In the field of visual impairment, deep learning techniques have emerged as a solution to previous challenges associated with AI models. In this article, we provide a comprehensive and up-to-date review of recent research on the development of AI-powered visual aids tailored to the requirements of individuals with visual impairment. We adopt the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) methodology, gathering and appraising relevant literature from diverse databases. A rigorous selection process assessed articles against precise inclusion and exclusion criteria: the search yielded 322 articles, of which 12 studies were deemed suitable for inclusion in the final analysis. The study's primary objective is to investigate the application of AI techniques to the creation of intelligent devices that aid visually impaired individuals in their daily lives. We identify a number of potential obstacles that researchers and developers in the field of visual impairment applications might encounter. In addition, opportunities for future research and advancements in AI-driven visual aids are discussed. This review seeks to provide valuable insights into the advancements, possibilities, and challenges in the development and implementation of AI technology for people with visual impairment. By examining the current state of the field and identifying areas for future research, we hope to contribute to the ongoing progress of improving the lives of visually impaired individuals through the use of AI-powered visual aids.

    ORCA-SPY enables killer whale sound source simulation, detection, classification and localization using an integrated deep learning-based segmentation

    Acoustic identification of vocalizing individuals opens up new and deeper insights into animal communication, such as individual- or group-specific dialects, turn-taking events, and dialogs. However, establishing an association between an individual animal and its emitted signal is usually non-trivial, especially for animals underwater. Consequently, collecting species-, array-, and position-specific ground-truth localization data for marine animals is extremely challenging, which strongly limits the possibilities to evaluate localization methods beforehand, or at all. This study presents ORCA-SPY, a fully automated sound source simulation, classification and localization framework for passive killer whale (Orcinus orca) acoustic monitoring that is embedded into PAMGuard, a widely used bioacoustic software toolkit. ORCA-SPY enables array- and position-specific multichannel audio stream generation to simulate real-world ground-truth killer whale localization data, and provides a hybrid sound source identification approach integrating ANIMAL-SPOT, a state-of-the-art deep learning-based orca detection network, followed by downstream Time-Difference-Of-Arrival localization. ORCA-SPY was evaluated on simulated multichannel underwater audio streams including various killer whale vocalization events within a large-scale experimental setup benefiting from previous real-world fieldwork experience. Across all 58,320 embedded vocalizing killer whale events, subject to various hydrophone array geometries, call types, distances, and noise conditions yielding signal-to-noise ratios from −14.2 dB to 3 dB, a detection rate of 94.0% was achieved with an average localization error of 7.01°. ORCA-SPY was field-tested on Lake Stechlin in Brandenburg, Germany under laboratory conditions with a focus on localization. During the field test, 3,889 localization events were observed with an average error of 29.19° and a median error of 17.54°. ORCA-SPY was also deployed successfully during the DeepAL fieldwork 2022 expedition (DLFW22) in Northern British Columbia, with an average error of 20.01° and a median error of 11.01° across 503 localization events. ORCA-SPY is an open-source and publicly available software framework, which can be adapted to various recording conditions as well as other animal species.
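
    The downstream Time-Difference-Of-Arrival step can be illustrated with a short NumPy sketch: estimate the inter-channel delay of a detected call via generalized cross-correlation with phase transform (GCC-PHAT) and convert it to a bearing for one hydrophone pair. The array spacing, sound speed, and toy signal are assumed values for illustration, not those used in ORCA-SPY.
```python
# TDOA sketch: GCC-PHAT delay estimate between two channels, then bearing.
# Spacing, sound speed, and the synthetic call are assumptions.
import numpy as np

def gcc_phat_delay(sig_a, sig_b, fs):
    """Delay (seconds) by which sig_b lags sig_a, via GCC-PHAT."""
    n = len(sig_a) + len(sig_b)
    SA, SB = np.fft.rfft(sig_a, n), np.fft.rfft(sig_b, n)
    R = SB * np.conj(SA)                              # cross-spectrum
    cc = np.fft.irfft(R / (np.abs(R) + 1e-12), n)     # PHAT weighting
    cc = np.concatenate((cc[-n // 2:], cc[:n // 2]))  # centre zero lag
    return (np.argmax(cc) - n // 2) / fs

def bearing_deg(delay_s, spacing_m=10.0, c=1480.0):
    """Far-field angle of arrival for two hydrophones spacing_m apart."""
    return np.degrees(np.arcsin(np.clip(c * delay_s / spacing_m, -1.0, 1.0)))

fs = 48_000
t = np.arange(fs) / fs
call = np.sin(2 * np.pi * 4000 * t) * np.exp(-5 * t)  # toy decaying tone
lagged = np.roll(call, 96)                            # channel B lags by 2 ms
print(bearing_deg(gcc_phat_delay(call, lagged, fs)))  # ~17.2 degrees
```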