125 research outputs found

    Dissimilarity-based multiple instance classification and dictionary learning for bioacoustic signal recognition

    In this thesis, two promising and actively researched fields of pattern recognition (PR) and digital signal processing (DSP) are studied, adapted and applied to the automated recognition of bioacoustic signals: (i) learning from weakly labeled data, and (ii) dictionary-based decomposition. The document begins with an overview of current methods and techniques for the automated recognition of bioacoustic signals, and an analysis of the impact of this technology at global and local scales. This is followed by a detailed description of my research on two approaches from the above-mentioned fields, multiple instance learning (MIL) and dictionary learning (DL), as solutions to particular challenges in bioacoustic data analysis. The most relevant contributions and findings of this thesis are: 1) an unsupervised segmentation method for birdsong audio recordings that improves species classification and is easier to implement, since no manual handling of recordings is required; 2) the confirmation that, in the analyzed audio datasets, appropriate dissimilarity measures are those that capture most of the overall differences between bags, such as the modified Hausdorff distance and the mean minimum distance; 3) the adoption of dissimilarity adaptation techniques to enhance dissimilarity-based multiple instance classification, along with the potential for further gains in classification performance from building dissimilarity spaces and increasing training set sizes; 4) a framework for solving MIL problems with the one nearest neighbor (1-NN) classifier; 5) a novel convolutive DL method for learning a representative dictionary from a collection of multiple-bird audio recordings; 6) the successful application of this DL method to spectrogram denoising and species classification; and 7) an efficient online version of the DL method that outperforms other state-of-the-art batch and online methods in both computational cost and quality of the discovered patterns.
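    Contribution 2 names two bag-level dissimilarities. As an illustrative sketch (function names and the Euclidean instance metric are assumptions, not taken from the thesis), both can be computed from the matrix of pairwise instance distances between two bags:

```python
import numpy as np

def pairwise_dists(A, B):
    """Euclidean distance between every instance in bag A and every
    instance in bag B; bags are (n_instances, n_features) arrays."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)

def modified_hausdorff(A, B):
    """Modified Hausdorff distance: the larger of the two mean directed
    instance-to-bag distances."""
    D = pairwise_dists(A, B)
    return max(D.min(axis=1).mean(), D.min(axis=0).mean())

def mean_min_distance(A, B):
    """Mean of the minimum instance-to-bag distances, both directions."""
    D = pairwise_dists(A, B)
    return np.concatenate([D.min(axis=1), D.min(axis=0)]).mean()
```

    Both measures aggregate over all instances rather than relying on a single closest pair, which is why they capture the overall differences between bags.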

    Audio Event Detection using Weakly Labeled Data

    Acoustic event detection is essential for content analysis and description of multimedia recordings. The majority of the current literature on the topic learns detectors through fully-supervised techniques employing strongly labeled data. However, the labels available for the majority of multimedia data are generally weak and do not provide sufficient detail for such methods to be employed. In this paper we propose a framework for learning acoustic event detectors using only weakly labeled data. We first show that audio event detection using weak labels can be formulated as a multiple instance learning (MIL) problem. We then suggest two frameworks for solving multiple instance learning, one based on support vector machines and the other on neural networks. The proposed methods can remove the time-consuming and expensive process of manually annotating data to facilitate fully supervised learning. Moreover, they not only detect events in a recording but also provide the temporal locations of those events, which helps in obtaining a complete description of the recording and is notable since temporal information is never known in the first place for weakly labeled data. Comment: ACM Multimedia 201
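    The MIL formulation described here is commonly realised by scoring each segment (instance) of a recording with a shared model and pooling the scores, since a weakly labeled recording is positive if any of its segments contains the event. A minimal sketch of that idea, with an assumed linear instance scorer standing in for the paper's actual SVM or network architectures:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bag_score(instances, w, b):
    """Score each instance (audio segment) with a shared linear model,
    then max-pool: the recording (bag) is positive if ANY segment is,
    so the bag score is the maximum instance score."""
    instance_probs = sigmoid(instances @ w + b)
    return instance_probs.max(), instance_probs
```

    The per-instance probabilities are what yield temporal localisation: segments whose scores exceed a threshold mark where in the recording the event occurs, even though only the bag label was available for training.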

    Automatic detection and classification of bird sounds in low-resource wildlife audio datasets

    There are many potential applications of automatic species detection and classification of birds from their sounds (e.g. ecological research, biodiversity monitoring, archival). However, acquiring adequately labelled large-scale and longitudinal data remains a major challenge, especially for species-rich remote areas and for taxa that require expert input for identification. So far, monitoring of avian populations has been performed via manual surveying, sometimes even with the help of volunteers due to the challenging scale of the data. In recent decades, an increasing number of ecological audio datasets have been tagged to indicate the presence or absence of specific bird species. However, automated detection and identification of species vocalisations is a challenging task. Animal vocalisations are highly diverse, both in the types of basic syllables and in the way they are combined. Noise is present in most habitats, and many bird communities contain multiple species with potentially overlapping vocalisations. In recent years, machine learning has experienced strong growth, due to increased dataset sizes and computational power, and to advances in deep learning methods that can learn to make predictions in extremely nonlinear problem settings. However, in training a deep learning system to perform automatic detection and audio tagging of wildlife bird sound scenes, two problems often arise. Firstly, even with the increased number of audio datasets, most publicly available datasets are weakly labelled, having only a list of events present in each recording without any temporal information for training. Secondly, in practice it is difficult to collect enough samples for most classes of interest. These problems are particularly pressing for wildlife audio but also occur in many other scenarios.
    In this thesis, we investigate and propose methods, based on image processing and deep learning, to perform audio event detection and classification on wildlife bird sound scenes and other low-resource audio datasets. We extend deep learning methods for weakly labelled data in a multi-instance learning and multi-task learning setting. We evaluate these methods for simultaneously detecting and classifying large numbers of sound types in audio recorded in the wild and in other low-resource audio datasets
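    For weakly labelled tagging of the kind described above, a common multi-instance device is to pool frame-level class probabilities over time, so that clip-level predictions can be trained against recording-level tags while the frame-level outputs provide temporal detection. A minimal sketch (max pooling is one of several plausible pooling choices, not necessarily the one used in this thesis):

```python
import numpy as np

def clip_tags(frame_probs, threshold=0.5):
    """frame_probs: (n_frames, n_classes) frame-level class probabilities
    from some detector.  Max-pooling over time yields clip-level
    probabilities that can be trained against the weak (recording-level)
    tags; the frame-level probabilities themselves serve as the
    event-detection output."""
    clip_probs = frame_probs.max(axis=0)      # pool over the time axis
    return clip_probs, clip_probs >= threshold
```

    Because pooling is applied per class, the same network handles many sound types at once, which matches the multi-task setting described above.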

    Automatic bird species identification employing an unsupervised discovery of vocalisation units

    An automatic analysis of bird vocalisations, for identifying bird species and studying their behaviour and means of communication, is important for a better understanding of the environment in which we live and in the context of environmental protection. The high variability of vocalisations between individuals makes species identification challenging for bird surveyors. Hence, a reliable automatic system for identifying birds from their vocalisations would be of great interest to professionals and amateurs alike. Part of this thesis provides a biological survey of the scientific theories on bird vocalisation and the corresponding singing behaviours. Another part aims to discover the set of element patterns produced by each bird species in a large corpus of natural field recordings. The thesis also develops an automatic system for identifying bird species from recordings. Two HMM-based recognition systems are presented. In evaluations, the proposed element-based HMM system obtained a recognition accuracy of over 93% using 3 seconds of detected signal, and a reduction in recognition error rate of over 39% compared to a baseline HMM system of the same complexity
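    HMM-based recognition of the kind described above typically scores a sequence of acoustic units against one model per species and picks the best-scoring species. A minimal sketch with discrete-emission HMMs and the scaled forward algorithm (model structure, parameters and names are illustrative, not taken from the thesis):

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the scaled forward algorithm to avoid underflow.
    pi: initial state probabilities (n,)
    A:  state transition matrix (n, n)
    B:  emission probabilities (n, n_symbols)"""
    alpha = pi * B[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        loglik += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return loglik

def identify(obs, models):
    """Pick the species whose HMM gives the sequence the highest likelihood."""
    return max(models, key=lambda name: forward_loglik(obs, *models[name]))
```

    In an element-based system, the observation symbols would be the discovered vocalisation units rather than raw spectral frames.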

    Joint Detection and Classification Convolutional Neural Network on Weakly Labelled Bird Audio Detection

    Bird audio detection (BAD) aims to detect whether or not there is a bird call in an audio recording. One difficulty of this task is that bird sound datasets are weakly labelled: only the presence or absence of a bird in a recording is known, not when the birds call. We propose to apply a joint detection and classification (JDC) model to the weakly labelled data (WLD) to detect and classify an audio clip at the same time. First, we apply a VGG-like convolutional neural network (CNN) on the mel spectrogram as a baseline. Then we propose a JDC-CNN model with VGG as a classifier and a CNN as a detector. Contrary to previous work, we find that denoising methods, including optimally-modified log-spectral amplitude (OM-LSA) estimation, median filtering and spectral filtering of the spectrogram, worsen the classification accuracy. JDC-CNN can predict the time stamps of events from weakly labelled data, and is therefore able to perform sound event detection from WLD. We obtained an area under the curve (AUC) of 95.70% on the development data and 81.36% on the unseen evaluation data, nearly comparable to the baseline CNN model
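    The reported figures use the area under the ROC curve, which for clip-level bird detection can be computed directly from its rank interpretation. A small self-contained sketch:

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via its rank interpretation: the
    probability that a randomly chosen positive clip outscores a
    randomly chosen negative clip, with ties counted as half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    # Compare every positive score against every negative score.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

    This threshold-free measure is why AUC is the standard metric for BAD: it evaluates the ranking of clips rather than any fixed decision boundary.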