18 research outputs found

    Automatic recognition of bird species by their sounds

    Bird sounds are divided by their function into songs and calls, which are further divided into the hierarchical levels of phrases, syllables and elements. It is shown that the syllable is a suitable unit for recognising bird species. The diversity of syllable types that birds are able to produce is large; the main focus of this thesis is on sounds that are defined as inharmonic. The automatic recognition system for bird species used in this thesis consists of syllable segmentation, feature generation, classifier design and classifier evaluation phases. Recognition experiments are based on a parametric representation of syllables using a total of 19 low-level acoustic signal parameters. Simulation experiments were carried out with six species that regularly produce inharmonic sounds. The results show that features related to the frequency band and content of the sound provide good discrimination ability within these sounds.
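
    A minimal sketch of the pipeline outlined above (syllable segmentation, feature generation, classifier design and evaluation) is given below. The three features and the k-nearest-neighbour classifier are illustrative assumptions, not the thesis's 19-parameter feature set or its classifier, and segmentation is assumed to have produced labelled syllable waveforms already.

```python
# Minimal sketch of a syllable-level species classifier (an assumed
# pipeline, not the thesis implementation): a few low-level spectral
# features per segmented syllable, then a standard classifier with
# cross-validated evaluation.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def syllable_features(x, sr):
    """Spectral centroid, bandwidth and zero-crossing rate of one syllable."""
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    p = spec / (spec.sum() + 1e-12)                  # normalised magnitude spectrum
    centroid = np.sum(freqs * p)                     # spectral "centre of mass"
    bandwidth = np.sqrt(np.sum(((freqs - centroid) ** 2) * p))
    zcr = np.mean(np.abs(np.diff(np.sign(x)))) / 2.0
    return np.array([centroid, bandwidth, zcr])

def evaluate(syllables, sr=44100):
    """syllables: list of (waveform, species_label) pairs produced by a
    prior segmentation step (assumed to exist upstream of this sketch)."""
    X = np.array([syllable_features(x, sr) for x, _ in syllables])
    y = np.array([label for _, label in syllables])
    clf = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(clf, X, y, cv=5).mean()
```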

    Detecting Bat Calls from Audio Recordings

    Bat monitoring is commonly based on audio analysis. By collecting audio recordings from large areas and analysing their content, it is possible to estimate the distributions of bat species and changes in them. It is easy to collect a large amount of audio recordings by leaving automatic recording units in nature and collecting them later. However, it takes a lot of time and effort to analyse these recordings, so there is a great need for automatic tools. We developed a program for detecting bat calls automatically from audio recordings. The program is designed for recordings collected in Finland with the AudioMoth recording device. Our method is based on a median clipping method that has previously shown promising results in the field of bird song detection. We add several modifications to the basic method in order to make it work well for our purpose. We use real-world field recordings that we have annotated to evaluate the performance of the detector and compare it to two other freely available programs (Kaleidoscope and Bat Detective). Our method showed good results and achieved the best F2-score in the comparison.
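
    The median-clipping idea can be sketched as follows, assuming a standard spectrogram front end; the threshold factor, window sizes and the further modifications mentioned above are illustrative assumptions, not the program's actual settings.

```python
# Sketch of median clipping on a spectrogram (parameters are assumptions;
# the detector described above adds further modifications and
# post-processing on top of this basic idea).
import numpy as np
from scipy.signal import spectrogram

def median_clip(audio, sr, factor=3.0):
    """Return spectrogram axes and a binary mask of cells that stand out
    against both their frequency band and their time frame."""
    f, t, S = spectrogram(audio, fs=sr, nperseg=512, noverlap=256)
    row_med = np.median(S, axis=1, keepdims=True)   # per-frequency-band median
    col_med = np.median(S, axis=0, keepdims=True)   # per-time-frame median
    mask = (S > factor * row_med) & (S > factor * col_med)
    return f, t, mask

# Contiguous runs of frames that contain masked cells can then be grouped
# into candidate call events and filtered or scored further.
```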

    Algorithmic Analysis of Complex Audio Scenes

    In this thesis, we examine the problem of algorithmic analysis of complex audio scenes with a special emphasis on natural audio scenes. One of the driving goals behind this work is to develop tools for monitoring the presence of animals in areas of interest based on their vocalisations. This task, which often occurs in the evaluation of nature conservation measures, leads to a number of subproblems in audio scene analysis. In order to develop and evaluate pattern recognition algorithms for animal sounds, a representative collection of such sounds is necessary. Building such a collection is beyond the scope of a single researcher, and we therefore use data from the Animal Sound Archive of the Humboldt University of Berlin. Although a large portion of well-annotated recordings from this archive has been available in digital form, little infrastructure for searching and sharing this data has been available. We describe a distributed infrastructure, developed in this context, for searching, sharing and annotating animal sound collections collaboratively. Although searching animal sound databases by metadata gives good results for many applications, annotating all occurrences of a specific sound is beyond the scope of human annotators. Moreover, finding vocalisations similar to a given example is not feasible using only metadata. We therefore propose an algorithm for content-based similarity search in animal sound databases. Based on principles of image processing, we develop suitable features for the description of animal sounds. We enhance a concept for content-based multimedia retrieval by a ranking scheme which makes it an efficient tool for similarity search. One of the main sources of complexity in natural audio scenes, and the most difficult problem for pattern recognition, is the large number of sound sources which are active at the same time. We therefore examine methods for source separation based on microphone arrays. In particular, we propose an algorithm for the extraction of simpler components from complex audio scenes based on a sound complexity measure. Finally, we introduce pattern recognition algorithms for the vocalisations of a number of bird species. Some of these species are interesting for reasons of nature conservation, while one of the species serves as a prototype for song birds with strongly structured songs.
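
    As a rough illustration of content-based similarity search over such a collection, the sketch below describes each recording by a coarse, image-like spectrogram descriptor and ranks database entries by distance to a query; the feature design and the plain Euclidean ranking are assumptions, not the retrieval concept or ranking scheme developed in the thesis.

```python
# Illustrative content-based similarity search over a sound collection:
# an image-like descriptor from the log-spectrogram, ranked by Euclidean
# distance to a query. Feature design and ranking here are assumptions.
import numpy as np
from scipy.signal import spectrogram

def image_features(audio, sr, grid=(8, 8)):
    """Average a log-spectrogram over a coarse grid of cells, giving a
    crude image-processing-style descriptor of the recording."""
    _, _, S = spectrogram(audio, fs=sr, nperseg=1024, noverlap=512)
    logS = np.log1p(S)
    rows = np.array_split(logS, grid[0], axis=0)
    cells = [np.array_split(r, grid[1], axis=1) for r in rows]
    return np.array([[c.mean() for c in row] for row in cells]).ravel()

def rank_by_similarity(query_feat, database_feats):
    """Return database indices ordered from most to least similar."""
    dists = np.linalg.norm(database_feats - query_feat, axis=1)
    return np.argsort(dists)
```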

    Multi-instance multi-label learning : algorithms and applications to bird bioacoustics

    We consider the problem of supervised classification of bird species from audio recordings in a real-world acoustic monitoring scenario (i.e. audio data is collected in the field with an omnidirectional microphone, without human supervision). Obtaining better data about bird activity can assist conservation efforts and improve our understanding of birds' interactions with the environment and other organisms. However, traditional observation methods are labor-intensive. Most prior work on machine learning for bird song is not applicable to real-world acoustic monitoring, because it assumes recordings contain only a single species of bird, whereas field recordings typically contain multiple simultaneously vocalizing birds. We propose to use the multi-instance multi-label (MIML) framework in machine learning for the species classification problem, where the dataset is viewed as a collection of bags of instances paired with sets of labels. Furthermore, we formalize MIML instance annotation, where the goal is to predict instance labels while learning only from bag label sets. We develop the first MIML representation for audio, and several new algorithms for MIML instance annotation based on support vector machines or classifier chains. The proposed methods classify either the set of species present in a recording or individual calls, while learning only from recordings paired with a set of species. This form of training data requires less human effort to obtain than individually labeled calls. These methods are successfully applied to audio collected in the field which included multiple simultaneously vocalizing species. The proposed algorithms for MIML classification are general and are also applied to object recognition in images.
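
    The MIML data layout, in which each bag (recording) is a set of instance feature vectors paired with a set of species labels, can be sketched as below. The per-label baseline that collapses each bag to its mean feature vector is a deliberately naive stand-in, not the SVM- or classifier-chain-based MIML algorithms proposed in the thesis.

```python
# Sketch of the MIML data layout plus a naive bag-level baseline:
# each bag (recording) is an array of instance feature vectors paired
# with a set of species labels; one binary classifier per label is fit
# on a simple bag summary (the mean instance feature vector).
import numpy as np
from sklearn.linear_model import LogisticRegression

def bag_summary(bag):
    """Collapse a (n_instances, n_features) bag into one vector."""
    return bag.mean(axis=0)

def train_miml_baseline(bags, label_sets, all_labels):
    X = np.array([bag_summary(b) for b in bags])
    models = {}
    for label in all_labels:
        y = np.array([1 if label in s else 0 for s in label_sets])
        models[label] = LogisticRegression(max_iter=1000).fit(X, y)
    return models

def predict_label_set(models, bag, threshold=0.5):
    """Predict the set of species present in one bag."""
    x = bag_summary(bag).reshape(1, -1)
    return {label for label, m in models.items()
            if m.predict_proba(x)[0, 1] >= threshold}
```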

    Exploring Hybridity: An investigation into the integration of instrumental and acousmatic structural strategies

    The aim of this commentary, which accompanies a folio of electroacoustic/acousmatic, instrumental and mixed compositions, is to investigate the relationship of instrumental and acousmatic compositional practices and to find common integrating structural strategies. These practices are also related to the handling and organization of disparate and large amounts of sound information in both media. Multi-dimensional aural spaces are very common in both instrumental and acousmatic media when timbre becomes a dynamic and form-shaping parameter. The listener may perceive the musical discourse in multi-dimensional musical spaces through multiple perceptual modes. A musical syntax of those (usually indeterminate and ambiguous) aural spaces may be achieved through a hybridization of interconnected temporal concepts, connected to motion, gesture and shape, and spatial concepts, connected to sound source and timbre. The narrative structure of the musical discourse is linked to conceptualizations of physical and conceptual musical spaces through cognitive schemas and patterns, and can be approached through visual and spatial metaphors that resemble film and TV montage structures. A sound montage theory provides a basic framework for the organization of narrative structure in sound composition.

    Public Engagement Technology for Bioacoustic Citizen Science

    Inexpensive mobile devices offer new capabilities for non-specialist use in the field for the purpose of conservation. This thesis explores the potential for such devices to be used by citizen scientists interacting with bioacoustic data such as birdsong. This thesis describes design research and field evaluation, in collaboration with conservationists and educators, and technological artefacts implemented as mobile applications for interactive educational gaming and creative composition. This thesis considers, from a participant-centric collaborative design approach, conservationists' demand for interactive artefacts to motivate engagement in citizen science through gameful and playful interactions. Drawing on theories of motivation, frequently applied to the study of Human-Computer Interaction (HCI), and on approaches to designing for motivational engagement, this thesis introduces a novel pair of frameworks for the analysis of technological artefacts and for assessing participant engagement with bioacoustic citizen science from both game interaction design and citizen science project participation perspectives. This thesis reviews current theories of playful and gameful interaction developed for collaborative learning, data analysis, and ground-truth development, describes a process for design and analysis of motivational mobile games and toys, and explores the affordances of various game elements and mechanics for engaging participation in bioacoustic citizen science. This thesis proposes research into progressions for scaffolding engagement with citizen science projects where participants interact with data collection and analysis artefacts. The research process includes the development of multiple designs, analyses of which explore the efficacy of game interactions to motivate engagement through interaction progressions, given proposed analysis frameworks. This thesis presents analysed results of experiments examining the usability of, and data-quality from, several prototypes and software artefacts, in both laboratory conditions and the field. This thesis culminates with an assessment of the efficacy of proposed design analysis frameworks, an analysis of designed artefacts, and a discussion of how these designs increase intrinsic and extrinsic motivation for participant engagement and affect resultant bioacoustic citizen science data quantity and quality.

    BIOLOGICALLY-INFORMED COMPUTATIONAL MODELS OF HARMONIC SOUND DETECTION AND IDENTIFICATION

    Harmonic sounds or harmonic components of sounds are often fused into a single percept by the auditory system. Although the exact neural mechanisms for harmonic sensitivity remain unclear, it presumably arises in the auditory cortex because subcortical neurons typically prefer only a single frequency. Pitch-sensitive units and harmonic template units found in awake marmoset auditory cortex are sensitive to temporal and spectral periodicity, respectively. This thesis is a study of possible computational mechanisms underlying cortical harmonic selectivity. To examine whether harmonic selectivity is related to statistical regularities of natural sounds, simulated auditory nerve responses to natural sounds were used in principal component analysis, in comparison with independent component analysis, which yielded harmonic-sensitive model units with a population distribution similar to that of real cortical neurons in terms of harmonic selectivity metrics. This result suggests that the variability of cortical harmonic selectivity may provide an efficient population representation of natural sounds. Several network models of spectral selectivity mechanisms are investigated. As a side study, adding synaptic depletion to an integrate-and-fire model could explain the observed modulation-sensitive units, which are related to pitch-sensitive units but cannot account for precise temporal regularity. When a feed-forward network is trained to detect harmonics, the result is always a sieve, which is excited by integer multiples of the fundamental frequency and inhibited by half-integer multiples. The sieve persists over a wide variety of conditions, including changing evaluation criteria, incorporating Dale’s principle, and adding a hidden layer. A recurrent network trained by Hebbian learning produces harmonic-selective units by a novel dynamical mechanism that can be explained by a Lyapunov function which favors inputs that match the learned frequency correlations. These model neurons have sieve-like weights like the harmonic template units when probed by random harmonic stimuli, despite there being no sieve pattern anywhere in the network’s weights. Online stimulus design has the potential to facilitate future experiments on nonlinear sensory neurons. We accelerated the sound-from-texture algorithm to enable online adaptive experimental design to maximize the activities of sparsely responding cortical units. We calculated the optimal stimuli for harmonic-selective units and investigated a model-based information-theoretic method for stimulus optimization.
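
    The sieve structure described above, with excitation at integer multiples of the fundamental and inhibition at half-integer multiples, can be written directly as a template score; the Gaussian weighting and its parameters in the sketch below are illustrative assumptions rather than the trained network weights.

```python
# Simplified harmonic-sieve template score (an illustration of the learned
# structure described above, not the trained network): excitatory weight
# at integer multiples of f0, inhibitory weight at half-integer multiples,
# applied to a magnitude spectrum.
import numpy as np

def sieve_score(spectrum, freqs, f0, n_harmonics=10, width=10.0):
    """Score how well a magnitude spectrum matches a harmonic series at f0."""
    score = 0.0
    for k in range(1, n_harmonics + 1):
        on = np.exp(-0.5 * ((freqs - k * f0) / width) ** 2)            # integer multiple
        off = np.exp(-0.5 * ((freqs - (k + 0.5) * f0) / width) ** 2)   # half-integer multiple
        score += np.sum(spectrum * (on - off))
    return score

# Scanning sieve_score over a bank of candidate f0 values and taking the
# argmax behaves like a crude harmonic template detector.
```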

    Temporal integration of loudness as a function of level


    Application of the PE method to up-slope sound propagation
