1,280 research outputs found

    A biologically inspired recurrent neural network for sound source recognition incorporating auditory attention

    In this paper, a human-mimicking model for sound source recognition is presented. It consists of an artificial neural network with three neuron layers (input, middle and output), connected by feedforward connections from the input to the middle layer and from the middle to the output layer, complemented by feedback connections from the output layer back to the middle layer. Learning in the model follows the Hebb principle, dictating that "cells that fire together, wire together", with some important alterations compared to standard Hebbian learning, in order to prevent the model from forgetting previously learned patterns when learning new ones. In addition, short-term memory is introduced into the model to facilitate and guide the learning of neuronal synapses (long-term memory). As auditory attention is an essential part of human auditory scene analysis (ASA), it is also indispensable in any computational model mimicking it, and it is shown that different auditory attention mechanisms emerge naturally from the neuronal behaviour as implemented in the model described in this paper. The learning behaviour of the model is further investigated in the context of an urban sonic environment, and the importance of short-term memory in this process is demonstrated. Finally, the effectiveness of the model is evaluated by comparing the model output on presented sound recordings to a human expert listener's evaluation of the same fragments.
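
    As a concrete illustration, the following is a minimal sketch of such a three-layer recurrent network with a stabilised Hebbian update. All names, layer sizes and the Oja-style decay term (standing in here for the paper's unspecified modification against forgetting) are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

class ThreeLayerHebbNet:
    """Toy three-layer net: feedforward input -> middle -> output,
    feedback output -> middle, trained with a stabilised Hebbian rule.
    Illustrative sketch only; not the paper's implementation."""

    def __init__(self, n_in=64, n_mid=32, n_out=10, seed=0):
        rng = np.random.default_rng(seed)
        self.W_im = rng.normal(0.0, 0.1, (n_mid, n_in))   # input -> middle
        self.W_mo = rng.normal(0.0, 0.1, (n_out, n_mid))  # middle -> output
        self.W_om = rng.normal(0.0, 0.1, (n_mid, n_out))  # output -> middle (feedback)
        self.out = np.zeros(n_out)                        # previous output activation

    def step(self, x, eta=0.01):
        # Feedback from the previous output biases the middle layer,
        # acting as a crude short-term memory of recent activity.
        mid = np.tanh(self.W_im @ x + self.W_om @ self.out)
        self.out = np.tanh(self.W_mo @ mid)
        # Hebbian term (post x pre) with an Oja-style decay that bounds
        # weight growth instead of erasing previously stored patterns.
        self.W_mo += eta * (np.outer(self.out, mid)
                            - (self.out ** 2)[:, None] * self.W_mo)
        return self.out

# Example: feed one random "spectrum" frame through the network.
net = ThreeLayerHebbNet()
activation = net.step(np.random.default_rng(1).random(64))
```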

    Long-term learning behavior in a recurrent neural network for sound recognition

    In this paper, the long-term learning properties of an artificial neural network model, designed for sound recognition and computational auditory scene analysis in general, are investigated. The model is designed to run for long periods of time (weeks to months) on low-cost hardware, used in a noise monitoring network, and builds upon previous work by the same authors. It consists of three neural layers, connected to each other by feedforward and feedback excitatory connections. It is shown that the different mechanisms that drive auditory attention emerge naturally from the way in which neural activation and intra-layer inhibitory connections are implemented in the model. Training of the artificial neural network follows the Hebb principle, dictating that "cells that fire together, wire together", with some important modifications compared to standard Hebbian learning. As the model is designed to be online for extended periods of time, the learning mechanisms also need to be adapted accordingly. Learning needs to be strongly attention- and saliency-driven, so as not to waste available memory space on sounds that are of no interest to the human listener. The model also implements plasticity, in order to deal with new or changing input over time without catastrophically forgetting what it has already learned. On top of that, it is shown that the implementation of short-term memory also plays an important role in the long-term learning properties of the model. The above properties are investigated and demonstrated by training on real urban sound recordings.
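
    The attention- and saliency-driven gating of learning described above can be sketched as follows. The saliency measure, the threshold and the exponential background model are all illustrative assumptions; the paper does not specify these details.

```python
import numpy as np

def update_background(background, frame, alpha=0.01):
    """Exponential moving average of the spectrum: a cheap background
    model suitable for weeks of continuous low-cost operation."""
    return (1.0 - alpha) * background + alpha * frame

def saliency(frame, background, eps=1e-9):
    """Crude saliency: mean relative deviation of the current frame
    from the background estimate (illustrative, not the paper's)."""
    return float(np.mean(np.abs(frame - background) / (np.abs(background) + eps)))

def gated_hebbian_update(W, pre, post, s, eta=0.01, threshold=0.5):
    """Apply the Hebbian update only for sufficiently salient input,
    so synaptic memory is not spent on uninteresting background sound.
    The Oja-style decay keeps the weights bounded over long runs."""
    if s > threshold:
        W = W + eta * s * (np.outer(post, pre) - (post ** 2)[:, None] * W)
    return W
```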

    Data mining on urban sound sensor networks

    ICA 2016, 22nd International Congress on Acoustics, Buenos Aires, Argentina, 05/09/2016 - 09/09/2016.
    Urban sound sensor networks deliver megabytes of data on a daily basis, so the question of how to extract useful knowledge from this overwhelming dataset is a pressing one. This paper presents and compares two extremely different approaches. The first approach uses as much expert knowledge as possible on how people perceive the sonic environment; the second simply treats the spectra obtained at every time step as meaningless numbers, yet tries to structure them in a meaningful way. The approach based on expert knowledge starts by extracting features that a human listener might use to detect salient sounds and to recognize these sounds. These features are then fed to a recurrent neural network that learns, in an unsupervised way, to structure and group them based on co-occurrence and typical sequences. The network is constructed to mimic human auditory processing and includes inhibition and adaptation processes. The outcome of this network is the activation of a set of several hundred neurons. The second approach collects a sequence of one minute of sound spectra (1/8-second time step) and summarizes it using Gaussian mixture models in the frequency-amplitude space. The means and standard deviations of the set of Gaussians are used for further analysis. In both cases, the outcome is clustered to analyze similarities over space and time, as well as to detect outliers. Both approaches are applied to a dataset obtained from 25 measurement nodes during approximately one and a half years in Paris, France. Although the approach based on human listening models is expected to be much more precise when it comes to analyzing and clustering soundscapes, it is also much slower than the blind data analysis.
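
    The second, blind approach lends itself to a short sketch: one minute of spectra (480 frames at a 1/8-second step) is summarized by a Gaussian mixture model in frequency-amplitude space, and the component means and standard deviations are kept as features for clustering and outlier detection. The component count, the diagonal covariance and the use of scikit-learn's GaussianMixture are illustrative choices, not taken from the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def summarize_minute(spectra, freqs, n_components=8, seed=0):
    """Summarize one minute of spectra by a GMM in frequency-amplitude space.

    spectra: (n_frames, n_bins) array of levels, e.g. 480 frames at 1/8 s
    freqs:   (n_bins,) band centre frequencies in Hz
    Returns one feature row per component: [mean_f, mean_level, std_f, std_level].
    """
    n_frames, n_bins = spectra.shape
    # Each (frame, bin) pair becomes one point (log-frequency, level).
    points = np.column_stack([
        np.tile(np.log10(freqs), n_frames),
        spectra.ravel(),
    ])
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag",
                          random_state=seed).fit(points)
    # Means and standard deviations of the Gaussians are the summary
    # used for further clustering and outlier detection.
    return np.hstack([gmm.means_, np.sqrt(gmm.covariances_)])
```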