
    A Spiking Neural Network Model of the Medial Superior Olive Using Spike Timing Dependent Plasticity for Sound Localization

    Sound localization can be defined as the ability to identify the position of an input sound source and is considered a powerful aspect of mammalian perception. For low-frequency sounds, i.e., in the range 270 Hz–1.5 kHz, the mammalian auditory pathway achieves this by extracting the Interaural Time Difference (ITD) between the sound signals received by the left and right ears. This processing is performed in a region of the brain known as the Medial Superior Olive (MSO). This paper presents a Spiking Neural Network (SNN) based model of the MSO. The network model is trained with the Spike Timing Dependent Plasticity (STDP) learning rule on experimentally observed Head Related Transfer Function data from an adult domestic cat. The results demonstrate that the proposed SNN model can perform sound localization with an accuracy of 91.82% when an error tolerance of ±10° is used. For angular resolutions down to 2.5°, software-based simulations of the model are shown to incur significant computation times. The paper therefore also presents a preliminary implementation on a Field Programmable Gate Array (FPGA) based hardware platform to accelerate system performance.
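    To make the two mechanisms named in this abstract concrete, the sketch below shows (a) ITD estimation by cross-correlating the left- and right-ear signals and (b) a standard pair-based STDP weight update. Both are generic textbook forms given for orientation only; the constants and function names are illustrative, and the paper's actual network topology is not reproduced here.

```python
import numpy as np

def estimate_itd(left, right, fs):
    """Estimate the ITD (seconds) as the lag that maximises the
    cross-correlation between the left- and right-ear signals."""
    xcorr = np.correlate(left, right, mode="full")
    lag = int(np.argmax(xcorr)) - (len(right) - 1)
    return lag / fs

def stdp_dw(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau_ms=20.0):
    """Pair-based STDP weight change for one pre/post spike pair
    (times in ms). Constants are illustrative, not the paper's."""
    dt = t_post - t_pre
    if dt >= 0:                                # pre leads post: potentiate
        return a_plus * np.exp(-dt / tau_ms)
    return -a_minus * np.exp(dt / tau_ms)      # post leads pre: depress
```

    In an MSO-style SNN, the explicit cross-correlation is presumably replaced by coincidence-detecting neurons whose inputs STDP tunes; the closed-form estimator above is only a reference point.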

    Calibration of sound source localisation for robots using multiple adaptive filter models of the cerebellum

    The aim of this research was to investigate the calibration of Sound Source Localisation (SSL) for robots using the adaptive filter model of the cerebellum, and how this could be automatically adapted for multiple acoustic environments. The role of the cerebellum has mainly been identified in the context of motor control, and only in recent years has it been recognised to have a wider role in the senses and cognition. The adaptive filter model of the cerebellum has been successfully applied to a number of robotics applications, but so far none involving the auditory sense. Multiple-model frameworks such as MOdular Selection And Identification for Control (MOSAIC) have also been developed in the context of motor control, and these inspired the adaptation of audio calibration across multiple acoustic environments; again, the application of this approach to the auditory sense is completely new. The thesis showed that it was possible to calibrate the output of an SSL algorithm using the adaptive filter model of the cerebellum, improving performance compared to the uncalibrated SSL. Using an adaptation of the MOSAIC framework, and specifically using responsibility estimation, a system was developed that could select an appropriate set of cerebellar calibration models and combine their outputs in proportion to how well each was able to calibrate, improving the SSL estimate in multiple acoustic contexts, including novel ones. The thesis also developed a responsibility predictor, also part of the MOSAIC framework, which improved the robustness of the system to abrupt changes in context that could otherwise have resulted in a large performance error. Responsibility prediction also improved robustness to missing ground truth, which can occur in challenging environments where sensory feedback of ground truth becomes impaired; this scenario has not been addressed in the MOSAIC literature, adding to the novelty of the thesis. The utility of the so-called cerebellar chip was further demonstrated through the development of a responsibility predictor based on the adaptive filter model of the cerebellum, rather than on the more conventional function-fitting neural network used in the literature. Lastly, it was demonstrated that the multiple cerebellar calibration architecture is capable of limited self-organisation from a de novo state with a predetermined number of models, and that the responsibility predictor could learn against its model after self-organisation and, to a limited extent, during self-organisation. The thesis addresses an important question of how a robot could improve its ability to listen in multiple, challenging acoustic environments, and recommends future work to develop this ability.
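    Responsibility estimation has a standard form in the MOSAIC literature: each model's responsibility is a softmax of its prediction likelihood under a Gaussian error model, and the models' outputs are blended in proportion. The sketch below shows that generic form; the thesis's exact formulation, the function names, and the sigma value are assumptions here, not taken from the thesis.

```python
import numpy as np

def responsibilities(pred_errors, sigma=1.0):
    """MOSAIC-style responsibilities: softmax of each model's
    prediction likelihood under a Gaussian error model."""
    ll = -np.square(np.asarray(pred_errors, dtype=float)) / (2.0 * sigma ** 2)
    w = np.exp(ll - ll.max())      # subtract max for numerical stability
    return w / w.sum()

def blended_output(model_outputs, pred_errors, sigma=1.0):
    """Combine the models' outputs weighted by their responsibilities."""
    lam = responsibilities(pred_errors, sigma)
    return lam @ np.asarray(model_outputs, dtype=float)
```

    A responsibility predictor, as developed in the thesis, would supply a prior estimate of these weights from contextual cues before prediction errors are available, which is what lends robustness to abrupt context changes and missing ground truth.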

    Étude des mécanismes de localisation auditive et de leur plasticité dans le cortex auditif humain [Study of the mechanisms of auditory localization and their plasticity in the human auditory cortex]

    Spatial hearing is an important but complex capacity of the auditory system. The human auditory system infers the location of a sound source from a variety of acoustic cues, known as auditory localization cues. Because these cues depend to some degree on morphological and environmental factors that cannot be predicted by the genetic makeup, their processing has to be fine-tuned during development. Even in adulthood, some plasticity in the processing of localization cues remains. This plasticity has been studied behaviorally, but very little is known about its neural correlates and mechanisms. The present research aimed to investigate this plasticity, as well as the encoding mechanisms of the auditory localization cues, using both behavioral and neuroimaging techniques. In the first two studies, we shifted the perception of horizontal auditory space using digital earplugs. We showed that young adults rapidly adapt to a large perceived shift and that adaptation is accompanied by changes in hemispheric lateralization of auditory cortex activity, as observed with high-resolution functional MRI. We also confirmed the hypothesis of a hemifield code for the representation of horizontal sound source location in the human auditory cortex. In a third study, we modified the major cue for vertical space perception using silicone earmolds and showed that adaptation to this modification was not followed by any aftereffect upon earmold removal, even at the very first sound presentation. This result is consistent with the hypothesis of a “many-to-one mapping” mechanism in which several spectral profiles can become associated with a given spatial direction. In a fourth study, using functional MRI and taking advantage of the adaptation to silicone earmolds, we revealed the encoding of sound source elevation in the human auditory cortex.

    Wave Field Synthesis in a listening room

    This thesis investigates the influence of the listening room on sound fields synthesised by Wave Field Synthesis. Methods are developed that allow for the investigation of the spatial and timbral perception of Wave Field Synthesis in a reverberant environment, using listening experiments based on simulation by binaural synthesis and room acoustical simulation. The results can serve as guidelines for the design of listening rooms for Wave Field Synthesis.
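    For orientation, the core of a Wave Field Synthesis driving function for a point source is a per-loudspeaker delay and amplitude weight derived from the source-to-loudspeaker distance. The sketch below shows only that delay/amplitude core under stated simplifications: the spectral pre-filter and the edge-tapering window of a full 2.5D driving function are omitted, and nothing here is taken from this thesis.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, in air

def wfs_delays_gains(src, spk_positions):
    """Per-loudspeaker delays (s) and distance-based gains for a point
    source behind the array: the delay/amplitude core of a simplified
    WFS driving function (no pre-filter, no tapering)."""
    r = np.linalg.norm(np.asarray(spk_positions, dtype=float)
                       - np.asarray(src, dtype=float), axis=1)
    return r / SPEED_OF_SOUND, 1.0 / np.sqrt(r)
```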

    Aspects of room acoustics, vision and motion in the human auditory perception of space

    The human sense of hearing contributes to the awareness of where sound-generating objects are located in space and of the environment in which the hearing individual is located. This auditory perception of space interacts in complex ways with our other senses, can be both disrupted and enhanced by sound reflections, and includes safety mechanisms which have evolved to protect our lives but can also mislead us. This dissertation explores selected topics from this wide subject area, mostly by testing the abilities and subjective judgments of human listeners in virtual environments. Reverberation is the gradually decaying persistence of sounds in an enclosed space which results from repeated sound reflections at surfaces. The first experiment (Chapter 2) compared how strongly people perceived reverberation in different visual situations: when they could see the room and the source which generated the sound; when they could see some room and some sound source, but the image did not match what they heard; and when they could not see anything at all. There were no indications that the visual image had any influence on this aspect of room-acoustical perception. The potential benefits of motion for judging the distance of sound sources were the focus of the second study (Chapter 3), which consists of two parts. In the first part, loudspeakers were placed at different depths in front of sitting listeners who, on command, had to either remain still or move their upper bodies sideways. This experiment demonstrated that humans can exploit motion parallax (the effect that closer objects appear to move faster past a moving observer than farther objects) with their ears and not just with their eyes. The second part combined a virtualisation of such sound sources with a motion platform to show that the listeners’ interpretation of this auditory motion parallax was better when they performed the lateral movement themselves than when they were moved by the apparatus or were not actually in motion at all. Two more experiments were concerned with the perception of sounds that become louder over time. These have been called “looming”, as the source of such a sound might be on a collision course. One of the studies (Chapter 4) showed that western diamondback rattlesnakes (Crotalus atrox) increase the vibration speed of their rattle in response to the approach of a threatening object. It also demonstrated that human listeners perceive (virtual) snakes which engage in this behaviour as especially close, causing them to keep a greater margin of safety than they would otherwise. The other study (section 5.6) was concerned with the well-known looming bias of the sound localisation system, a phenomenon which leads to a sometimes exaggerated, sometimes more accurate perception of approaching compared to receding sounds. It attempted to find out whether this bias is affected by whether listeners hear such sounds in a virtual enclosed space or in an environment with no sound reflections. While the results were inconclusive, this experiment is noteworthy as a proof of concept: it was the first study to make use of a new real-time room-acoustical simulation system, liveRAZR, which was developed as part of this dissertation (Chapter 5).
    Finally, while humans have been more often studied for their unique abilities to communicate with each other, and bats for their extraordinary capacity to locate objects by sound, this dissertation turns this setting of priorities on its head with the last paper (Chapter 6): based on recordings of six pale spear-nosed bats (Phyllostomus discolor), it is a survey of the identifiably distinct vocalisations observed in their social interactions, along with a description of the different situations in which they typically occur.

    Using auditory augmented reality to understand visual scenes

    Locating objects in space is typically thought of as a visual task. However, not everyone has access to visual information, such as people who are blind. The purpose of this thesis was to investigate whether it is possible to convert visual events into spatial auditory cues. A neuromorphic retina was used to collect visual events, and custom software was written to augment the scene with auditory localization cues. The neuromorphic retina is engineered to encode data similarly to the dorsal visual pathway, which is associated with fast, non-redundant information encoding and is thought to drive attentional shifting, especially in the presence of visual transients. The intent was to create a device capable of using these visual onsets and transients to generate spatial auditory cues. To achieve this, the device uses the core principles driving auditory localization, with a focus on the interaural time and level difference cues, which are thought to be responsible for encoding azimuthal location in space. Results demonstrate the usefulness of such a device, although personalization will probably improve the effectiveness of the cues generated. In summary, I have created a device that converts purely visual events into useful auditory cues for localization, thereby granting perception of stimuli that may otherwise have been inaccessible to the user.
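    The abstract does not give the cue-generation formulas, but a common way to render an azimuth as interaural time and level differences is the spherical-head (Woodworth) ITD approximation together with a simple sinusoidal ILD weighting. The sketch below makes those assumptions explicit; the head radius, the 6 dB maximum ILD, and the function names are illustrative rather than taken from the thesis.

```python
import numpy as np

HEAD_RADIUS = 0.0875     # m, a commonly assumed average head radius
SPEED_OF_SOUND = 343.0   # m/s

def woodworth_itd(azimuth_rad):
    """Spherical-head (Woodworth) ITD approximation."""
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (np.sin(azimuth_rad) + azimuth_rad)

def spatialise(mono, fs, azimuth_rad, max_ild_db=6.0):
    """Render a mono event as a stereo pair carrying ITD and ILD cues.
    azimuth_rad > 0 places the source to the listener's right."""
    delay = int(round(abs(woodworth_itd(azimuth_rad)) * fs))  # ITD in samples
    ild_db = max_ild_db * abs(np.sin(azimuth_rad))            # ILD in dB
    near = np.concatenate([mono, np.zeros(delay)])            # leading ear
    far = np.concatenate([np.zeros(delay), mono]) * 10.0 ** (-ild_db / 20.0)
    return (far, near) if azimuth_rad > 0 else (near, far)    # (left, right)
```

    As the abstract notes, such generic cues would likely benefit from personalization, e.g. fitting the head radius or substituting individually measured HRTFs.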

    Auditory fitness for duty: localising small arms gunfire

    Locating the source of small arms fire is deemed a mission-critical auditory task by infantry personnel (Bevis et al. 2014; Semeraro et al. 2015), yet little is known about the acoustic localisation cues within a gunshot or about human ability to localise gunshots. Binaural recordings of ‘live’ gunshots from an SA80 rifle were obtained using a KEMAR dummy head placed 100 m from the firer, within 30 cm of the bullet trajectory, at 13 azimuth angles from 90° left to 90° right. The ‘crack’, created by the supersonic bullet passing the target, produced smaller interaural time and level differences than the ‘thump’, created by the muzzle blast, for the rifle at the same angle. Forty normal-hearing listeners (20 civilian, 20 military personnel) and 12 hearing-impaired listeners (all military personnel) completed a virtual azimuthal localisation task using three stimuli created from the recordings (whole gunshot, ‘crack’ only, and ‘thump’ only) plus a 50 ms broadband noise burst convolved with KEMAR impulse responses. All listeners localised all stimulus types above chance level. Average localisation error increased in the order noise burst < thump < gunshot < crack for all cohorts. Military personnel (regardless of their hearing level) performed significantly worse than civilians for all stimuli; they had a higher tendency to select the extreme left and right sources, resulting in an increased lateral bias. The difference between military and civilian participants may be due to their understanding of the task or to military training/experience. Mild-to-moderate bilateral symmetrical sensorineural hearing loss did not have a significant impact on localisation accuracy. This suggests that, provided the gunshot is clearly audible and audiometric thresholds are equal between the ears, binaural cues will remain accessible and localisation accuracy will be preserved. Further work is recommended to investigate the relationship between other hearing loss configurations and small arms gunshot localisation accuracy before gunshot localisation is considered as a measure of auditory fitness for infantry personnel.

    Sensitivity to interaural timing differences within the envelopes of acoustic waveforms

    Interaural timing differences (ITDs) are a cue for sound-source localisation and can be conveyed in the temporal fine structure (TFS) of low-frequency tones or in the envelope of high-frequency, amplitude-modulated sounds such as sinusoidally amplitude-modulated (SAM) and transposed tones. Sensitivity to these cues has been measured in human psychophysical experiments, which have revealed that the transposed tone elicits just-noticeable differences (JNDs) in ITD that are equivalent to those of low-frequency pure tones when the modulation frequency is below 512 Hz. At modulation frequencies above 512 Hz, performance rapidly declines for the transposed tone, while sensitivity to ITDs in pure tones is robust until around 1200 Hz. Furthermore, transposed tones elicit smaller JNDs than SAM tones. In the present study, ITD JNDs are assessed psychophysically for pure tones and transposed tones using off-midline reference locations. The results demonstrate that frequency, whether the ITD is conveyed in the TFS or the envelope, and location all have a significant effect on human ITD JNDs, and suggest that a difference exists in how ITDs are coded neuronally when conveyed by high- versus low-frequency sounds. ITD-sensitive neurons located within several brainstem nuclei display a high degree of phase-locking to both the TFS of low-frequency pure tones and the envelopes of SAM and transposed tones. Echoing the psychophysical findings, phase-locking to the waveform envelope at low modulation frequencies is equivalent to that of low-frequency pure tones, while declining at high rates of modulation, to a lesser degree for transposed tones than for SAM tones. To assess factors critical to the localisation of high-frequency sounds, a series of electrophysiology experiments was conducted. Recordings were made from single neurons within the inferior colliculus of the guinea pig in response to ITDs conveyed by 18 unique envelope shapes, to evaluate how the envelope segments (Pause, Attack, Sustain, and Decay) each affect ITD JNDs. Amplitude modulations with envelope shapes comprising relatively long Pause but short Attack durations were found to elicit the greatest ITD discrimination of high-frequency sounds.
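    A transposed tone, the stimulus central to this abstract, is conventionally constructed by half-wave rectifying a low-frequency modulator, low-pass filtering it, and imposing the result on a high-frequency carrier, so that the envelope delivers the timing information that the TFS of a low-frequency tone would. A sketch under that standard construction follows; the filter order and cutoff here are illustrative choices, not this study's parameters.

```python
import numpy as np
from scipy.signal import butter, lfilter

def transposed_tone(fm, fc, fs=48000, dur=0.5, cutoff=2000.0):
    """Transposed tone: half-wave-rectify a modulator at fm (Hz),
    low-pass filter it, and impose it on a carrier at fc (Hz)."""
    t = np.arange(int(fs * dur)) / fs
    env = np.maximum(np.sin(2 * np.pi * fm * t), 0.0)  # half-wave rectification
    b, a = butter(4, cutoff / (fs / 2))                # 4th-order low-pass
    env = lfilter(b, a, env)
    return env * np.sin(2 * np.pi * fc * t)
```

    Delaying one ear's copy of, say, transposed_tone(128, 4000) relative to the other then conveys a purely envelope-borne ITD on a 4 kHz carrier, the kind of stimulus manipulated in the experiments above.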

    Role of the Cochlea and Efferent System in Children with Auditory Processing Disorder

    Auditory processing disorder (APD) is characterized by difficulty listening in noisy environments despite normal hearing thresholds. APD was previously thought to be restricted to deficits in the central auditory system. The current work sought to investigate brainstem and peripheral mechanisms that may contribute to difficulties in speech understanding in noise in children with suspected APD (sAPD). Three mechanisms in particular were investigated: cochlear tuning, efferent function, and spatial hearing. Cochlear tuning was measured using stimulus frequency otoacoustic emission (SFOAE) group delay. Results indicate that children with suspected APD have atypically sharp cochlear tuning and reduced medial olivocochlear (MOC) functioning. Sharper-than-typical cochlear tuning may lead to increased forward masking. On the contrary, binaural efferent function probed with a forward-masked click-evoked OAE (CEOAE) paradigm indicated that MOC function was not different between typically developing (TD) children and children with suspected APD. A third study with multiple OAE types sought to address this contradiction. Despite numerically smaller MOC inhibition in the sAPD group, MOC function was not significantly different between the two groups. Finally, spatial release from masking, localization-in-noise, and interaural time difference thresholds were compared in TD children and children with sAPD. Results indicate no significant difference in spatial hearing abilities between the two groups. The non-significant group-level findings in these studies may be related to the large heterogeneity in problems associated with APD. Fragmentation of APD into deficit-specific disorders may facilitate research into the specific anatomical underpinnings of listening problems in APD. Prior to conducting the studies in children, three studies were conducted to optimize stimulus characteristics. Results of these studies indicate that the MOC may not be especially sensitive to 100 Hz amplitude modulation, as previously reported. Click presentation rates >25 Hz activate the ipsilateral MOC reflex in typical MOC assays, contaminating contralateral MOC inhibition of CEOAEs. Finally, localization-in-noise abilities of TD children are on par with those of adults for a white-noise masker, but not for speech babble. This finding suggests that despite maturation of the physiological mechanisms required to localize in noise, non-auditory factors may restrict children's ability to process complex signals.
