
    A comparison of feedback cues for enhancing pointing efficiency in interaction with spatial audio displays

    An empirical study is presented that compared six feedback cue types for enhancing pointing efficiency in deictic spatial audio displays. Participants were asked to select a sound using a physical pointing gesture, aided by a loudness cue, a timbre cue, and an orientation update cue, as well as by combinations of these cues. Display content was varied systematically to investigate the effect of increasing display population. Speed, accuracy, and throughput measures are reported, as well as the effective target widths required for minimal error rates. The results showed direct pointing to be the most efficient interaction technique; however, the large effective target widths it requires reduce its applicability. Movement-coupled cues were found to significantly reduce the required display element size, but resulted in slower interaction and were affected by display content because they require continuous target attainment. The results show that, with appropriate design, it is possible to overcome interaction uncertainty and provide solutions that are effective for mobile human-computer interaction.
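
    The throughput and effective-width figures referred to above are conventionally computed with the Fitts' law / ISO 9241-9 methodology. A minimal Python sketch of that calculation, assuming angular endpoint errors and per-trial movement times; the variable names and sample data below are illustrative, not taken from the study:

        import numpy as np

        def effective_width(endpoint_errors):
            # Effective target width: We = 4.133 * standard deviation of the selection endpoints.
            return 4.133 * np.std(endpoint_errors, ddof=1)

        def throughput(target_distance, endpoint_errors, movement_times):
            # Throughput (bits/s) = effective index of difficulty / mean movement time.
            we = effective_width(endpoint_errors)
            ide = np.log2(target_distance / we + 1.0)   # effective index of difficulty (bits)
            return ide / np.mean(movement_times)

        # Illustrative use: a 30-degree pointing amplitude, angular errors in degrees,
        # movement times in seconds.
        errors = np.array([-4.0, 2.5, 1.0, -3.0, 5.0, 0.5])
        times = np.array([1.2, 1.4, 1.1, 1.5, 1.3, 1.2])
        print(round(throughput(30.0, errors, times), 2), "bits/s")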

    Technical aspects of a demonstration tape for three-dimensional sound displays

    This document accompanies an audio cassette that demonstrates work on three-dimensional auditory displays conducted at the Ames Research Center Aerospace Human Factors Division. It provides a text version of the audio material and covers the theoretical and technical issues of spatial auditory displays in greater depth than the cassette. The technical procedures used to produce the audio demonstration are documented, including the methods for simulating rotorcraft radio communication, synthesizing auditory icons, and using the Convolvotron, a real-time spatialization device.
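
    The core operation performed by a real-time spatializer such as the Convolvotron is convolution of a monophonic source with a left/right head-related impulse response (HRIR) pair for the desired direction. A minimal offline sketch of that operation; the placeholder HRIRs below are illustrative only, since a real system selects and interpolates measured HRIRs and updates them as the listener's head moves:

        import numpy as np
        from scipy.signal import fftconvolve

        def spatialize(mono, hrir_left, hrir_right):
            # Convolve the mono source with each ear's HRIR to produce a binaural signal.
            left = fftconvolve(mono, hrir_left)
            right = fftconvolve(mono, hrir_right)
            return np.stack([left, right], axis=-1)          # shape: (samples, 2)

        fs = 44100
        mono = np.random.randn(fs)                           # one second of noise as a stand-in source
        hrir_l = np.zeros(256); hrir_l[0] = 1.0              # crude placeholder: near ear
        hrir_r = np.zeros(256); hrir_r[20] = 0.5             # far ear delayed and attenuated
        binaural = spatialize(mono, hrir_l, hrir_r)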

    Spatial Hearing with Simultaneous Sound Sources: A Psychophysical Investigation

    This thesis provides an overview of work conducted to investigate human spatial hearing in situations involving multiple concurrent sound sources. Much is known about spatial hearing with single sound sources, including the acoustic cues to source location and the accuracy of localisation under different conditions. However, more recently interest has grown in the behaviour of listeners in more complex environments. Concurrent sound sources pose a particularly difficult problem for the auditory system, as their identities and locations must be extracted from a common set of sensory receptors and shared computational machinery. It is clear that humans have a rich perception of their auditory world, but just how concurrent sounds are processed, and how accurately, are issues that are poorly understood. This work attempts to fill a gap in our understanding by systematically examining spatial resolution with multiple sound sources. A series of psychophysical experiments was conducted on listeners with normal hearing to measure performance in spatial localisation and discrimination tasks involving more than one source. The general approach was to present sources that overlapped in both frequency and time in order to observe performance in the most challenging of situations. Furthermore, the role of two primary sets of location cues in concurrent source listening was probed by examining performance in different spatial dimensions. The binaural cues arise due to the separation of the two ears, and provide information about the lateral position of sound sources. The spectral cues result from location-dependent filtering by the head and pinnae, and allow vertical and front-rear auditory discrimination. Two sets of experiments are described that employed relatively simple broadband noise stimuli. In the first of these, two-point discrimination thresholds were measured using simultaneous noise bursts. It was found that the pair could be resolved only if a binaural difference was present; spectral cues did not appear to be sufficient. In the second set of experiments, the two stimuli were made distinguishable on the basis of their temporal envelopes, and the localisation of a designated target source was directly examined. Remarkably robust localisation was observed, despite the simultaneous masker, and both binaural and spectral cues appeared to be of use in this case. Small but persistent errors were observed, which in the lateral dimension represented a systematic shift away from the location of the masker. The errors can be explained by interference in the processing of the different location cues. Overall these experiments demonstrated that the spatial perception of concurrent sound sources is highly dependent on stimulus characteristics and configurations. This suggests that the underlying spatial representations are limited by the accuracy with which acoustic spatial cues can be extracted from a mixed signal. Three sets of experiments are then described that examined spatial performance with speech, a complex natural sound. The first measured how well speech is localised in isolation. This work demonstrated that speech contains high-frequency energy that is essential for accurate three-dimensional localisation. In the second set of experiments, spatial resolution for concurrent monosyllabic words was examined using similar approaches to those used for the concurrent noise experiments. 
It was found that resolution for concurrent speech stimuli was similar to resolution for concurrent noise stimuli. Importantly, listeners were limited in their ability to concurrently process the location-dependent spectral cues associated with two brief speech sources. In the final set of experiments, the role of spatial hearing was examined in a more relevant setting containing concurrent streams of sentence speech. It has long been known that binaural differences can aid segregation and enhance selective attention in such situations. The results presented here confirmed this finding and extended it to show that the spectral cues associated with different locations can also contribute. As a whole, this work provides an in-depth examination of spatial performance in concurrent source situations and delineates some of the limitations of this process. In general, spatial accuracy with concurrent sources is poorer than with single sound sources, as both binaural and spectral cues are subject to interference. Nonetheless, binaural cues are quite robust for representing concurrent source locations, and spectral cues can enhance spatial listening in many situations. The findings also highlight the intricate relationship that exists between spatial hearing, auditory object processing, and the allocation of attention in complex environments.
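
    The two cue families examined in the thesis can be illustrated with a generic, textbook-style estimate from a left/right signal pair: the interaural time difference (ITD) from the lag of the interaural cross-correlation, and the interaural level difference (ILD) from the RMS level ratio in decibels. A minimal sketch under that assumption (this is not the analysis used in the thesis):

        import numpy as np
        from scipy.signal import correlate, correlation_lags

        def itd_ild(left, right, fs, max_itd_s=0.001):
            # ITD: lag (within a plausible +/- 1 ms range) at which the cross-correlation peaks.
            xcorr = correlate(left, right, mode="full")
            lags = correlation_lags(len(left), len(right), mode="full")
            valid = np.abs(lags) <= int(max_itd_s * fs)
            itd = lags[valid][np.argmax(xcorr[valid])] / fs  # seconds; with this convention a negative
                                                             # value means the right channel lags the left
            # ILD: interaural level difference in dB (left re right).
            rms = lambda x: np.sqrt(np.mean(np.square(x)))
            ild = 20.0 * np.log10(rms(left) / rms(right))
            return itd, ild

        # Illustrative check: the right channel delayed by 0.5 ms gives an ITD of about -0.0005 s.
        fs = 48000
        noise = np.random.randn(fs)
        print(itd_ild(noise, np.roll(noise, 24), fs))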

    Speech Perception in Virtual Environments

    Many virtual environments, such as interactive computer games, educational software, or training simulations, make use of speech to convey important information to the user. These applications typically present a combination of background music, sound effects, ambient sounds and dialog simultaneously to create a rich auditory environment. Since interactive virtual environments allow users to roam freely among different sound-producing objects, sound designers do not always have exact control over which sounds a user will perceive at any given time. This dissertation investigates factors that influence the perception of speech in virtual environments under adverse listening conditions. A virtual environment was created to study hearing performance under different audio-visual conditions. The two main areas of investigation were the contributions of "spatial unmasking" and lip animation to speech perception. Spatial unmasking refers to the hearing benefit achieved when the target sound and masking sound are presented from different locations. Both auditory and visual factors influencing speech perception were considered. The capability of modern sound hardware to produce a spatial release from masking using real-time 3D sound spatialization was compared with the pre-computed method of creating spatialized sound. It was found that spatial unmasking could be achieved when using a modern consumer 3D sound card and either a headphone or surround-sound speaker display. Surprisingly, masking was less effective when real-time sound spatialization was used, and subjects achieved better hearing performance than with the pre-computed method. Most research on the spatial unmasking of speech has been conducted in purely auditory environments. The influence of an additional visual cue was first investigated to determine whether it provided any benefit. No difference in hearing performance was observed when visible objects were presented at the same location as the auditory stimuli. Because of inherent limitations of display devices, the auditory and visual environments are often not perfectly aligned, causing a sound-producing object to be seen at a different location from where it is heard. The influence of audio-visual integration of the conflicting spatial information was investigated to see whether it affected the spatial unmasking of speech in noise. No significant difference in speech perception was found regardless of whether visual stimuli were presented at the correct location matching the auditory position or at a spatially disparate location from the auditory source. Lastly, the influence of rudimentary lip animation on speech perception was investigated. The results showed that correct lip animation contributes significantly to speech perception. It was also found that incorrect lip animation could result in worse performance than when no lip animation is used at all. The main conclusions from this research are: that the 3D sound capabilities of modern sound hardware can and should be used in virtual environments to present speech; that perfectly aligned auditory and visual environments are not very important for speech perception; and that even rudimentary lip animation can enhance speech perception in virtual environments.
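
    Speech reception thresholds of the kind measured here are usually obtained with an adaptive procedure that converges on a fixed intelligibility level. A minimal sketch of a one-up/one-down track; the scoring rule, step size, and trial count are illustrative assumptions, not the procedure used in this dissertation:

        def run_srt_track(present_trial, start_snr_db=0.0, step_db=2.0, n_trials=20):
            # present_trial(snr_db) should return True if the listener reported the sentence correctly.
            snr = start_snr_db
            reversal_snrs, last_direction = [], None
            for _ in range(n_trials):
                correct = present_trial(snr)
                direction = -1 if correct else +1            # harder after a hit, easier after a miss
                if last_direction is not None and direction != last_direction:
                    reversal_snrs.append(snr)                # record the SNR at each reversal
                last_direction = direction
                snr += direction * step_db
            # SRT estimate: mean SNR over the recorded reversals (tracks roughly 50% correct).
            return sum(reversal_snrs) / len(reversal_snrs) if reversal_snrs else snr

        # Illustrative simulated listener with a logistic psychometric function centred at -6 dB SNR.
        import random
        print(run_srt_track(lambda snr: random.random() < 1.0 / (1.0 + 10.0 ** (-(snr + 6.0) / 4.0))))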

    Effects of amplitude modulation on sound localization in reverberant environments.

    Auditory localization involves different cues depending on the spatial domain. Azimuth localization cues include interaural time differences (ITDs), interaural level differences (ILDs), and pinna cues. Auditory distance perception (ADP) cues include intensity, spectral cues, binaural cues, and the direct-to-reverberant energy ratio (D/R). While D/R has been established as a primary ADP cue, it is unlikely to be directly encoded in the auditory system because it can be difficult to extract from ongoing signals. It is also noteworthy that no neuronal population has been identified that specifically codes D/R. It has therefore been proposed that D/R is indirectly encoded in the auditory system through sensitivity to other acoustic parameters that are correlated with D/R, such as temporal cues (Zahorik, 2002b), spectral properties (Jetzt, 1979; Larsen, 2008), and interaural correlation (Bronkhorst and Houtgast, 1999). An additional D/R correlate relies on the attenuation of amplitude modulation (AM) as a function of distance. Room modulation transfer functions act as low-pass filters on AM signals, and therefore the direct portion of a signal undergoes less modulation-depth attenuation than the reverberant portion. Although recent neural and behavioral work has demonstrated that this cue can provide distance information monaurally, the extent to which the modulation attenuation cue contributes to ADP relative to other ADP cues is not fully understood. It is also possible that modulation attenuation by the room can provide additional directional localization information, perhaps through the resulting dynamic fluctuation of the ILD cue. The role of AM in directional sound localization has not been extensively studied, particularly in reverberant soundfields, which can affect the modulation reaching the two ears in a directionally dependent fashion. Three human psychophysical experiments assessed the role of AM signals in distance and directional auditory localization in reverberant soundfields. Experiment I focused on validating a graphical response method to be used in the subsequent experiments. In Experiment II, an auditory distance estimation task yielded measures of the relative perceptual contributions of the modulation-depth cue during ADP in a reverberant room. Experiment III investigated the effect of AM on binaural localization in the horizontal plane in a reverberant room.
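
    Two of the quantities discussed above have simple standard definitions: the direct-to-reverberant energy ratio (D/R), computed here from a room impulse response, and the modulation depth of an amplitude-modulated envelope, which reverberation attenuates with distance. A minimal sketch assuming a measured impulse response; the 2.5 ms direct-sound window is a common convention, not necessarily the one used in this work:

        import numpy as np

        def direct_to_reverberant_db(rir, fs, direct_window_ms=2.5):
            # Split the impulse response shortly after the direct-sound peak.
            onset = int(np.argmax(np.abs(rir)))
            split = onset + int(direct_window_ms * 1e-3 * fs)
            direct_energy = np.sum(rir[onset:split] ** 2)
            reverberant_energy = np.sum(rir[split:] ** 2)
            return 10.0 * np.log10(direct_energy / reverberant_energy)

        def modulation_depth(envelope):
            # m = (max - min) / (max + min); room low-pass filtering of the envelope reduces m.
            return (envelope.max() - envelope.min()) / (envelope.max() + envelope.min())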

    Spatial release from masking in children with and without auditory processing disorder in real and virtual auditory environments

    Auditory Processing Disorder (APD) is a developmental disorder characterised by difficulties in listening to speech in noise despite normal audiometric thresholds. It is still poorly understood and much disputed, and there is a need for better diagnostic tools. One promising finding is that some children referred for APD assessment have a reduced spatial release from masking (SRM). Current clinical tests measure SRM in virtual auditory environments created from head-related transfer functions (HRTFs) of a standardised adult head. Adults and children, however, have different head dimensions, and mismatched HRTFs are known to affect aspects of binaural hearing such as localisation. There has been little research on HRTFs in children, and it is unclear whether a large mismatch can impact speech perception, especially for children with APD, who have difficulties with accurately processing auditory information. In this project, we examined the effect of nonindividualised virtual auditory environments on the SRM in adults and children with and without APD. The first study, with normal-hearing adults, compared environments created from individually measured HRTFs and two nonindividualised sets of HRTFs to a real anechoic environment. Speech reception thresholds (SRTs) were measured for target sentences at 0° and two symmetric speech maskers at 0° or ±90° azimuth. No significant effect of auditory environment on SRTs and SRM was observed. A larger study was then conducted with APD and typically developing children aged 7 to 12 years. Individual HRTFs were measured for each child. The SRM was measured in environments created from these individualised HRTFs or artificial-head HRTFs, and in the real anechoic environment. To assess the influence of spectral cues, SRTs were also measured for HRTFs from a spherical head model that contains only interaural time and level differences. Additionally, the study included an extended high-frequency audiogram, a receptive language test and two parental questionnaires. The SRTs of children with APD were worse than those of typically developing children in all conditions, but SRMs were similar. Only small differences in SRTs were found across environments, mainly for the spherical head HRTFs. SRTs in children were higher than in adults but improved with age. Children with APD also had higher hearing thresholds and performed worse on the language test.
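
    The spherical head model mentioned above predicts interaural time differences with the classic Woodworth approximation, ITD = (a / c)(theta + sin theta) for azimuth theta. A minimal sketch with illustrative defaults for head radius and speed of sound (not the parameters used in the study):

        import numpy as np

        def spherical_head_itd(azimuth_deg, head_radius_m=0.0875, c_m_per_s=343.0):
            # Woodworth ITD (seconds) for a source at the given azimuth (0 deg = straight ahead).
            theta = np.radians(azimuth_deg)
            return (head_radius_m / c_m_per_s) * (theta + np.sin(theta))

        print(round(spherical_head_itd(90.0) * 1e6), "microseconds at 90 degrees")  # roughly 656 us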

    A physiologically inspired model for solving the cocktail party problem.

    At a cocktail party, we can broadly monitor the entire acoustic scene to detect important cues (e.g., our names being called, or the fire alarm going off), or selectively listen to a target sound source (e.g., a conversation partner). It has recently been observed that individual neurons in the avian field L (the analog of the mammalian auditory cortex) can display broad spatial tuning to single targets and selective tuning to a target embedded in spatially distributed sound mixtures. Here, we describe a model inspired by these experimental observations and apply it to process mixtures of human speech sentences. This processing is realized in the neural spiking domain. It converts binaural acoustic inputs into cortical spike trains using a multi-stage model composed of a cochlear filter-bank, a midbrain spatial-localization network, and a cortical network. The output spike trains of the cortical network are then converted back into an acoustic waveform using a stimulus reconstruction technique. The intelligibility of the reconstructed output is quantified using an objective measure of speech intelligibility. We apply the algorithm to single- and multi-talker speech to demonstrate that the physiologically inspired algorithm is able to achieve intelligible reconstruction of an "attended" target sentence embedded in two other non-attended masker sentences. The algorithm is also robust to masker level and displays performance trends comparable to humans. The ideas from this work may help improve the performance of hearing assistive devices (e.g., hearing aids and cochlear implants), speech-recognition technology, and computational algorithms for processing natural scenes cluttered with spatially distributed acoustic objects.
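
    The front end of such a model, the cochlear filter-bank, is often approximated by a bank of gammatone filters. A minimal sketch using the Glasberg and Moore equivalent-rectangular-bandwidth (ERB) formula; the filter order, channel count, and log-spaced centre frequencies are illustrative choices, not the authors' implementation:

        import numpy as np
        from scipy.signal import fftconvolve

        def erb_hz(fc):
            # Glasberg & Moore equivalent rectangular bandwidth in Hz.
            return 24.7 * (4.37 * fc / 1000.0 + 1.0)

        def gammatone_ir(fc, fs, duration_s=0.05, order=4, b=1.019):
            # Gammatone impulse response for one channel (not gain-normalised).
            t = np.arange(int(duration_s * fs)) / fs
            return t ** (order - 1) * np.exp(-2.0 * np.pi * b * erb_hz(fc) * t) * np.cos(2.0 * np.pi * fc * t)

        def cochleagram(signal, fs, centre_freqs):
            # One band-passed channel per centre frequency (rows = channels).
            return np.stack([fftconvolve(signal, gammatone_ir(fc, fs), mode="same") for fc in centre_freqs])

        fs = 16000
        channels = cochleagram(np.random.randn(fs), fs, np.geomspace(100.0, 6000.0, 32))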