
    Understanding concurrent earcons: applying auditory scene analysis principles to concurrent earcon recognition

    Two investigations into the identification of concurrently presented, structured sounds, called earcons, were carried out. The first experiment investigated how varying the number of concurrently presented earcons affected their identification: the number presented had a significant effect on the proportion of earcons identified, and reducing the number of concurrently presented earcons led to a general increase in the proportion successfully identified. The second experiment investigated how modifying the earcons and their presentation, using techniques influenced by auditory scene analysis, affected earcon identification. Both giving each earcon a unique timbre and introducing a 300 ms onset-to-onset delay between successive earcons significantly increased identification. Guidelines were drawn from this work to assist future interface designers when incorporating concurrently presented earcons.
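
    As a minimal sketch of the presentation technique described above, the Python snippet below gives each earcon a distinct harmonic weighting (timbre) and staggers onsets by 300 ms before mixing. The three-note motifs, frequencies, and sample rate are illustrative assumptions, not the stimuli used in the study.

        import numpy as np

        FS = 44100          # sample rate in Hz (assumed)
        ONSET_GAP_S = 0.30  # 300 ms onset-to-onset delay, as in the second experiment

        def earcon(freqs, harmonics, note_dur=0.15):
            # Build a short three-note motif; the 'harmonics' weights set the timbre.
            t = np.arange(int(note_dur * FS)) / FS
            notes = []
            for f in freqs:
                note = sum(w * np.sin(2 * np.pi * f * (k + 1) * t)
                           for k, w in enumerate(harmonics))
                notes.append(note * np.hanning(len(t)))  # smooth onsets/offsets
            return np.concatenate(notes)

        def mix_staggered(earcons, gap_s=ONSET_GAP_S):
            # Sum the earcons into one buffer with a fixed onset-to-onset stagger.
            gap = int(gap_s * FS)
            out = np.zeros(max(i * gap + len(e) for i, e in enumerate(earcons)))
            for i, e in enumerate(earcons):
                out[i * gap:i * gap + len(e)] += e
            return out / len(earcons)  # keep the mix within range

        stimulus = mix_staggered([
            earcon([440, 550, 660], harmonics=(1.0,)),           # pure-tone timbre
            earcon([330, 415, 495], harmonics=(1.0, 0.6, 0.3)),  # brighter timbre
            earcon([520, 650, 780], harmonics=(1.0, 0.2, 0.8)),  # distinct spectrum
        ])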

    Detection and localization of speech in the presence of competing speech signals

    Presented at the 12th International Conference on Auditory Display (ICAD), London, UK, June 20-23, 2006. Auditory displays are often used to convey important information in complex operational environments. One problem with these displays is that potentially critical information can be corrupted or lost when multiple warning sounds are presented at the same time. In this experiment, we examined a listener's ability to detect and localize a target speech token in the presence of one to five simultaneous competing speech tokens. Two conditions were examined: one in which all of the speech tokens were presented from the same location (the 'co-located' condition) and one in which the speech tokens were presented from different random locations (the 'spatially separated' condition). The results suggest that both detection and localization degrade as the number of competing sounds increases. However, the changes in detection performance were surprisingly small, and there appeared to be little or no benefit of spatial separation for detection. Localization, on the other hand, degraded substantially and systematically as the number of competing speech tokens increased. Overall, these results suggest that listeners are able to extract substantial information from these speech tokens even when the target is presented with five competing simultaneous sounds.
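
    A short sketch of how the two spatial conditions could be constructed for a trial with one target and one to five competing tokens; the candidate azimuth grid here is an assumption, not the source layout used in the study.

        import random

        AZIMUTHS = list(range(-90, 91, 30))  # candidate source azimuths in degrees (assumed)

        def make_trial(n_competing, separated):
            # Return (target_azimuth, competing_azimuths) for one trial.
            if separated:
                # 'spatially separated': every token gets its own random location
                locs = random.sample(AZIMUTHS, n_competing + 1)
                return locs[0], locs[1:]
            # 'co-located': all tokens share a single location
            loc = random.choice(AZIMUTHS)
            return loc, [loc] * n_competing

        for n in range(1, 6):  # one to five competing speech tokens, as in the experiment
            print(n, make_trial(n, separated=True), make_trial(n, separated=False))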

    Flying by Ear: Blind Flight with a Music-Based Artificial Horizon

    Two experiments were conducted in actual flight operations to evaluate an audio artificial horizon display that superimposed aircraft attitude information on pilot-selected music. The first experiment examined a pilot's ability to identify, with vision obscured, a change in aircraft roll or pitch, with and without the audio artificial horizon display. The results suggest that the audio horizon display improves the accuracy of attitude identification overall, but differentially affects response time across conditions. In the second experiment, subject pilots performed recoveries from displaced aircraft attitudes using either standard visual instruments or, with vision obscured, the audio artificial horizon display. The results suggest that subjects were able to maneuver the aircraft to within its safety envelope. Overall, pilots were able to benefit from the display, suggesting that such a display could help to improve overall safety in general aviation.
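
    Purely as an illustrative sketch of how attitude might be encoded on an ongoing music stream, the snippet below maps roll to left/right balance and pitch to a brightness-style parameter. The mapping, limits, and parameter names are assumptions; they are not the display evaluated in the paper.

        def horizon_cue(roll_deg, pitch_deg, max_roll=60.0, max_pitch=30.0):
            # Map aircraft attitude onto simple audio parameters (illustrative only).
            clamp = lambda x, lo, hi: max(lo, min(hi, x))
            balance = clamp(roll_deg / max_roll, -1.0, 1.0)                   # -1 = full left, +1 = full right
            brightness = clamp(0.5 + pitch_deg / (2 * max_pitch), 0.0, 1.0)   # 0 = nose down, 1 = nose up
            return {
                "left_gain": (1.0 - balance) / 2.0,
                "right_gain": (1.0 + balance) / 2.0,
                "brightness": brightness,
            }

        print(horizon_cue(roll_deg=15.0, pitch_deg=-5.0))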

    Optimizing the spatial configuration of a seven-talker speech display

    Proceedings of the 9th International Conference on Auditory Display (ICAD), Boston, MA, July 7-9, 2003. Although there is substantial evidence that performance in multitalker listening tasks can be improved by spatially separating the apparent locations of the competing talkers, very little effort has been made to determine the best locations and presentation levels for the talkers in a multichannel speech display. In this experiment, a call-sign-based color and number identification task was used to evaluate the effectiveness of three different spatial configurations and two different level normalization schemes in a seven-channel binaural speech display. When only two spatially adjacent channels of the seven-channel system were active, overall performance was substantially better with a geometrically-spaced configuration (with far-field talkers at -90°, -30°, -10°, 0°, +10°, +30°, and +90° azimuth) or a hybrid near-far configuration (with far-field talkers at -90°, -30°, 0°, +30°, and +90° azimuth and near-field talkers at ±90°) than with a more conventional linearly-spaced configuration (with far-field talkers at -90°, -60°, -30°, 0°, +30°, +60°, and +90° azimuth). When all seven channels were active, performance was generally better with a "better-ear" normalization scheme that equalized the levels of the talkers in the more intense ear than with a default normalization scheme that equalized the levels of the talkers at the center of the head. The best overall performance in the seven-talker task occurred when the hybrid near-far spatial configuration was combined with the better-ear normalization scheme. This combination resulted in a 20% increase in the number of correct identifications relative to the baseline condition with linearly-spaced talker locations and no level normalization. Although this is a relatively modest improvement, it should be noted that it could be achieved at little or no cost simply by reconfiguring the HRTFs used in a multitalker speech display.
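
    The three azimuth layouts and the better-ear idea translate directly into a short sketch. The near-field pair is taken to be at ±90°, and the target level and broadband RMS level metric are assumptions rather than details from the paper.

        import numpy as np

        # Talker azimuths in degrees (negative = left of the listener).
        LINEAR      = [-90, -60, -30, 0, +30, +60, +90]   # linearly spaced, far field
        GEOMETRIC   = [-90, -30, -10, 0, +10, +30, +90]   # geometrically spaced, far field
        HYBRID_FAR  = [-90, -30, 0, +30, +90]             # hybrid configuration, far field
        HYBRID_NEAR = [-90, +90]                          # hybrid configuration, near field (assumed ±90°)

        def better_ear_normalize(left, right, target_rms=0.1):
            # Scale one talker's binaural signal so its more intense ('better') ear
            # reaches a common target level; levels here are simple broadband RMS.
            rms = lambda x: np.sqrt(np.mean(np.square(x)))
            gain = target_rms / max(rms(left), rms(right), 1e-12)
            return left * gain, right * gain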

    Acoustic Cues for Sound Source Distance and Azimuth in Rabbits, a Racquetball and a Rigid Spherical Model

    There are numerous studies measuring the transfer functions representing signal transformation between a source and each ear canal, i.e., the head-related transfer functions (HRTFs), for various species. However, only a handful of these address the effects of sound source distance on HRTFs. This is the first study of HRTFs in the rabbit in which the emphasis is on the effects of sound source distance and azimuth. With the rabbit placed in an anechoic chamber, we made acoustic measurements with miniature microphones placed deep in each ear canal while a sound source was positioned at different locations (10–160 cm distance, ±150° azimuth). The sound was a logarithmically swept broadband chirp. For comparison, we also obtained HRTFs from a racquetball and from a computational model of a rigid sphere. We found that (1) the spectral shape of the HRTF in each ear changed with sound source location; (2) the interaural level difference (ILD) increased with decreasing distance and with increasing frequency, and ILDs can be substantial even at low frequencies when the source is close; and (3) the interaural time difference (ITD) decreased with decreasing distance and generally increased with decreasing frequency. The observations in the rabbit were generally reproduced in the racquetball, albeit with greater magnitude in the rabbit. The results for the sphere model were partly similar to and partly different from those for the racquetball and the rabbit. These findings refute the common notions that ILD is negligible at low frequencies and that ITD is constant across frequency; these misconceptions became evident when distance-dependent changes were examined.
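
    As a rough illustration of the two interaural cues discussed above, the sketch below estimates broadband ILD and ITD from a pair of in-ear recordings of the same source. It is a generic, assumed analysis, not the measurement pipeline used in the study.

        import numpy as np

        def ild_db(left, right):
            # Interaural level difference in dB (positive = right ear more intense).
            rms = lambda x: np.sqrt(np.mean(np.square(x)) + 1e-20)
            return 20.0 * np.log10(rms(right) / rms(left))

        def itd_seconds(left, right, fs):
            # Interaural time difference from the peak of the cross-correlation
            # (positive = the sound reaches the left ear first).
            xcorr = np.correlate(right, left, mode="full")
            lag = int(np.argmax(xcorr)) - (len(left) - 1)
            return lag / fs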

    The contribution of visual information to the perception of speech in noise with and without informative temporal fine structure

    Understanding what is said in demanding listening situations is assisted greatly by looking at the face of a talker. Previous studies have observed that normal-hearing listeners can benefit from this visual information when a talker’s voice is presented in background noise. These benefits have also been observed in quiet listening conditions in cochlear-implant users, whose device does not convey the informative temporal fine structure cues in speech, and when normal-hearing individuals listen to speech processed to remove these informative temporal fine structure cues. The current study (1) characterised the benefits of visual information when listening in background noise; and (2) used sine-wave vocoding to compare the size of the visual benefit when speech is presented with or without informative temporal fine structure. The accuracy with which normal-hearing individuals reported words in spoken sentences was assessed across three experiments, with the availability of visual information and informative temporal fine structure cues varied within and across the experiments. Visual benefit was observed in both open- and closed-set tests of speech perception, and the size of the benefit increased when informative temporal fine structure cues were removed. This finding suggests that visual information may play an important role in the ability of cochlear-implant users to understand speech in many everyday situations. Models of audio-visual integration were able to account for the additional benefit of visual information when speech was degraded, and suggested that auditory and visual information was integrated in a similar way in all conditions. The modelling results were consistent with the notion that audio-visual benefit is derived from the optimal combination of auditory and visual sensory cues.
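
    The "optimal combination" account mentioned at the end can be illustrated with a standard maximum-likelihood (inverse-variance-weighted) cue-combination rule. This is a generic textbook model, not the specific audio-visual integration model fitted in the study.

        def mle_combine(est_a, var_a, est_v, var_v):
            # Weight the auditory and visual estimates by their reliabilities (1/variance).
            w_a = (1.0 / var_a) / (1.0 / var_a + 1.0 / var_v)
            combined = w_a * est_a + (1.0 - w_a) * est_v
            combined_var = 1.0 / (1.0 / var_a + 1.0 / var_v)
            return combined, combined_var

        # Example: a degraded (noisy) auditory cue combined with a more reliable visual cue.
        print(mle_combine(est_a=0.8, var_a=0.4, est_v=0.5, var_v=0.1))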

    Neural Correlates of Auditory Perceptual Awareness under Informational Masking

    Our ability to detect target sounds in complex acoustic backgrounds is often limited not by the ear's resolution, but by the brain's information-processing capacity. The neural mechanisms and loci of this “informational masking” are unknown. We combined magnetoencephalography with simultaneous behavioral measures in humans to investigate neural correlates of informational masking and auditory perceptual awareness in the auditory cortex. Cortical responses were sorted according to whether or not target sounds were detected by the listener in a complex, randomly varying multi-tone background known to produce informational masking. Detected target sounds elicited a prominent, long-latency response (50–250 ms), whereas undetected targets did not. In contrast, both detected and undetected targets produced equally robust auditory middle-latency, steady-state responses, presumably from the primary auditory cortex. These findings indicate that neural correlates of auditory awareness in informational masking emerge between early and late stages of processing within the auditory cortex.
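
    The response-sorting step described above (comparing trials on which the target was detected versus missed) amounts to averaging epochs within each behavioral class. The array shapes and names in this sketch are assumptions, not the authors' MEG pipeline.

        import numpy as np

        def average_by_detection(epochs, detected):
            # epochs: (n_trials, n_times) evoked responses; detected: boolean hit per trial.
            detected = np.asarray(detected, dtype=bool)
            return {
                "detected": epochs[detected].mean(axis=0),     # class expected to show the late response
                "undetected": epochs[~detected].mean(axis=0),  # class in which the late response was absent
            }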