525 research outputs found

    Listening in large rooms : a neurophysiological investigation of acoustical conditions that influence speech intelligibility

    Thesis (M.S.)--Massachusetts Institute of Technology, Whitaker College of Health Sciences and Technology, 1997. Includes bibliographical references (p. 34-37). By Benjamin Michael Hammond.

    Segmentation of binaural room impulse responses for speech intelligibility prediction

    The two most important aspects of binaural speech perception, better-ear listening and spatial release from masking, can be predicted well with current binaural modeling frameworks operating on head-related impulse responses, i.e., anechoic binaural signals. To incorporate effects of reverberation, a model extension was proposed that splits binaural room impulse responses into an early, useful part and a late, detrimental part before they are fed into the modeling framework. More recently, an interaction between the applied splitting time, room properties, and the resulting prediction accuracy was observed. This interaction was investigated here by measuring speech reception thresholds (SRTs) in quiet with 18 normal-hearing subjects for four simulated rooms with different reverberation times and a constant room geometry. The mean error of one of the most promising binaural prediction models could be reduced by about 1 dB by adapting the applied splitting time to room acoustic parameters. This improvement in prediction accuracy can make up a difference of 17% in absolute intelligibility within the applied SRT measurement paradigm.
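As a rough illustration of the early/late split described in this abstract, the sketch below cuts a two-channel impulse response at a fixed time after the direct sound. It is not the authors' model: the function name `split_brir`, the parameter `split_ms`, the peak-based onset detection, and the hard cut (rather than a crossfade window) are all assumptions made for the example.

```python
def split_brir(brir, fs, split_ms):
    """Split a two-channel impulse response into early and late parts.

    brir: pair (left, right) of equal-length lists of samples.
    fs: sampling rate in Hz.
    split_ms: splitting time in milliseconds after the direct sound
              (hypothetical parameter name).
    Returns (early, late); each is a (left, right) pair of the input length.
    """
    left, right = brir
    n = len(left)
    # Assumption: the direct sound is the largest absolute sample in either ear.
    onset = max(range(n), key=lambda i: max(abs(left[i]), abs(right[i])))
    # Index of the first sample that belongs to the late part.
    cut = min(n, onset + round(fs * split_ms / 1000.0))
    early = tuple([x if i < cut else 0.0 for i, x in enumerate(ch)]
                  for ch in (left, right))
    late = tuple([x if i >= cut else 0.0 for i, x in enumerate(ch)]
                 for ch in (left, right))
    return early, late


if __name__ == "__main__":
    toy = ([0.0, 1.0, 0.5, 0.25, 0.1], [0.0, 0.8, 0.4, 0.2, 0.1])
    early, late = split_brir(toy, fs=1000, split_ms=2)
    print(early[0], late[0])
```

Because the two parts are zero-padded to the original length, they sum back to the input, so a downstream model could treat the early part as useful signal and the late part as additional noise. Adapting `split_ms` per room is the kind of tuning the abstract reports on.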

    A new cascaded spectral subtraction approach for binaural speech dereverberation and its application in source separation

    In this work we propose a new binaural spectral subtraction method for the suppression of late reverberation. The proposed approach is a cascade of three stages. The first two stages exploit distinct observations to model and suppress the late reverberation by deriving a gain function. The musical noise artifacts generated due to the processing at each stage are compensated by smoothing the spectral magnitudes of the weighting gains. The third stage linearly combines the gains obtained from the first two stages and further enhances the binaural signals. The binaural gains, obtained by independently processing the left and right channel signals, are combined using a new method. Experiments on real data are performed in two contexts: dereverberation-only and joint dereverberation and source separation. Objective results verify the suitability of the proposed cascaded approach in both contexts.
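For readers unfamiliar with spectral subtraction, the sketch below shows a generic single-stage gain computation with a spectral floor, followed by simple recursive smoothing of the gains across frames. It is a textbook-style illustration, not the cascaded method of this paper: the function names, the floor value, the smoothing constant `alpha`, and the assumption that a late-reverberation magnitude estimate is already available are all choices made for the example.

```python
def subtraction_gain(mag, late_mag, floor=0.1):
    """Per-bin gain for amplitude spectral subtraction.

    mag: magnitudes of the reverberant signal in one frame.
    late_mag: estimated late-reverberation magnitudes for the same bins
              (assumed to be given; estimating them is not shown here).
    floor: lower bound on the gain, which limits over-suppression.
    """
    gains = []
    for m, late in zip(mag, late_mag):
        g = (m - late) / m if m > 0.0 else floor
        gains.append(max(g, floor))
    return gains


def smooth_gains(prev, cur, alpha=0.5):
    """First-order recursive smoothing of gains across frames.

    Smoothing the gain trajectory is one common way to reduce
    musical noise; alpha is a hypothetical smoothing constant.
    """
    return [alpha * p + (1.0 - alpha) * c for p, c in zip(prev, cur)]


if __name__ == "__main__":
    g = subtraction_gain([1.0, 2.0, 0.0], [0.5, 3.0, 0.0])
    print(g)
    print(smooth_gains([1.0, 1.0, 1.0], g))
```

The enhanced magnitude in each bin would then be the gain times the reverberant magnitude, recombined with the original phase.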

    Effect of Reverberation Context on Spatial Hearing Performance of Normally Hearing Listeners

    Previous studies provide evidence that listening experience in a particular reverberant environment improves speech intelligibility and localization performance in that environment. Such studies, however, are few, and there is little knowledge of the underlying mechanisms. The experiments presented in this thesis explored the effect of reverberation context, in particular, the similarity in interaural coherence within a context, on listeners' performance in sound localization, speech perception in a spatially separated noise, spatial release from speech-on-speech masking, and target location identification in a multi-talker configuration. All experiments were conducted in simulated reverberant environments created with a loudspeaker array in an anechoic chamber. The reflections comprising the reverberation in each environment had the same temporal and relative amplitude patterns, but varied in their lateral spread, which affected the interaural coherence of reverberated stimuli. The effect of reverberation context was examined by comparing performance in two reverberation contexts, mixed and fixed. In the mixed context, the reverberation environment applied to each stimulus varied trial-by-trial, whereas in the fixed context, the reverberation environment was held constant within a block of trials. In Experiment I (absolute judgement of sound location), variability in azimuth judgments was lower in the fixed than in the mixed context, suggesting that sound localization depended not only on the cues presented in isolated trials. In Experiment II, the intelligibility of speech in a spatially separated noise was found to be similar in both reverberation contexts. That result contrasts with other studies, and suggests that the fixed context did not assist listeners in compensating for degraded interaural coherence.
In Experiment III, speech intelligibility in multi-talker configurations was found to be better in the fixed context, but only when the talkers were separated. That is, the fixed context improved spatial release from masking. However, in the presence of speech maskers, consistent reverberation did not improve the localizability of the target talker in a three-alternative location-identification task. Those results suggest that in multi-talker situations, consistent coherence may not improve target localizability, but rather that consistent context may facilitate the buildup of spatial selective attention.

    Factors affecting speech intelligibility improvement with exposure to reverberant room listening environments.

    Speech intelligibility has been found to improve with prior exposure to a reverberant room environment. It is believed that perceptual mechanisms help maintain accurate speech perception under these adverse conditions. Potential factors underlying this speech enhancement effect were examined in three experiments. Experiment 1 studied the time course of speech intelligibility enhancement in multiple room environments. Carrier phrases of varying lengths were used to measure changes in speech intelligibility over time. Results showed an effect of speech enhancement with a time course that varied with the signal-to-noise ratio between the speech and a broad-band noise masker. Additionally, greater speech enhancement was found for reverberant environments compared to anechoic space, which suggests that a de-reverberation mechanism in the auditory system may enhance the temporal processing of speech. Experiment 2 examined the influence of the specific source and listener position within the room environment on speech enhancement. Source and listener configurations in three virtual room environments were altered to create a disparity between the position of a carrier phrase and a following speech target. Results showed robust effects of speech enhancement when the source and listener configurations were mismatched, which suggests that speech enhancement relies on the general decay pattern of the room environment and not the specific temporal/spatial configuration of early reflections. Experiment 3 assessed the relationship between room-associated speech enhancement and single-reflection echo suppression by measuring echo thresholds for both traditional click-based stimuli and speech materials. Echo thresholds were found to be uncorrelated with the results of Experiment 1. This suggests that early reflections have little impact on the de-reverberation aspect of speech enhancement, which is consistent with the results from Experiment 2.
A two-process hypothesis is proposed to account for the results of these experiments as well as previous research on this topic. Prior exposure to a speech pattern provided via carrier phrases is argued to elicit improved temporal processing of speech that results in speech enhancement. It is also argued that a process of de-reverberation effectively reduces the attenuation of temporal information in room environments.

    Informed algorithms for sound source separation in enclosed reverberant environments

    While humans can separate a sound of interest amidst a cacophony of contending sounds in an echoic environment, machine-based methods lag behind in solving this task. This thesis thus aims at improving the performance of audio separation algorithms when they are informed, i.e., have access to source location information. These locations are assumed to be known a priori in this work, for example by video processing. Initially, a multi-microphone array based method combined with binary time-frequency masking is proposed. A robust least squares frequency invariant data independent beamformer designed with the location information is utilized to estimate the sources. To further enhance the estimated sources, binary time-frequency masking based post-processing is used, but cepstral domain smoothing is required to mitigate musical noise. To tackle the under-determined case and further improve separation performance at higher reverberation times, a two-microphone based method which is inspired by human auditory processing and generates soft time-frequency masks is described. In this approach, interaural level difference, interaural phase difference and mixing vectors are probabilistically modeled in the time-frequency domain and the model parameters are learned through the expectation-maximization (EM) algorithm. A direction vector is estimated for each source, using the location information, which is used as the mean parameter of the mixing vector model. Soft time-frequency masks are used to reconstruct the sources. A spatial covariance model is then integrated into the probabilistic model framework that encodes the spatial characteristics of the enclosure and further improves the separation performance in challenging scenarios, i.e., when sources are in close proximity and when the level of reverberation is high.
Finally, new dereverberation based pre-processing is proposed based on the cascade of three dereverberation stages where each enhances the two-microphone reverberant mixture. The dereverberation stages are based on amplitude spectral subtraction, where the late reverberation is estimated and suppressed. The combination of such dereverberation based pre-processing and use of soft mask separation yields the best separation performance. All methods are evaluated with real and synthetic mixtures formed, for example, from speech signals from the TIMIT database and measured room impulse responses.
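The interaural cues and soft masks mentioned in this abstract can be illustrated with a small sketch. It computes the interaural level and phase differences for a single time-frequency bin and normalizes per-source likelihoods into soft mask values. The function names and the `eps` guard are assumptions for the example, and the likelihoods are taken as given; the thesis learns its model parameters with EM, which is not reproduced here.

```python
import cmath
import math


def interaural_cues(left_bin, right_bin, eps=1e-12):
    """Return (ILD in dB, IPD in radians) for one time-frequency bin.

    left_bin, right_bin: complex STFT coefficients of the two channels.
    eps: small constant that avoids division by zero (an assumption).
    """
    ild_db = 20.0 * math.log10((abs(left_bin) + eps) / (abs(right_bin) + eps))
    # Phase of left relative to right, wrapped to (-pi, pi].
    ipd = cmath.phase(left_bin * right_bin.conjugate())
    return ild_db, ipd


def soft_masks(likelihoods):
    """Normalize per-source likelihoods at one bin into soft mask values."""
    total = sum(likelihoods)
    if total == 0.0:
        # No evidence for any source: share the bin equally.
        return [1.0 / len(likelihoods)] * len(likelihoods)
    return [p / total for p in likelihoods]


if __name__ == "__main__":
    print(interaural_cues(2 + 0j, 0 + 1j))
    print(soft_masks([1.0, 3.0]))
```

Multiplying each mask value by the mixture coefficient at that bin gives a soft estimate of the corresponding source, in contrast to a binary mask that assigns the whole bin to one source.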