
    Effects of noise suppression and envelope dynamic range compression on the intelligibility of vocoded sentences for a tonal language

    Vocoder simulation studies have suggested that the type of carrier signal affects the intelligibility of vocoded speech. The present work further assessed how carrier type interacts with additional signal processing, namely single-channel noise suppression and envelope dynamic range compression, in determining the intelligibility of vocoder simulations. In Experiment 1, Mandarin sentences corrupted by speech-spectrum-shaped noise (SSN) or two-talker babble (2TB) were processed by one of four single-channel noise-suppression algorithms before tone-vocoded (TV) or noise-vocoded (NV) processing. In Experiment 2, the dynamic ranges of multiband envelope waveforms were compressed by scaling the mean-removed envelope waveforms by a compression factor before TV or NV processing. With normal-hearing (NH) listeners, TV Mandarin sentences yielded higher intelligibility scores than NV sentences. The intelligibility advantage of noise-suppressed vocoded speech depended on the masker type (SSN vs 2TB). NV speech was more adversely affected by envelope dynamic range compression than TV speech. These findings suggest an interaction between the carrier signal type used in vocoding and the envelope distortion introduced by signal processing.
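Experiment 2's compression step (scaling each mean-removed envelope by a compression factor) and the difference between tone and noise carriers can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' processing chain: the function names `compress_envelope` and `modulate` are hypothetical, envelope extraction and band-pass filtering are omitted, and the noise carrier is left broadband rather than band-limited to the analysis channel as a full noise vocoder would do.

```python
from typing import Optional

import numpy as np


def compress_envelope(env: np.ndarray, factor: float) -> np.ndarray:
    """Compress an envelope's dynamic range by scaling its mean-removed part.

    Hypothetical helper: factor = 1 leaves the envelope unchanged, while
    0 < factor < 1 shrinks excursions around the mean by that factor.
    """
    mean = env.mean()
    compressed = mean + factor * (env - mean)
    # For 0 <= factor <= 1 a non-negative envelope stays non-negative;
    # the clip only matters if a factor > 1 (expansion) is passed.
    return np.maximum(compressed, 0.0)


def modulate(env: np.ndarray, fs: float, fc: float, carrier: str = "tone",
             rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Impose a channel envelope on a tone (TV) or noise (NV) carrier.

    Simplification: the noise carrier is white Gaussian noise; a real NV
    channel would band-pass filter it to the channel's frequency band.
    """
    if carrier == "tone":
        t = np.arange(env.size) / fs
        c = np.sin(2 * np.pi * fc * t)  # sinusoid at the channel centre frequency
    elif carrier == "noise":
        if rng is None:
            rng = np.random.default_rng()
        c = rng.standard_normal(env.size)
    else:
        raise ValueError(f"unknown carrier type: {carrier!r}")
    return env * c
```

For example, `compress_envelope(np.array([0.0, 1.0, 2.0, 3.0]), 0.5)` returns `[0.75, 1.25, 1.75, 2.25]`: the mean (1.5) is preserved while the peak-to-trough range halves from 3 to 1.5.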

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised, at least for some listeners, by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns in response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. It thus provides a roadmap for future work on improving the robustness of speech output.

    Microstructural damage of the posterior corpus callosum contributes to the clinical severity of neglect

    One theory of neglect symptoms in patients with right-hemisphere focal damage invokes a release of the inhibition that the right parietal cortex exerts over the left parieto-frontal circuits, through a disconnection mechanism. This theory is supported by transcranial magnetic stimulation studies showing asymmetric inhibitory interactions between the left and right posterior parietal cortex, with a right-hemispheric advantage. These inhibitory mechanisms are mediated by direct transcallosal projections in the posterior portion of the corpus callosum. Using diffusion imaging and tract-based spatial statistics (TBSS), the current study aims to assess, in a data-driven fashion, the contribution of structural disconnection between the hemispheres to the presence and severity of neglect. Eleven patients with acute right-hemisphere stroke and 11 healthy matched controls underwent MRI at 3T, including diffusion imaging and T1-weighted volumes. TBSS was modified to account for the lesion and used (i) to assess the presence and extent of changes in diffusion indices of microscopic white matter integrity in the left hemisphere of patients compared with controls, and (ii) to investigate, by correlation analysis, whether this damage might account for the presence and severity of patients' neglect, as assessed by the Behavioural Inattention Test (BIT). None of the patients had any macroscopic abnormality in the left hemisphere (3 cases were discarded due to image artefacts in the MRI data); nevertheless, TBSS analysis revealed widespread changes in diffusion indices in most of their left-hemisphere tracts, predominantly involving the corpus callosum and its projections to the parietal white matter. A region of association between patients' BIT scores and fractional anisotropy (FA) values was found in the posterior part of the corpus callosum. This study strongly supports the hypothesis that structural disconnection between the right and left parietal cortex plays a major role in determining neglect.
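The correlation step, relating each patient's BIT score to FA at every white-matter location, can be illustrated with a plain voxelwise Pearson correlation in NumPy. This is a hedged sketch, not the study's TBSS pipeline: it assumes FA values have already been projected onto a common skeleton and flattened into a subjects-by-voxels array, the name `voxelwise_pearson` is hypothetical, and it omits the permutation-based inference and multiple-comparison correction that a real TBSS analysis would typically use.

```python
import numpy as np


def voxelwise_pearson(fa: np.ndarray, scores: np.ndarray) -> np.ndarray:
    """Pearson r between a behavioural score and FA at each voxel.

    fa:     array of shape (n_subjects, n_voxels)
    scores: array of shape (n_subjects,), e.g. BIT scores
    Returns an array of shape (n_voxels,); voxels with constant FA give NaN.
    """
    fa_c = fa - fa.mean(axis=0)          # centre each voxel across subjects
    s_c = scores - scores.mean()         # centre the behavioural scores
    num = s_c @ fa_c                     # covariance numerators, one per voxel
    den = np.sqrt((s_c ** 2).sum() * (fa_c ** 2).sum(axis=0))
    # 0/0 for constant voxels yields NaN; suppress the runtime warning.
    with np.errstate(invalid="ignore", divide="ignore"):
        return num / den
```

Vectorising over voxels keeps the sketch short; a real analysis would map the resulting r values back onto the skeleton and test them statistically.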

    A metric for predicting binaural speech intelligibility in stationary noise and competing speech maskers

    One criterion in the design of binaural sound scenes in audio production is the extent to which the intended speech message is correctly understood. Object-based audio broadcasting systems have given sound editors greater access to the metadata (e.g., intensity and location) of each sound source, providing better control over speech intelligibility. The current study describes and evaluates a binaural distortion-weighted glimpse proportion metric (BiDWGP), motivated by better-ear glimpsing and binaural masking level differences. BiDWGP predicts intelligibility from either of two input forms: binaural recordings, or monophonic recordings of each sound source together with their locations. Two listening experiments were performed with stationary noise and competing speech maskers, one with a single masker and the other with multiple maskers, across a variety of spatial configurations. Overall, with either input form, BiDWGP predicts listeners' keyword scores with correlations of 0.95 and 0.91 for the single- and multi-masker conditions, respectively. When masker types are considered separately, correlations rise to 0.95 or above for both. Predictions from the two input forms are very similar, suggesting that BiDWGP can be applied to the design of sound scenes where only individual sound sources and their locations are available.
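The better-ear glimpsing idea that motivates BiDWGP can be illustrated with a simplified glimpse proportion in Python with NumPy. This sketch is not BiDWGP itself: it omits the distortion weighting and the binaural masking level difference component, assumes target and masker powers are already available per time-frequency unit for each ear (for example from an auditory filterbank), and uses a local-SNR threshold of 3 dB as an assumed value. The function name `better_ear_glimpse_proportion` is hypothetical.

```python
import numpy as np


def better_ear_glimpse_proportion(target_l, masker_l, target_r, masker_r,
                                  threshold_db=3.0, eps=1e-12):
    """Fraction of time-frequency units glimpsed at the better ear.

    Each input is an array of powers with shape (n_channels, n_frames).
    A unit counts as a glimpse when the higher of the two ears' local SNRs
    exceeds threshold_db (an assumed value, not taken from the study).
    """
    # Local SNR in dB at each ear; eps avoids log of zero and division by zero.
    snr_l = 10 * np.log10((np.asarray(target_l) + eps) / (np.asarray(masker_l) + eps))
    snr_r = 10 * np.log10((np.asarray(target_r) + eps) / (np.asarray(masker_r) + eps))
    # Better-ear listening: take the more favourable SNR in each unit.
    better = np.maximum(snr_l, snr_r)
    return float(np.mean(better > threshold_db))
```

In a 2×2 example where the left ear has a 10 dB local SNR in one unit and the right ear in a different unit, with 0 dB elsewhere, each ear alone glimpses 1 of 4 units (0.25), while the better-ear combination glimpses 2 of 4 (0.5).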