
    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns in response to the immediate context of spoken communication, in which the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work on improving the robustness of speech output.

    The influence of channel and source degradations on intelligibility and physiological measurements of effort

    Although everyday listening is compromised by acoustic degradations, individuals show a remarkable ability to understand degraded speech. However, recent trends in speech perception research emphasise the cognitive load imposed by degraded speech on both normal-hearing and hearing-impaired listeners. The perception of degraded speech is often studied through channel degradations such as background noise. However, source degradations determined by talkers’ acoustic-phonetic characteristics have been studied to a lesser extent, especially in the context of listening effort models. Similarly, little attention has been given to speaking effort, i.e., the effort experienced by talkers when producing speech under channel degradations. This thesis aims to provide a holistic understanding of communication effort, taking into account both listener and talker factors. Three pupillometry studies are presented. In the first study, speech was recorded from 16 Southern British English speakers and presented to normal-hearing listeners in quiet and in combination with three degradations: noise-vocoding, masking, and time-compression. Results showed that acoustic-phonetic talker characteristics predicted intelligibility of degraded speech, but not listening effort, as likely indexed by pupil dilation. In the second study, older hearing-impaired listeners were presented with fast time-compressed speech under simulated room acoustics. Intelligibility was kept at high levels. Results showed that both fast speech and reverberant speech were associated with higher listening effort, as suggested by pupillometry. Discrepancies between pupillometry and perceived effort ratings suggest that both methods should be employed in speech perception research to pinpoint processing effort.
While findings from the first two studies support models of degraded speech perception, emphasising the relevance of source degradations, they also have methodological implications for pupillometry paradigms. In the third study, pupillometry was combined with a speech production task, aiming to establish an equivalent to listening effort for talkers: speaking effort. Normal-hearing participants were asked to read and produce speech in quiet or in the presence of different types of masking: stationary and modulated speech-shaped noise, and competing-talker masking. Results indicated that while talkers acoustically enhance their speech more under stationary masking, larger pupil dilation associated with competing-talker masking reflected higher speaking effort. Results from all three studies are discussed in conjunction with models of degraded speech perception and production. Listening effort models are revisited to incorporate pupillometry results from speech production paradigms. Given the new approach of investigating source factors using pupillometry, methodological issues are discussed as well. The main insight provided by this thesis, i.e., the feasibility of applying pupillometry to situations involving both listener and talker factors, is suggested to guide future research employing naturalistic conversations.

    Predicting Speech Intelligibility

    Hearing impairment, and specifically sensorineural hearing loss, is an increasingly prevalent condition, especially amongst the ageing population. It occurs primarily as a result of damage to hair cells that act as sound receptors in the inner ear and causes a variety of hearing perception problems, most notably a reduction in speech intelligibility. Accurate diagnosis of hearing impairments is a time-consuming process and is complicated by the reliance on indirect measurements based on patient feedback due to the inaccessible nature of the inner ear. The challenges of designing hearing aids to counteract sensorineural hearing losses are further compounded by the wide range of severities and symptoms experienced by hearing-impaired listeners. Computer models of the auditory periphery have been developed, based on phenomenological measurements from auditory-nerve fibres using a range of test sounds and varied conditions. It has been demonstrated that auditory-nerve representations of vowels in normal and noise-damaged ears can be ranked by a subjective visual inspection of how the impaired representations differ from the normal. This thesis seeks to expand on this procedure to use full word tests rather than single vowels, and to replace manual inspection with an automated approach using a quantitative measure. It presents a measure that can predict speech intelligibility in a consistent and reproducible manner. This new approach has practical applications as it could allow speech-processing algorithms for hearing aids to be objectively tested in early-stage development without having to resort to extensive human trials. Simulated hearing tests were carried out by substituting real listeners with the auditory model. A range of signal processing techniques were used to measure the model’s auditory-nerve outputs by presenting them spectro-temporally as neurograms.
A neurogram similarity index measure (NSIM) was developed that allowed the impaired outputs to be compared to a reference output from a normal-hearing listener simulation. A simulated listener test was developed, using standard listener test material, and was validated for predicting normal-hearing speech intelligibility in quiet and noisy conditions. Two types of neurograms were assessed: temporal fine structure (TFS), which retained spike timing information; and average discharge rate or temporal envelope (ENV). Tests were carried out to simulate a wide range of sensorineural hearing losses and the results were compared to real listeners’ unaided and aided performance. Simulations to predict the speech intelligibility performance of the NAL-RP and DSL 4.0 hearing aid fitting algorithms were undertaken. The NAL-RP hearing aid fitting algorithm was adapted using a chimaera sound algorithm which aimed to improve the TFS speech cues available to aided hearing-impaired listeners. NSIM was shown to quantitatively rank neurograms with better performance than relative mean squared error and other similar metrics. Simulated performance intensity functions predicted speech intelligibility for normal and hearing-impaired listeners. The simulated listener tests demonstrated that NAL-RP and DSL 4.0 performed with similar speech intelligibility restoration levels. Using NSIM and a computational model of the auditory periphery, speech intelligibility can be predicted for both normal and hearing-impaired listeners, and novel hearing aids can be rapidly prototyped and evaluated prior to real listener tests.
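    The NSIM described above is an SSIM-style comparison of a degraded neurogram against a normal-hearing reference. A minimal sketch of the idea, assuming neurograms are time-frequency matrices of discharge rates; the single global window and the constants here are illustrative simplifications, whereas the published measure combines luminance and structure terms over local windows with tuned weights:

```python
import numpy as np

def nsim(reference, degraded, c1=0.01, c2=0.03):
    """SSIM-style similarity between two neurograms (2-D arrays).

    Illustrative sketch: one global luminance term and one global
    structure term; identical inputs score 1.0, and degradation
    lowers the score.
    """
    r = np.asarray(reference, dtype=float)
    d = np.asarray(degraded, dtype=float)
    mu_r, mu_d = r.mean(), d.mean()
    var_r, var_d = r.var(), d.var()
    cov = ((r - mu_r) * (d - mu_d)).mean()
    luminance = (2 * mu_r * mu_d + c1) / (mu_r ** 2 + mu_d ** 2 + c1)
    structure = (cov + c2) / (np.sqrt(var_r * var_d) + c2)
    return luminance * structure
```

Ranking candidate hearing-aid processing schemes then reduces to comparing their scores against the same normal-hearing reference neurogram.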

    Properties of naturally produced clear speech at normal rates and implications for intelligibility enhancement

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2001. Includes bibliographical references (leaves 258-264). In adverse listening conditions, talkers can increase their intelligibility by speaking clearly [45, 55, 42]. While producing clear speech, however, talkers often reduce their speaking rate significantly [46, 55]. A recent study [28] showed that speaking slowly is not responsible for the high intelligibility of clear speech, since talkers can produce clear speech at normal rates with training. This finding suggests that acoustical factors other than reduced speaking rate are responsible for the high intelligibility of clear speech. To gain insight into these factors, acoustical properties (global, phonological, and phonetic) of conversational and clear speech produced at normal speaking rates were examined. Three global acoustic properties associated with clear/normal speech were identified: increased energy near the second and third formants, higher average and greater range of F0, and increased modulation depth of low-frequency modulations of the intensity envelope. In order to determine which of these acoustical properties of clear/normal speech contribute most to its high intelligibility, signal processing transformations of conversational speech were developed. Results of intelligibility tests with hearing-impaired listeners and normal-hearing listeners in noise suggest that these properties may not fully account for the intelligibility benefit of clear/normal speech. Other properties important for highly intelligible speech may not have been identified in this study due to the complexity of the acoustic database and varying talker strategies. By Jean Christine Krause, Ph.D.
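    One of the global properties identified above, increased modulation depth of low-frequency modulations of the intensity envelope, is straightforward to estimate. A hedged sketch, not the thesis's actual analysis: the envelope is taken as the smoothed squared signal, and depth is the 1-10 Hz spectral energy of the envelope relative to its mean level (the 20 ms window and band edges are assumptions):

```python
import numpy as np

def modulation_depth(signal, fs, lo=1.0, hi=10.0):
    """Estimate the depth of low-frequency intensity-envelope
    modulations, relative to the envelope's mean level."""
    win = int(fs * 0.02)  # 20 ms smoothing window
    # Intensity envelope: squared signal, moving-average smoothed
    env = np.convolve(signal ** 2, np.ones(win) / win, mode="same")
    # Spectrum of the zero-mean envelope
    spec = np.abs(np.fft.rfft(env - env.mean()))
    freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
    band = (freqs >= lo) & (freqs <= hi)
    return spec[band].sum() / (env.mean() * len(env))
```

An amplitude-modulated signal scores higher than a steady one, mirroring the finding that clear/normal speech carries deeper low-frequency envelope modulations than conversational speech.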

    Properties of naturally produced clear speech at normal rates and implications for intelligibility enhancement

    Supervised by Louis D. Braida. Also issued as Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2001. Includes bibliographical references (leaves 258-264). By Jean Christine Krause.

    Sensory Communication

    Contains table of contents for Section 2, an introduction, and reports on fourteen research projects. Funding: National Institutes of Health Grant RO1 DC00117; National Institutes of Health Grant RO1 DC02032; National Institutes of Health/National Institute on Deafness and Other Communication Disorders Grant R01 DC00126; National Institutes of Health Grant R01 DC00270; National Institutes of Health Contract N01 DC52107; U.S. Navy - Office of Naval Research/Naval Air Warfare Center Contract N61339-95-K-0014; U.S. Navy - Office of Naval Research/Naval Air Warfare Center Contract N61339-96-K-0003; U.S. Navy - Office of Naval Research Grant N00014-96-1-0379; U.S. Air Force - Office of Scientific Research Grant F49620-95-1-0176; U.S. Air Force - Office of Scientific Research Grant F49620-96-1-0202; U.S. Navy - Office of Naval Research Subcontract 40167; U.S. Navy - Office of Naval Research/Naval Air Warfare Center Contract N61339-96-K-0002; National Institutes of Health Grant R01-NS33778; U.S. Navy - Office of Naval Research Grant N00014-92-J-184.

    Effects of errorless learning on the acquisition of velopharyngeal movement control

    Session 1pSC - Speech Communication: Cross-Linguistic Studies of Speech Sound Learning of the Languages of Hong Kong (Poster Session). The implicit motor learning literature suggests a benefit for learning if errors are minimized during practice. This study investigated whether the same principle holds for learning velopharyngeal movement control. Normal-speaking participants learned to produce hypernasal speech in either an errorless learning condition (in which the possibility for errors was limited) or an errorful learning condition (in which the possibility for errors was not limited). The nasality level of the participants’ speech was measured by nasometer and reflected by nasalance scores (in %). Errorless learners practiced producing hypernasal speech with a threshold nasalance score of 10% at the beginning, which gradually increased to a threshold of 50% at the end. The same set of threshold targets was presented to errorful learners, but in reversed order. Errors were defined by the proportion of speech with a nasalance score below the threshold. The results showed that, relative to errorful learners (50.7% errors; mean nasalance 31.3%), errorless learners displayed fewer errors (17.7%) and a higher mean nasalance score (46.7%) during the acquisition phase. Furthermore, errorless learners outperformed errorful learners in both retention and novel transfer tests. Acknowledgment: Supported by The University of Hong Kong Strategic Research Theme for Sciences of Learning. © 2012 Acoustical Society of America.
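    The error definition and the two threshold schedules described above can be sketched as follows (all names are illustrative, not taken from the study's materials):

```python
def error_proportion(nasalance_frames, threshold):
    """Fraction of nasalance measurements (%) that fall below the
    current target threshold -- the study's error definition."""
    below = sum(1 for n in nasalance_frames if n < threshold)
    return below / len(nasalance_frames)

# Errorless schedule: thresholds rise gradually from 10% to 50%,
# so early, easy targets keep errors rare; the errorful schedule
# presents the same targets in reverse.
errorless_thresholds = [10, 20, 30, 40, 50]
errorful_thresholds = list(reversed(errorless_thresholds))
```

Starting with a low threshold makes nearly every production count as correct, which is precisely how the errorless condition limits the possibility for errors during acquisition.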

    Investigating the build-up of precedence effect using reflection masking

    The auditory processing level involved in the build-up of precedence [Freyman et al., J. Acoust. Soc. Am. 90, 874–884 (1991)] was investigated here by employing reflection masked threshold (RMT) techniques. Given that RMT techniques are generally assumed to address lower levels of auditory signal processing, such an approach represents a bottom-up approach to the build-up of precedence. Three conditioner configurations measuring a possible build-up of reflection suppression were compared to the baseline RMT for four reflection delays ranging from 2.5 to 15 ms. No build-up of reflection suppression was observed for any of the conditioner configurations. Build-up of template (a decrease in RMT for two of the conditioners), on the other hand, was found to be delay-dependent. For five of six listeners, at reflection delays of 2.5 and 15 ms, RMT decreased relative to the baseline; at 5- and 10-ms delays, no change in threshold was observed. It is concluded that the low-level auditory processing involved in RMT is not sufficient to realize a build-up of reflection suppression. This confirms suggestions that higher-level processing is involved in precedence-effect build-up. The observed enhancement of reflection detection (lowered RMT) may contribute to active suppression at higher processing levels.