
    THE RELATIONSHIP BETWEEN ACOUSTIC FEATURES OF SECOND LANGUAGE SPEECH AND LISTENER EVALUATION OF SPEECH QUALITY

    Second language (L2) speech is typically less fluent than native speech, and differs from it phonetically. While the speech of some L2 English speakers seems to be easily understood by native listeners despite the presence of a foreign accent, other L2 speech seems to be more demanding, such that listeners must expend considerable effort to understand it. One reason for this increased difficulty may simply be the speaker’s pronunciation accuracy or phonetic intelligibility. If an L2 speaker’s pronunciations of English sounds differ sufficiently from the sounds that native listeners expect, these differences may force native listeners to work much harder to understand the divergent speech patterns. However, L2 speakers also tend to differ from native ones in terms of fluency: the degree to which a speaker is able to produce appropriately structured phrases without unnecessary pauses, self-corrections, or restarts. Previous studies have shown that measures of fluency are strongly predictive of listeners’ subjective ratings of the acceptability of L2 speech: less fluent speech is consistently considered less acceptable (Ginther, Dimova, & Yang, 2010). However, since less fluent speakers also tend to have less accurate pronunciations, it is unclear whether or how these factors interact to influence the amount of effort listeners exert to understand L2 speech, nor is it clear how listening effort relates to the perceived quality or acceptability of speech. In this dissertation, two experiments were designed to investigate these questions.
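    The fluency measures mentioned above (pauses, restarts, speech rate) are typically computed from time-aligned transcripts. The sketch below is not taken from the dissertation; it only illustrates, under assumed conventions, how a few common utterance-level fluency measures could be derived from hypothetical word-level timestamps. The 250 ms silent-pause threshold is a frequently used convention in fluency research, not a value reported in the study.

```python
# Illustrative sketch: simple fluency measures from hypothetical word timestamps.
# All thresholds and the (label, start, end) data layout are assumptions.

def fluency_measures(words, pause_threshold=0.25):
    """words: list of (label, start_s, end_s) tuples in temporal order."""
    if not words:
        return {}
    total_time = words[-1][2] - words[0][1]                   # utterance span (s)
    speaking_time = sum(end - start for _, start, end in words)
    gaps = [nxt[1] - cur[2] for cur, nxt in zip(words, words[1:])]
    pauses = [g for g in gaps if g >= pause_threshold]        # silent pauses only
    return {
        "speech_rate": len(words) / total_time,               # words/s, pauses included
        "articulation_rate": len(words) / speaking_time,      # words/s, pauses excluded
        "pause_rate": len(pauses) / total_time,               # pauses per second
        "mean_pause_duration": sum(pauses) / len(pauses) if pauses else 0.0,
    }

# Example with made-up timestamps (seconds):
demo = [("the", 0.00, 0.18), ("cat", 0.20, 0.55), ("sat", 1.10, 1.40), ("down", 1.45, 1.80)]
print(fluency_measures(demo))
```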

    Comparisons between simulated and in-situ measured speech intelligibility based on (binaural) room impulse responses

    This study systematically compares acoustic simulation and in-situ measurement in terms of the speech transmission index (STI), speech intelligibility scores, and their relationship curves, considering (binaural) room impulse responses and four general room conditions: an office, a laboratory, a multimedia lecture hall, and a semi-anechoic chamber. The results reveal that STI can be predicted accurately by acoustic simulation (using the room acoustics software ODEON) when there is good agreement between the virtual models and the real rooms, and that differences in reverberation time (RT) and signal-to-noise ratio (SNR) may exert a less significant influence on the simulated STI. However, subjective intelligibility may be overestimated when using acoustic simulation due to the head-related transfer function (HRTF) filter used, and this score bias may be minimal and difficult to detect in everyday situations. There is no obvious score tendency caused by different RTs, though as the SNR decreases, the score bias may increase. Overall, considering that accurate acoustic modelling of rooms is often problematic, it is difficult to obtain accurate speech intelligibility predictions using a simulation technique, especially when the room has not yet been built.
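    For orientation, the STI is derived from the modulation transfer function (MTF) of the transmission path, which can be computed from a room impulse response via Schroeder's formula and then degraded by the SNR. The sketch below is a deliberately simplified, single-band approximation of that idea (no octave-band filtering, auditory masking, or band weighting as specified in IEC 60268-16), not the procedure used in the study or in ODEON.

```python
import numpy as np

# Standard set of 14 modulation frequencies (Hz) used in STI computations.
MOD_FREQS = np.array([0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                      3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5])

def simplified_sti(h, fs, snr_db=np.inf):
    """Single-band STI approximation from an impulse response h sampled at fs."""
    t = np.arange(len(h)) / fs
    h2 = h ** 2
    energy = h2.sum()
    # Schroeder's formula: MTF from the squared impulse response.
    m = np.array([np.abs(np.sum(h2 * np.exp(-2j * np.pi * f * t))) / energy
                  for f in MOD_FREQS])
    m /= (1.0 + 10 ** (-snr_db / 10.0))       # modulation reduction by stationary noise
    snr_eff = 10 * np.log10(m / (1 - m))      # effective SNR per modulation frequency
    snr_eff = np.clip(snr_eff, -15.0, 15.0)   # limit to the +/-15 dB range
    ti = (snr_eff + 15.0) / 30.0              # transmission index in [0, 1]
    return ti.mean()

# Toy example: a noise-like exponentially decaying impulse response at 16 kHz.
fs, rt = 16000, 0.5
t = np.arange(int(fs * rt * 2)) / fs
h = np.random.randn(len(t)) * np.exp(-6.91 * t / rt)
print("approx. STI:", round(simplified_sti(h, fs, snr_db=10.0), 3))
```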

    InQSS: a speech intelligibility and quality assessment model using a multi-task learning network

    Speech intelligibility and quality assessment models are essential tools for researchers to evaluate and improve speech processing models. However, only a few studies have investigated multi-task models for intelligibility and quality assessment, owing to the limitations of available data. In this study, we release TMHINT-QI, the first Chinese speech dataset that records the quality and intelligibility scores of clean, noisy, and enhanced utterances. We then propose InQSS, a non-intrusive multi-task learning framework for intelligibility and quality assessment, and evaluate it with both training-from-scratch and pretrained models. The experimental results confirm the effectiveness of the InQSS framework. In addition, the resulting model can predict not only the intelligibility scores but also the quality scores of a speech signal. Comment: accepted by Interspeech 202
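    To make the multi-task idea concrete, the sketch below shows a minimal two-head regressor with a shared encoder, one head predicting a quality score and one an intelligibility score. This is not the InQSS architecture described in the paper; the layer sizes, pooling, and loss weighting are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class TwoHeadAssessor(nn.Module):
    """Shared encoder with separate quality and intelligibility regression heads."""
    def __init__(self, n_feats=257, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_feats, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.quality_head = nn.Linear(hidden, 1)          # e.g. MOS-style quality score
        self.intelligibility_head = nn.Linear(hidden, 1)  # e.g. proportion of words correct

    def forward(self, frames):                 # frames: (batch, time, n_feats)
        h = self.encoder(frames).mean(dim=1)   # average-pool the shared representation over time
        return self.quality_head(h), self.intelligibility_head(h)

model = TwoHeadAssessor()
x = torch.randn(4, 100, 257)                   # 4 dummy utterances, 100 frames each
q_pred, i_pred = model(x)
# Joint training simply sums the two regression losses (dummy targets shown here).
loss = nn.functional.mse_loss(q_pred, torch.rand(4, 1)) + \
       nn.functional.mse_loss(i_pred, torch.rand(4, 1))
print(q_pred.shape, i_pred.shape, float(loss))
```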

    The Influence on Cortical Brainwaves in Relation to Word Intelligibility and ASW in Room

    The influence of indoor speech intelligibility and apparent source width (ASW) on cortical brainwave responses was studied using two variables: the time gap between the direct sound and the first reflection (Δt1, ms) and the initial (<80 ms) interaural cross-correlation function (IACCE3). Comparisons were performed based on the autocorrelation function (ACF) of the continuous brainwave (CBW) and the slow vertex response (SVR). The results are: (1) the effective delay time of the ACF (τe) of β-waves (13–30 Hz) in the left hemisphere under changes in Δt1 was significantly and positively correlated with speech intelligibility (p < 0.001). (2) As ASW increased, the relative amplitude A(P2–N2) of the left-hemisphere SVR tended to decrease (p < 0.05), while the N2 latency tended to increase (p < 0.05); the lateral lemniscus in the auditory pathway was suggested to be the reactive site. (3) With regard to hemispheric specialization, speech intelligibility, the main temporal factor, was found to be controlled by the left hemisphere. For the subjective spatial factor, ASW, the relative amplitude of the SVR was also found to decrease in the left hemisphere; nevertheless, the two hemispheres responded coherently, as the N2 latency of the SVR was significantly prolonged in both the left and right hemispheres under changes in IACCE3.
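    The effective delay time τe used in result (1) characterizes how quickly the envelope of a signal's normalized ACF decays. The sketch below estimates τe as the delay at which the ACF envelope falls to 10% (-10 dB) of its zero-lag value, approximating the envelope with a running maximum; this is a simplification of the line-fitting procedure used in Ando-style analyses and is offered only as an illustration, not the study's method.

```python
import numpy as np

def effective_delay_time(x, fs, max_lag_s=0.5):
    """Estimate tau_e (s): delay where the normalized ACF envelope drops to 0.1."""
    x = x - x.mean()
    max_lag = int(max_lag_s * fs)
    acf = np.correlate(x, x, mode="full")[len(x) - 1:len(x) - 1 + max_lag]
    acf /= acf[0]                                               # normalize so phi(0) = 1
    envelope = np.maximum.accumulate(np.abs(acf)[::-1])[::-1]   # running max from the tail
    below = np.where(envelope <= 0.1)[0]
    return below[0] / fs if below.size else max_lag_s

# Toy example: a 20 Hz "beta-band-like" damped oscillation sampled at 1 kHz.
fs = 1000
t = np.arange(0, 2.0, 1 / fs)
x = np.exp(-t / 0.2) * np.sin(2 * np.pi * 20 * t) + 0.05 * np.random.randn(len(t))
print("tau_e ~ %.3f s" % effective_delay_time(x, fs))
```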

    Development of the Slovak HMM-Based TTS System and Evaluation of Voices in Respect to the Used Vocoding Techniques

    This paper describes the development of a Slovak text-to-speech system based on a technique in which speech is synthesized directly from hidden Markov models. Statistical models for Slovak speech units are trained on newly created, phonetically balanced female and male speech corpora. In addition, contextual information about phonemes, syllables, words, phrases, and utterances was determined, as well as the questions for the decision-tree-based context clustering algorithms. In this paper, recent statistical parametric speech synthesis methods, including the conventional, STRAIGHT, and AHOcoder speech synthesis systems, are implemented and evaluated. Objective evaluation methods (mel-cepstral distortion and fundamental frequency comparison) and subjective ones (mean opinion score and a semantically unpredictable sentences test) are carried out to compare these systems with each other and to evaluate their overall quality. The result of this work is a set of text-to-speech systems for the Slovak language characterized by very good intelligibility and quite good naturalness of the output utterances. In the subjective intelligibility tests, the STRAIGHT-based female voice and the AHOcoder-based male voice reached the highest scores.
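    One of the objective measures mentioned, mel-cepstral distortion (MCD), is a frame-averaged Euclidean distance between the mel-cepstra of natural and synthesized speech, scaled to dB. The sketch below shows the standard formula under the assumption that the two mel-cepstral sequences are already time-aligned (e.g. by DTW) and that the 0th (energy) coefficient has been dropped; it is not the paper's exact evaluation script.

```python
import numpy as np

def mel_cepstral_distortion(ref_mcep, syn_mcep):
    """Frame-averaged MCD in dB between aligned (frames x order) mel-cepstra."""
    diff = ref_mcep - syn_mcep
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return per_frame.mean()

# Toy example with random 24-dimensional mel-cepstra for 200 aligned frames.
rng = np.random.default_rng(0)
ref = rng.normal(size=(200, 24))
syn = ref + rng.normal(scale=0.1, size=ref.shape)   # a "close" synthetic version
print("MCD ~ %.2f dB" % mel_cepstral_distortion(ref, syn))
```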

    Determination of Optimum Parameters for Cochlear Implant Speech Processors Using Objective Measures

    In a cochlear implant (CI) speech processor, several parameters, such as the number of channels, the bandwidths, the rectification type, and the envelope cutoff frequency, play an important role in producing enhanced speech. An effective, general-purpose CI approach has been a research topic for a long time. In this study, the aim is to determine the optimum parameters for CI users by using different channel numbers (4, 8, 12, 16, and 22), rectification types (half- and full-wave), and cutoff frequencies (200, 250, 300, 350, and 400 Hz). The CI approaches were tested on Turkish sentences taken from the METU database. The optimum CI structure was evaluated with objective quality and intelligibility measures: the weighted spectral slope (WSS), the short-time objective intelligibility (STOI) measure, and the perceptual evaluation of speech quality (PESQ). Experimental results show that the CI approach with a 400 Hz cutoff frequency, a full-wave rectifier, and 16 channels gives better quality and higher intelligibility scores than the other CI approaches according to the STOI, PESQ, and WSS results. The proposed CI approach allows CI users to perceive 91% of the output vocoded Turkish speech. © 2022, TUBITAK. All rights reserved.
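    The parameters varied in the study (channel count, rectification type, envelope cutoff) map directly onto the stages of a channel vocoder of the kind used to simulate CI processing: band-pass analysis, rectification, low-pass envelope extraction, and carrier resynthesis. The sketch below is a generic noise-excited vocoder under assumed log-spaced band edges (100 Hz to 7 kHz); it does not reproduce the study's filterbank or corner frequencies.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def vocode(x, fs, n_channels=16, rectify="full", env_cutoff=400.0):
    """Noise-excited channel vocoder: band-pass -> rectify -> envelope -> noise carrier."""
    edges = np.geomspace(100.0, 7000.0, n_channels + 1)       # assumed channel edges (Hz)
    sos_env = butter(2, env_cutoff, btype="low", fs=fs, output="sos")
    out = np.zeros_like(x, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfilt(sos, x)
        rect = np.abs(band) if rectify == "full" else np.maximum(band, 0.0)
        env = sosfilt(sos_env, rect)                           # channel envelope
        carrier = sosfilt(sos, np.random.randn(len(x)))        # band-limited noise carrier
        out += env * carrier
    return out / (np.max(np.abs(out)) + 1e-9)

# Toy input: one second of a vowel-like harmonic tone at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
x = sum(np.sin(2 * np.pi * f * t) for f in (150, 300, 450, 600))
y = vocode(x, fs, n_channels=16, rectify="full", env_cutoff=400.0)
print(y.shape, float(np.abs(y).max()))
```

    The vocoded output could then be scored against the clean input with intrusive measures such as STOI or PESQ (e.g. via existing reference implementations) to reproduce the kind of parameter comparison the abstract describes.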