
    Acoustics and Perception of Clear Fricatives

    Everyday observation indicates that speakers can naturally and spontaneously adopt a speaking style that allows them to be understood more easily when confronted with difficult communicative situations. Previous studies have demonstrated that the resulting speaking style, known as clear speech, is more intelligible than casual, conversational speech for a variety of listener populations. However, few studies have examined the acoustic properties of clearly produced fricatives in detail. In addition, it is unknown whether clear speech improves the intelligibility of fricative consonants, or how its effects on fricative perception might differ depending on listener population. Since fricatives are the cause of a large number of recognition errors both for normal-hearing listeners in adverse conditions and for hearing-impaired listeners, it is of interest to explore these issues in detail focusing on fricatives. The current study attempts to characterize the type and magnitude of adaptations in the clear production of English fricatives and determine whether clear speech enhances fricative intelligibility for normal-hearing listeners and listeners with simulated impairment. In an acoustic experiment (Experiment I), ten female and ten male talkers produced nonsense syllables containing the fricatives /f, θ, s, ʃ, v, ð, z, ʒ/ in VCV contexts, in both a conversational style and a clear style that was elicited by means of simulated recognition errors in feedback received from an interactive computer program. Acoustic measurements were taken for spectral, amplitudinal, and temporal properties known to influence fricative recognition.
Results illustrate that (1) there were consistent overall clear speech effects, several of which (consonant duration, spectral peak location, spectral moments) were consistent with previous findings and a few of which (notably consonant-to-vowel intensity ratio) were not; (2) 'contrastive' differences related to the fricative inventory and to the eliciting prompts were observed in key comparisons; and (3) talkers differed widely in the types and magnitude of acoustic modifications. Two perception experiments using these same productions as stimuli (Experiments II and III) were conducted to address three major questions: (1) whether clearly produced fricatives are more intelligible than conversational fricatives, (2) which specific acoustic modifications are related to clear speech intelligibility advantages, and (3) how sloping, recruiting hearing impairment interacts with clear speech strategies. Both perception experiments used an adaptive procedure to estimate the signal-to-noise ratio (SNR), in multi-talker babble, at which minimal-pair fricative categorizations could be made with 75% accuracy. Data from fourteen normal-hearing listeners (Experiment II) and fourteen listeners with simulated sloping elevated thresholds and loudness recruitment (Experiment III) indicate that clear fricatives were more intelligible overall for both listener groups. However, for listeners with simulated hearing impairment, a reliable clear speech intelligibility advantage was not found for non-sibilant pairs. Correlation analyses comparing acoustic and perceptual style-related differences across the 20 talkers indicated that a shift of energy concentration toward higher frequency regions and greater source strength was a primary contributor to the "clear fricative effect" for normal-hearing listeners but not for listeners with simulated loss, for whom information in higher frequency regions was less audible
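The adaptive threshold procedure described above can be sketched with a weighted up-down staircase, one standard rule that converges on a chosen percent-correct point (Kaernbach's rule). This is an illustrative sketch only, not necessarily the procedure used in the study; the `respond` callback, starting SNR, and step sizes are all assumptions:

```python
def weighted_up_down(respond, start_snr=10.0, step_down=2.0,
                     n_trials=60, target=0.75):
    """Track the SNR giving `target` proportion correct.

    `respond(snr)` runs one trial and returns True if the listener
    categorized the fricative correctly.  With
    step_up = step_down * target / (1 - target)  (a 3:1 ratio for 75%),
    the track converges on the target percent-correct SNR.
    """
    step_up = step_down * target / (1 - target)
    snr, track = start_snr, []
    for _ in range(n_trials):
        track.append(snr)
        if respond(snr):
            snr -= step_down   # correct: make listening harder
        else:
            snr += step_up     # incorrect: make it easier
    # crude threshold estimate: mean SNR over the second half of the track
    tail = track[n_trials // 2:]
    return sum(tail) / len(tail)
```

With a simulated listener whose threshold sits at 0 dB SNR, the track descends from the easy starting level and then oscillates around that threshold.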

    Acoustic characteristics of clearly spoken English fricatives

    This is the publisher's version, also available electronically from http://scitation.aip.org/content/asa/journal/jasa/125/6/10.1121/1.2990715
    Speakers can adopt a speaking style that allows them to be understood more easily in difficult communication situations, but few studies have examined the acoustic properties of clearly produced consonants in detail. This study attempts to characterize the adaptations in the clear production of American English fricatives in a carefully controlled range of communication situations. Ten female and ten male talkers produced fricatives in vowel-fricative-vowel contexts in both a conversational and a clear style that was elicited by means of simulated recognition errors in feedback received from an interactive computer program. Acoustic measurements were taken for spectral, amplitudinal, and temporal properties known to influence fricative recognition. Results illustrate that (1) there were consistent overall style effects, several of which (consonant duration, spectral peak frequency, and spectral moments) were consistent with previous findings and a few (notably consonant-to-vowel intensity ratio) of which were not; (2) specific acoustic modifications in clear productions of fricatives were influenced by the nature of the recognition errors that prompted the productions and were consistent with efforts to emphasize potentially misperceived contrasts both within the English fricative inventory and based on feedback from the simulated listener; and (3) talkers differed widely in the types and magnitude of all modifications
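The spectral-moment measures mentioned above treat the magnitude spectrum of a frication noise frame as a probability distribution over frequency. A minimal sketch with NumPy (window choice and frame length are illustrative assumptions, not the study's analysis settings):

```python
import numpy as np

def spectral_moments(frame, fs):
    """First four spectral moments of a frication noise frame.

    Treats the windowed magnitude spectrum as a probability distribution
    over frequency and returns centroid (Hz), variance (Hz^2), skewness,
    and excess kurtosis -- the measures commonly reported for fricatives.
    """
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    p = spectrum / spectrum.sum()                  # normalise to a distribution
    centroid = np.sum(freqs * p)                   # 1st moment: centre of gravity
    var = np.sum(((freqs - centroid) ** 2) * p)    # 2nd moment: spread
    skew = np.sum(((freqs - centroid) ** 3) * p) / var ** 1.5
    kurt = np.sum(((freqs - centroid) ** 4) * p) / var ** 2 - 3.0
    return centroid, var, skew, kurt
```

A narrow-band signal concentrated at 4 kHz, for example, yields a centroid near 4000 Hz; a clear /s/ typically shows a higher centroid than conversational /s/.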

    The articulatory and acoustic characteristics of Polish sibilants and their consequences for diachronic change

    The study is concerned with the relative synchronic stability of the three contrastive sibilant fricatives /s ʂ ɕ/ in Polish. Tongue movement data were collected from nine first-language Polish speakers producing symmetrical real and non-word CVCV sequences in three vowel contexts. A Gaussian model was used to classify the sibilants from spectral information in the noise and from formant frequencies at vowel onset. The physiological analysis showed an almost complete separation between /s ʂ ɕ/ on tongue-tip parameters. The acoustic analysis showed that greater energy at higher frequencies distinguished /s/ in the fricative noise from the other two sibilant categories. The most salient information at vowel onset was for /ɕ/, which also had a strong palatalizing effect on the following vowel. Whereas either the noise or the vowel onset was largely sufficient for the identification of /s/ and /ɕ/ respectively, both sets of cues were necessary to separate /ʂ/ from /s ɕ/. The greater synchronic instability of /ʂ/ may derive from its high articulatory complexity coupled with its comparatively low acoustic salience. The data also suggest that the relatively late stage of /ʂ/ acquisition by children may come about because of the weak acoustic information in the vowel for its distinction from /s/
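A Gaussian model of the kind described (fit one multivariate Gaussian per sibilant category, then assign tokens by likelihood) can be sketched as follows. The two-dimensional features and category labels in the usage are hypothetical illustrations, not the study's data:

```python
import numpy as np

def fit_gaussians(X, y):
    """Fit one multivariate Gaussian (mean + covariance) per class label."""
    models = {}
    for label in set(y):
        Xc = X[y == label]
        models[label] = (Xc.mean(axis=0), np.cov(Xc, rowvar=False))
    return models

def classify(models, x):
    """Assign x to the class with the highest Gaussian log-likelihood."""
    best, best_ll = None, -np.inf
    for label, (mu, cov) in models.items():
        d = x - mu
        # log N(x; mu, cov) up to a constant shared by all classes
        ll = -0.5 * (d @ np.linalg.solve(cov, d) + np.log(np.linalg.det(cov)))
        if ll > best_ll:
            best, best_ll = label, ll
    return best
```

Training on, say, spectral centroid and a vowel-onset formant per token gives a classifier whose accuracy from noise cues alone versus vowel-onset cues alone can then be compared, as in the study's design.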

    Temporal and spectral parameters in perception of the voicing contrast in English and Polish

    This dissertation examines the temporal and spectral parameters involved in the perception of voicing in English and Polish. The experimental method is based on acoustic manipulation of the temporal and spectral parameters that implement the voicing contrast in the two languages. Three groups of listeners were compared: beginning learners of English, advanced users of English, and native speakers of English. The dissertation consists of two theoretical parts, which introduce the problem and contrast the strategies used to implement the voicing contrast in the languages under study, and an experimental part, which presents the methodology and an analysis of the results. Part One addresses the role of speech perception in linguistic research. It discusses the lack of a direct relationship between the acoustic signal and the phonological category and the remarkable plasticity and adaptability of human speech perception, and it reviews proposals for a comprehensive account of how human speech perception operates. Subsequent sections discuss perception in the context of language contact, that is, the discrimination of sound contrasts that occur in a foreign language but are absent from the learner's first language. Models describing this process are reviewed, along with hypotheses about the conditions for success in acquiring effective perception of foreign-language sound contrasts. Part Two concentrates on the temporal and spectral differences in the implementation of voicing in English and Polish, covering voice onset time (VOT), vowel duration, closure duration, frication duration, devoicing, and burst duration. Part Three, the experimental part, presents the materials used in the perception study, the methodology of their manipulation, and the characteristics of the listener groups.
Hypotheses derived from the theoretical assumptions are then tested against the results obtained. The final part discusses the perceptual problems faced by Polish learners of English and draws conclusions for language teaching
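Acoustic manipulation of a temporal parameter such as VOT is typically done by splicing a natural token. A minimal sketch, assuming the recording has already been segmented into burst, aspiration, and vowel arrays (the segment names, ramp length, and step values are illustrative assumptions, not the dissertation's stimuli):

```python
import numpy as np

def vot_continuum(burst, aspiration, vowel, fs, vot_steps_ms):
    """Build a VOT continuum from one segmented natural token.

    Keeps the burst and vowel constant and truncates the aspiration so
    that burst + remaining aspiration lasts `vot_ms` milliseconds.
    Returns one waveform per requested VOT value.
    """
    stimuli = []
    for vot_ms in vot_steps_ms:
        n_target = int(round(vot_ms * fs / 1000))
        n_asp = max(0, n_target - len(burst))   # aspiration samples to keep
        asp = aspiration[:n_asp]
        # short linear ramp at the splice point to avoid a click
        ramp = min(32, len(asp))
        if ramp:
            asp = asp.copy()
            asp[-ramp:] *= np.linspace(1.0, 0.0, ramp)
        stimuli.append(np.concatenate([burst, asp, vowel]))
    return stimuli
```

Stepping `vot_steps_ms` from short to long values yields stimuli ranging from Polish-like short-lag to English-like long-lag voiceless stops.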

    The phonological functions of segmental and subsegmental duration

    The paper discusses the role of segmental and subsegmental duration in the organization of the sound systems of English and Polish. It analyses how duration contributes to signaling phonological phenomena such as voicing, word stress, and word boundaries. Special emphasis is placed on cross-linguistic differences between English and Polish and on how those differences emerge when Polish speakers learn English

    Perception of English and Polish obstruents

    This dissertation focuses on the voiced-voiceless contrast in the perception of English and Polish obstruents. The experimental method is based on acoustic manipulation of the temporal and spectral parameters that implement the voicing contrast in the two languages. Three groups of listeners were compared: beginning learners of English, advanced users of English, and native speakers of English. The dissertation consists of two theoretical parts, which introduce the problem and contrast the strategies used to implement the voicing contrast in the languages under study, and an experimental part, which presents the methodology and an analysis of the results. Part One addresses the role of speech perception in linguistic research. It discusses the lack of a direct relationship between the acoustic signal and the phonological category and the remarkable plasticity and adaptability of human speech perception, and it reviews proposals for a comprehensive account of how human speech perception operates. Subsequent sections discuss perception in the context of language contact, that is, the discrimination of acoustic contrasts that occur in a foreign language but are absent from the learner's first language. Models describing this process are reviewed, along with hypotheses about the conditions for success in acquiring effective perception of such contrasts. Part Two concentrates on the temporal and spectral differences in the implementation of voicing in English and Polish, covering voice onset time (VOT), vowel duration, closure duration, frication duration, devoicing, and burst duration. Part Three, the experimental part, presents the materials examined, the methodology of their manipulation, and the characteristics of the listener groups. Hypotheses derived from the theoretical assumptions are then tested against the results obtained.
The final part discusses the perceptual problems faced by Polish learners of English and draws pedagogical conclusions

    The impact of spectrally asynchronous delay on the intelligibility of conversational speech

    Conversationally spoken speech is replete with rapidly changing and complex acoustic cues that listeners hear, process, and map to meaning. For many hearing-impaired listeners, a hearing aid is necessary to hear these spectral and temporal acoustic cues of speech. For listeners with mild-to-moderate high frequency sensorineural hearing loss, open-fit digital signal processing (DSP) hearing aids are the most common amplification option. Open-fit DSP hearing aids introduce a spectrally asynchronous delay into the acoustic signal: audible low frequency information passes to the eardrum unimpeded, while the amplified high frequency sound delivered by the aid arrives at the eardrum with a delayed onset relative to the natural pathway of sound. These spectrally asynchronous delays may disrupt the natural acoustic pattern of speech. The primary goal of this study was to measure the effect of spectrally asynchronous delay on the intelligibility of conversational speech for normal-hearing and hearing-impaired listeners. A group of normal-hearing listeners (n = 25) and a group of listeners with mild-to-moderate high frequency sensorineural hearing loss (n = 25) participated in this study. The acoustic stimuli comprised 200 conversationally spoken recordings of the low-predictability sentences from the revised Speech Perception in Noise test (R-SPIN). These 200 sentences were modified to control for audibility for the hearing-impaired group, and the acoustic energy above 2 kHz was delayed by 0 ms (control), 4 ms, 8 ms, or 32 ms relative to the low frequency energy. The data were analyzed to determine the effect of each of the four delay conditions on the intelligibility of the final key word of each sentence. Normal-hearing listeners were minimally affected by the asynchronous delay. However, the hearing-impaired listeners were deleteriously affected by increasing amounts of spectrally asynchronous delay.
Although the hearing-impaired listeners performed well overall in their perception of conversationally spoken speech in quiet, the intelligibility of conversationally spoken sentences significantly decreased when the delay values were equal to or greater than 4 ms. Therefore, hearing aid manufacturers need to restrict the amount of delay introduced by DSP so that it does not distort the acoustic patterns of conversational speech
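The manipulation described, delaying energy above 2 kHz relative to the low band, can be sketched as a two-band filter-and-delay operation. A minimal version using SciPy Butterworth filters (the filter order and band-splitting method are assumptions; the study's own signal processing is not specified here):

```python
import numpy as np
from scipy.signal import butter, sosfilt

def asynchronous_delay(x, fs, delay_ms, split_hz=2000.0, order=4):
    """Delay the energy above `split_hz` relative to the low band.

    Splits the signal with low-pass and high-pass Butterworth filters,
    delays the high band by `delay_ms`, and sums the two paths --
    mimicking an open-fit aid whose processed high-frequency path lags
    the direct low-frequency sound.
    """
    sos_lo = butter(order, split_hz, btype="low", fs=fs, output="sos")
    sos_hi = butter(order, split_hz, btype="high", fs=fs, output="sos")
    lo = sosfilt(sos_lo, x)
    hi = sosfilt(sos_hi, x)
    n = int(round(delay_ms * fs / 1000))
    hi_delayed = np.concatenate([np.zeros(n), hi])   # high band starts n samples late
    lo_padded = np.concatenate([lo, np.zeros(n)])    # keep both paths equal length
    return lo_padded + hi_delayed
```

Running the same sentence through `delay_ms` values of 0, 4, 8, and 32 reproduces the four experimental conditions in spirit.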

    Learning to Produce Speech with an Altered Vocal Tract: The Role of Auditory Feedback

    Modifying the vocal tract alters a speaker's previously learned acoustic-articulatory relationship. This study investigated the contribution of auditory feedback to the process of adapting to vocal-tract modifications. Subjects said the word /tɑs/ while wearing a dental prosthesis that extended the length of their maxillary incisor teeth. The prosthesis affected /s/ productions, and the subjects were asked to learn to produce "normal" /s/'s. They alternately received normal auditory feedback and noise that masked their natural feedback during productions. Acoustic analysis of the speakers' /s/ productions showed that the distribution of energy across the spectrum moved toward that of normal, unperturbed production with increased experience with the prosthesis. However, the acoustic analysis did not show any significant differences in learning dependent on auditory feedback. By contrast, when naive listeners were asked to rate the quality of the speakers' utterances, productions made when auditory feedback was available were judged to be closer to the subjects' normal productions than those made when feedback was masked. The perceptual analysis showed that speakers were able to use auditory information to partially compensate for the vocal-tract modification. Furthermore, utterances produced during the masked conditions also improved over a session, demonstrating that the compensatory articulations were learned and remained available after auditory feedback was removed

    Mechanisms of vowel devoicing in Japanese

    The processes of vowel devoicing in Standard Japanese were examined with respect to the phonetic and phonological environments and the syllable structure of Japanese, in comparison with vowel reduction processes in other languages, in most of which vowel reduction occurs optionally in fast or casual speech. This thesis examined whether Japanese vowel devoicing is a phonetic phenomenon caused by glottal assimilation between a high vowel and its adjacent voiceless consonants, or whether it is a more phonologically controlled, compulsory process. Experimental results showed that Japanese high vowel devoicing must be analysed separately in two devoicing conditions, namely single and consecutive devoicing environments. Devoicing was almost compulsory regardless of proposed blocking factors such as the type of preceding consonant, accentuation, and position in an utterance, as long as there was no devoiceable vowel in adjacent morae (the single devoicing condition). Under consecutive devoicing conditions, however, blocking factors became effective and prevented some devoiceable vowels from becoming voiceless. The effect of speaking rate was also generally minimal in the single devoicing condition, but in the consecutive devoicing condition vowels were devoiced more at faster tempi than at slower tempi, which created many examples of consecutively devoiced vowels over two morae. Durational observations found that vowel devoicing involves not only phonatory change but also slight durational reduction. However, the shorter duration of devoiced syllables was adjusted at the word level, so that the whole duration of a word with devoiced vowels remained similar to that of the same word without devoiced vowels, regardless of the number of devoiced vowels in the word. It must be noted that there was no clear-cut distinction between voiced and devoiced vowels, and the phonetic realisation of a devoiced vowel could vary from fully voiced to completely voiceless.
A high vowel may be voiced in a typical devoicing environment, but its intensity is significantly weaker than that of vowels in a non-devoicing environment, at all speaking tempi. The mean differences in vowel intensity between these environments were generally greater at faster tempi. The results implied that even when the vowel was voiced, its production process moved in favour of devoicing. However, in consecutive devoicing conditions, this process did not always apply. When some of the devoiceable vowels were devoiced in the consecutive devoicing environment, the intensities of the remaining devoiceable vowels were not significantly lower than those of other vowels. The results of the intensity measurements of voiced vowels in the devoicing and non-devoicing environments suggested that Japanese vowel devoicing is part of an overall process of complex vowel weakening, and that a completely devoiced vowel is the final state of the weakening process. Japanese vowel devoicing is primarily a process of glottal assimilation, but the results in the consecutive devoicing condition showed that this process is constrained by Japanese syllable structure
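The intensity comparisons reported above rest on a simple measurement: the RMS level of a vowel segment expressed in decibels. A minimal sketch (the reference value is an arbitrary assumption; studies normally report levels relative to a calibration signal):

```python
import numpy as np

def rms_db(segment, ref=1.0):
    """Mean intensity of a segment in dB relative to amplitude `ref`."""
    rms = np.sqrt(np.mean(np.square(segment)))
    return 20.0 * np.log10(rms / ref)
```

Comparing `rms_db` of voiced high vowels in devoicing versus non-devoicing environments is the kind of measurement underlying the weakening claim: a consistently lower value in the devoicing environment indicates a vowel partway along the voiced-to-devoiced continuum.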

    A mixed inventory structure for German concatenative synthesis

    In speech synthesis by unit concatenation a major point is the definition of the unit inventory. Diphone or demisyllable inventories are widely used but both unit types have their drawbacks. This paper describes a mixed inventory structure which is syllable oriented but does not demand a definite decision about the position of a syllable boundary. In the definition process of the inventory the results of a comprehensive investigation of coarticulatory phenomena at syllable boundaries were used as well as a machine readable pronunciation dictionary. An evaluation comparing the mixed inventory with a demisyllable and a diphone inventory confirms that speech generated with the mixed inventory is superior regarding general acceptance. A segmental intelligibility test shows the high intelligibility of the synthetic speech
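The core operation in any concatenative inventory, whatever the unit type, is joining stored units smoothly. A toy sketch of concatenation with a linear crossfade at each joint (real systems additionally smooth F0 and spectral envelope at the joins; the unit contents and crossfade length here are illustrative assumptions):

```python
import numpy as np

def concatenate_units(units, fs, xfade_ms=5.0):
    """Concatenate stored units with a linear crossfade at each joint.

    Each unit is a 1-D sample array; the last `xfade_ms` of one unit is
    overlap-added with the first `xfade_ms` of the next.
    """
    n = int(round(xfade_ms * fs / 1000))
    out = units[0].astype(float)
    for u in units[1:]:
        u = u.astype(float)
        fade_out = out[-n:] * np.linspace(1.0, 0.0, n)  # outgoing unit ramps down
        fade_in = u[:n] * np.linspace(0.0, 1.0, n)      # incoming unit ramps up
        out = np.concatenate([out[:-n], fade_out + fade_in, u[n:]])
    return out
```

The choice of unit boundaries determines how often such joins fall inside coarticulated regions, which is exactly what the mixed syllable-oriented inventory is designed to avoid.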