
    A Vowel Analysis of the Northwestern University-Children's Perception of Speech Evaluation Tool

    In an analysis of the Northwestern University-Children's Perception of Speech (NU-CHIPS) test, the goal was to determine whether the foil words and the target word were phonemically balanced across each page of test Book A, as that book corresponds to the target words presented in Test Form 1 and Test Form 2 independently. Based on vowel sounds alone, the vowels appearing together on a test page vary on the majority of pages. The corresponding formant frequencies, at all three resonance levels for both the average adult male speaker and the average adult female speaker, revealed that the target word could be easily distinguished from the foil words on the basis of percent differences calculated between the formants of the target vowel and the foil vowels. For children with hearing impairments, especially those with limited or no access to the high frequencies, the NU-CHIPS evaluation tool may therefore not be the best indicator of a child's speech perception ability, owing to these substantial vowel variations.
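
    To make the comparison concrete, here is a minimal sketch of the percent-difference computation described above, for one target/foil pair at all three resonance levels. The vowels, Hz values, and variable names are illustrative textbook-style assumptions, not data from the study.

        # Percent difference between target and foil formants (F1-F3).
        # Values are hypothetical averages for an adult male speaker.
        target_vowel = {"F1": 730, "F2": 1090, "F3": 2440}   # e.g., /ɑ/ (assumed)
        foil_vowel   = {"F1": 270, "F2": 2290, "F3": 3010}   # e.g., /i/ (assumed)

        for formant in ("F1", "F2", "F3"):
            t, f = target_vowel[formant], foil_vowel[formant]
            pct_diff = abs(t - f) / ((t + f) / 2) * 100      # relative to the mean
            print(f"{formant}: target {t} Hz vs. foil {f} Hz -> {pct_diff:.1f}% difference")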

    Speech Communication

    Contains reports on five research projects. Supported by the National Institutes of Health (Grants 5 RO1 NS04332-12 and HD05168-04), the U.S. Navy Office of Naval Research (Contract N00014-67-A-0204-0069), the Joint Services Electronics Program (Contract DAAB07-74-C-0630), and the National Science Foundation (Grant SOC74-22167).

    Analyzing liquids


    The uptake of spectral and temporal cues in vowel perception is rapidly influenced by context

    Speech perception depends on auditory information within phonemes, such as spectral or temporal cues. The perception of those cues, however, is affected by auditory information in the surrounding context (e.g., a fast context sentence can make a target vowel sound subjectively longer). In a two-by-two design, the current experiments investigated when these different factors influence vowel perception. Dutch listeners categorized minimal word pairs such as /tɑk/–/taːk/ ("branch"–"task") embedded in a context sentence. Critically, the Dutch /ɑ/–/aː/ contrast is cued by both spectral and temporal information. We varied the second-formant (F2) frequencies and the durations of the target vowels. Independently, we also varied the F2 and duration of all segments in the context sentence. The time course of cue uptake on the targets was measured in a printed-word eye-tracking paradigm. Results show that the uptake of spectral cues slightly precedes the uptake of temporal cues. Furthermore, acoustic manipulations of the context sentences influenced the uptake of cues in the target vowel immediately; that is, listeners did not need additional time to integrate spectral or temporal cues of a target sound with auditory information in the context. These findings argue for an early locus of contextual influences in speech perception.
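
    As a sketch of how the two manipulated cues might jointly drive categorization, the toy logistic model below maps F2 and duration onto a probability of an /aː/ response across the two-by-two target manipulation. The weights, bias, and cue values are invented for illustration and are not fitted to the experiment's data.

        import math

        # Toy logistic cue-integration model for the /ɑ/-/aː/ contrast.
        # Weights and bias are invented; higher F2 and longer duration cue /aː/.
        def p_long_vowel(f2_hz, duration_ms, w_spec=0.01, w_temp=0.05, bias=-14.0):
            """Probability of responding /aː/ given spectral (F2) and temporal cues."""
            z = w_spec * f2_hz + w_temp * duration_ms + bias
            return 1.0 / (1.0 + math.exp(-z))

        # The 2x2 target manipulation: low/high F2 crossed with short/long duration.
        for f2 in (1000, 1300):
            for dur in (80, 140):
                print(f"F2={f2} Hz, dur={dur} ms -> P(/aː/)={p_long_vowel(f2, dur):.2f}")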

    The use of acoustic cues in phonetic perception: Effects of spectral degradation, limited bandwidth and background noise

    Hearing impairment, cochlear implantation, background noise and other auditory degradations result in the loss or distortion of sound information thought to be critical to speech perception. In many cases, listeners can still identify speech sounds despite degradations, but understanding of how this is accomplished is incomplete. The experiments presented here tested the hypothesis that listeners utilize acoustic-phonetic cues differently when one or more cues are degraded by hearing impairment or simulated hearing impairment. Results supported this hypothesis for various listening conditions that are directly relevant for clinical populations. Analysis included mixed-effects logistic modeling of the contributions of individual acoustic cues for various contrasts. Listeners with cochlear implants (CIs) and normal-hearing (NH) listeners in CI simulations showed increased use of acoustic cues in the temporal domain and decreased use of cues in the spectral domain for the tense/lax vowel contrast and the word-final fricative voicing contrast. For the word-initial stop voicing contrast, NH listeners made less use of voice-onset time and greater use of voice pitch in conditions that simulated high-frequency hearing impairment and/or masking noise; the influence of these cues was further modulated by consonant place of articulation. A pair of experiments measured phonetic context effects for the "s"/"sh" contrast, replicating previously observed effects for NH listeners and generalizing them to CI listeners as well, despite known deficiencies in spectral resolution for CI listeners. For NH listeners in CI simulations, these context effects were absent or negligible. Audio-visual delivery of this experiment revealed an enhanced influence of visual lip-rounding cues for CI listeners and NH listeners in CI simulations. Additionally, CI listeners demonstrated that visual cues to gender influence phonetic perception in a manner consistent with gender-related voice acoustics. All of these results suggest that listeners are able to accommodate challenging listening situations by capitalizing on the natural (multimodal) covariance in speech signals. They also imply that there are potential differences in speech perception between NH listeners and listeners with hearing impairment that would be overlooked by traditional word recognition or consonant confusion matrix analysis.
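
    The cue-weighting analysis can be pictured with a simplified stand-in for the mixed-effects models: an ordinary logistic regression of voicing responses on two acoustic cues (voice-onset time and onset pitch). The data, cue weights, and variable names below are simulated assumptions, not the study's measurements; fitted coefficient magnitudes index each cue's contribution.

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        # Simulate a stop-voicing categorization data set with two cues.
        rng = np.random.default_rng(0)
        n = 500
        vot = rng.uniform(0, 60, n)           # voice-onset time in ms (hypothetical)
        f0 = rng.uniform(90, 140, n)          # onset voice pitch in Hz (hypothetical)
        z = -0.15 * vot + 0.08 * (f0 - 115)   # invented "true" cue weights
        voiced = rng.random(n) < 1 / (1 + np.exp(-z))
        df = pd.DataFrame({"voiced": voiced.astype(int), "vot": vot, "f0": f0})

        # Plain logistic regression (the study used mixed-effects versions).
        model = smf.logit("voiced ~ vot + f0", data=df).fit(disp=False)
        print(model.params)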

    Recognition of English vowels using top-down method

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004. Includes bibliographical references (p. 69-70).
    Many recognizers use bottom-up methods for recognizing each phoneme or feature, then use the cues and the context to find the most appropriate words or sentences. But humans recognize words not just through bottom-up processing, but also top-down: in many cases of listening, one can predict what will come based on the preceding context, or one can determine what has been pronounced by listening to the following sounds. Therefore, if some cues to a word are given, it is possible to refine the recognition by using a top-down method. This thesis addresses improving recognition performance with a top-down method, concentrating on the problem of vowel recognition when the adjacent consonants are known.
    by Park Chi-youn.
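
    A toy illustration of the top-down idea: when the consonant frame of a CVC word is known, the vowel recognizer only has to separate the vowels that yield real words. The mini-lexicon, vowel spellings, and frames below are invented for illustration, not the thesis's actual setup.

        # Top-down constraint: known flanking consonants shrink the vowel search
        # space to candidates that form real words. The lexicon is hypothetical.
        LEXICON = {"bit", "bat", "bet", "but", "beat", "boot", "sit", "sat"}
        VOWEL_SPELLINGS = ("i", "a", "e", "u", "ea", "oo")

        def candidate_vowels(onset: str, coda: str) -> list[str]:
            """Vowels that complete onset + V + coda into a word in the lexicon."""
            return [v for v in VOWEL_SPELLINGS if onset + v + coda in LEXICON]

        print(candidate_vowels("s", "t"))  # ['i', 'a'] -- only two candidates remain
        print(candidate_vowels("b", "t"))  # all six spellings form words here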

    A Comparative Study of Spectral Peaks Versus Global Spectral Shape as Invariant Acoustic Cues for Vowels

    The primary objective of this study was to compare two sets of vowel spectral features, formants and global spectral shape parameters, as invariant acoustic cues to vowel identity. Both automatic vowel recognition experiments and perceptual experiments were performed to evaluate these two feature sets. First, these features were compared using the static spectrum sampled in the middle of each steady-state vowel versus features based on dynamic spectra. Second, the role of dynamic and contextual information was investigated in terms of improvements in automatic vowel classification rates. Third, several speaker normalization methods were examined for each of the feature sets. Finally, perceptual experiments were performed to determine whether vowel perception is more correlated with formants or with global spectral shape. Results of the automatic vowel classification experiments indicate that global spectral shape features contain more information than do formants. For both feature sets, dynamic features are superior to static features. Spectral features spanning a time interval beginning at the start of the on-glide region of the acoustic vowel segment and ending at the end of the off-glide region are required for maximum vowel recognition accuracy. Speaker normalization of both static and dynamic features can also be used to improve the automatic vowel recognition accuracy. Results of the perceptual experiments with synthesized vowel segments indicate that if formants are kept fixed, global spectral shape can, at least for some conditions, be modified such that the synthetic speech token will be perceived according to spectral shape cues rather than formant cues. This result implies that overall spectral shape may be more important perceptually than the spectral prominences represented by the formants. The results of this research contribute to a fundamental understanding of the information-encoding process in speech. The signal processing techniques used and the acoustic features found in this study can also be used to improve the preprocessing of acoustic signals in the front end of automatic speech recognition systems.
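
    The two feature families can be sketched from a single spectral slice: spectral peaks as crude formant stand-ins, and low-order DCT coefficients of the log spectrum as a global-shape description. The synthetic frame and every parameter choice below are assumptions for illustration, not the study's processing chain.

        import numpy as np
        from scipy.fft import dct
        from scipy.signal import find_peaks

        # One analysis frame of a fake vowel-like signal: three sinusoids stand
        # in for formant peaks (frequencies and amplitudes are hypothetical).
        fs, n = 10_000, 1024
        t = np.arange(n) / fs
        frame = sum(a * np.sin(2 * np.pi * f * t)
                    for f, a in [(500, 1.0), (1500, 0.6), (2500, 0.3)])
        log_spec = np.log(np.abs(np.fft.rfft(frame * np.hanning(n))) + 1e-9)

        # Feature set 1: spectral peaks (a crude proxy for formant frequencies).
        peaks, _ = find_peaks(log_spec, height=log_spec.max() - 2.5)
        print("peak frequencies (Hz):", peaks * fs / n)

        # Feature set 2: global spectral shape as low-order DCT (cepstrum-like) terms.
        shape = dct(log_spec, norm="ortho")[:12]
        print("shape coefficients:", np.round(shape[:5], 2))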

    Vowel recognition in continuous speech

    In continuous speech, the identification of phonemes requires the ability to extract features capable of characterizing the acoustic signal. Previous work has shown that relatively high classification accuracy can be obtained from a single spectrum taken during the steady-state portion of the phoneme, assuming that the phonetic environment is held constant. The present study represents an attempt to extend this work to variable phonetic contexts by using dynamic rather than static spectral information. This thesis has four aims: 1) classify vowels in continuous speech; 2) find the optimal set of features that best describe the vowel regions; 3) compare the classification results obtained with a multivariate maximum-likelihood distance measure against those of a neural network using the backpropagation model; and 4) examine the classification performance of a hidden Markov model given a pathway through phonetic space.
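
    A minimal version of the multivariate maximum-likelihood distance measure named in aim 3: one Gaussian per vowel class, with classification by the smallest Mahalanobis-style distance plus a log-determinant term. The two-dimensional features, class labels, and means are simulated assumptions, not the thesis's actual feature set.

        import numpy as np

        # Simulate two vowel classes in a 2-D "spectral" feature space
        # (hypothetical F1/F2-like means, in Hz).
        rng = np.random.default_rng(1)
        classes = {
            "iy": rng.normal([300, 2300], 60, size=(100, 2)),
            "aa": rng.normal([750, 1100], 60, size=(100, 2)),
        }
        stats = {c: (x.mean(axis=0), np.cov(x.T)) for c, x in classes.items()}

        def classify(x):
            """Pick the class minimizing Mahalanobis distance + log|covariance|."""
            def dist(c):
                mu, cov = stats[c]
                d = x - mu
                return d @ np.linalg.solve(cov, d) + np.log(np.linalg.det(cov))
            return min(stats, key=dist)

        print(classify(np.array([320, 2250])))  # -> "iy"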

    What drives sound symbolism? Different acoustic cues underlie sound-size and sound-shape mappings

    Sound symbolism refers to the non-arbitrary mappings that exist between phonetic properties of speech sounds and their meaning. Despite an extensive literature on the topic, the acoustic features and psychological mechanisms that give rise to sound symbolism are not, as yet, altogether clear. The present study was designed to investigate whether different sets of acoustic cues predict size and shape symbolism, respectively. In two experiments, participants judged whether a given consonant-vowel speech sound was large or small, round or angular, using a size or shape scale. Visual size judgments were predicted by vowel formant F1 in combination with F2, and by vowel duration. Visual shape judgments were, however, predicted by formants F2 and F3. Size and shape symbolism were thus not induced by a common mechanism, but rather were distinctly affected by acoustic properties of speech sounds. These findings portray sound symbolism as a process that is not based merely on broad categorical contrasts, such as round/unround and front/back vowels. Rather, individuals seem to base their sound-symbolic judgments on specific sets of acoustic cues, extracted from speech sounds, which vary across judgment dimensions.
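
    The reported pattern can be pictured as two small regressions: size ratings modeled from F1, F2, and duration, and shape ratings from F2 and F3. Everything below, including the simulated ratings and the invented coefficient signs, is an assumption used only to mirror the structure of the analysis, not its results.

        import numpy as np

        rng = np.random.default_rng(2)
        n = 200
        f1 = rng.uniform(300, 900, n)       # Hz
        f2 = rng.uniform(900, 2500, n)      # Hz
        f3 = rng.uniform(2200, 3200, n)     # Hz
        dur = rng.uniform(100, 400, n)      # ms

        # Simulated ratings with invented weights for this toy setup.
        size = 0.004 * f1 - 0.001 * f2 + 0.005 * dur + rng.normal(0, 0.5, n)
        shape = 0.002 * f2 + 0.001 * f3 + rng.normal(0, 0.5, n)

        def fit(y, *predictors):
            """Ordinary least squares with an intercept column."""
            X = np.column_stack(predictors + (np.ones(n),))
            coef, *_ = np.linalg.lstsq(X, y, rcond=None)
            return coef

        print("size  ~ F1, F2, dur:", np.round(fit(size, f1, f2, dur), 4))
        print("shape ~ F2, F3:     ", np.round(fit(shape, f2, f3), 4))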

    Investigation of Auditory Encoding and the Use of Auditory Feedback During Speech Production

    Responses to altered auditory feedback during speech production are highly variable. The extent to which auditory encoding influences this variability is not well understood. Thirty-nine normal-hearing adults completed a first-formant (F1) manipulation paradigm in which the F1 of the vowel /ε/ was shifted upwards in frequency towards an /æ/-like vowel in real time. Frequency following responses (FFRs) and envelope following responses (EFRs) were used to measure neuronal activity evoked by the same vowels produced by the participant and by a prototypical talker. Cochlear tuning, measured by stimulus-frequency otoacoustic emissions (SFOAEs) and a psychophysical method, was also recorded. Results showed that average F1 production changed to oppose the manipulation. Three metrics of EFR and FFR encoding were evaluated. No reliable relationship was found between speech compensation and the evoked response measures or the measures of cochlear tuning. Differences in brainstem encoding of vowels and in the sharpness of cochlear tuning do not appear to explain the variability observed in speech production.
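
    The opposing response can be quantified with a simple compensation ratio: the change in produced F1 relative to the applied feedback shift, where a positive percentage means production moved against the perturbation. All values below are hypothetical.

        # Compensation ratio for an upward F1 feedback shift. Values hypothetical.
        baseline_f1 = 580.0      # mean produced F1 for /ε/ before the shift (Hz)
        applied_shift = 120.0    # upward F1 shift applied to the feedback (Hz)
        perturbed_f1 = 545.0     # mean produced F1 during the hold phase (Hz)

        response = perturbed_f1 - baseline_f1           # -35 Hz: talker lowered F1
        compensation_pct = -response / applied_shift * 100
        print(f"compensation: {compensation_pct:.0f}% of the applied shift")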