
    A microscopic analysis of consistent word misperceptions.

    Get PDF
    162 p.
    Speech misperceptions have the potential to help us understand the mechanisms involved in human speech processing. Consistent misperceptions are especially helpful in this regard, eliminating the variability stemming from individual differences, which in turn makes it easier to analyse confusion patterns at higher levels of speech units such as the word. In this thesis, we have conducted an analysis of consistent word misperceptions from a "microscopic" perspective. Starting with a large-scale elicitation experiment, we collected over 3200 consistent misperceptions from over 170 listeners. We investigated the obtained misperceptions from a signal-independent and a signal-dependent perspective. In the former, we analysed error trends between the target and misperceived words across multiple levels of speech units. We showed that the observed error patterns are highly dependent on the eliciting masker type, and contrasted our results with previous findings. In the latter, we attempted to explain misperceptions based on the underlying speech-noise interaction. Using tools from automatic speech recognition, we conducted an automatic classification of confusions based on their origin and quantified the role that misallocation of speech fragments played in the generation of misperceptions. Finally, we introduced modifications to the original confusion-eliciting stimuli to try to recover the original utterance by providing release from either the masker's energetic or informational component. Listeners' percepts were re-evaluated in response to the modified stimuli, which revealed the origin of many confusions with respect to energetic or informational masking.
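
    To make the signal-independent analysis concrete, the sketch below tallies phoneme-level confusions from (target, misperception) word pairs. It is a minimal illustration with invented transcriptions, not the alignment procedure used in the thesis.

```python
# Minimal sketch: tallying phoneme-level confusions from consistent word
# misperceptions. The word pairs and their phoneme transcriptions are
# invented for illustration.
from collections import Counter
from difflib import SequenceMatcher

# (target, perceived) pairs as phoneme lists (hypothetical data)
pairs = [
    (["k", "ae", "t"], ["p", "ae", "t"]),   # "cat" heard as "pat"
    (["s", "ih", "n"], ["f", "ih", "n"]),   # "sin" heard as "fin"
    (["k", "ae", "t"], ["p", "ae", "t"]),
]

confusions = Counter()
for target, perceived in pairs:
    matcher = SequenceMatcher(None, target, perceived)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "replace" and (i2 - i1) == (j2 - j1):
            # count one-for-one substitutions only, for simplicity
            confusions.update(zip(target[i1:i2], perceived[j1:j2]))

for (t, p), n in confusions.most_common():
    print(f"/{t}/ -> /{p}/: {n}")
```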

    Perceptual Asymmetry and Sound Change: An Articulatory, Acoustic/Perceptual, and Computational Analysis

    Full text link
    Previous experimental study of the identification of stop and fricative consonants has shown that some consonant pairs are asymmetrically confused for one another, with listeners’ percepts tending to favor one member of the pair in a conditioning context. Researchers have also suggested that this phenomenon may play a conditioning role in sound change, although the mechanism by which perceptual asymmetry facilitates language change is somewhat unclear. This dissertation uses articulatory, acoustic, and perceptual data to provide insight into why perceptual asymmetry is observed among certain consonants and in specific contexts. It also uses computational modeling to generate initial predictions about the contexts in which perceptual asymmetry could contribute to stability or change in phonetic categories. Six experiments were conducted, each addressing asymmetry in the consonant pairs /k/-/t/ (before /i/), /k/-/p/ (before /i u/), /p/-/t/ (before /i/), and /θ/-/f/ (possibly unconditioned). In the articulatory experiment, vocal tract spatial parameters were extracted from real-time MRI video of speakers producing VCV disyllables in order to address the role of vocal tract shape in the target consonants’ vowel-dependent spectral similarity. The results suggest that, for consonant pairs involving /k/, CV coarticulation creates—as expected—vocal tract shapes that are most similar to one another in the environment conditioning perceptual asymmetry. However, CV coarticulation was less informative for explaining the vocalic conditioning of the /p/-/t/ asymmetry. In the second experiment, random forest (RF) models were trained on acoustic samples of the target consonants from a speech corpus. Their output, which was used to identify frequency components important to the discrimination of consonant pairs, aligned well with these consonants’ spectral characteristics as predicted by acoustic models. A follow-up perception experiment that examined the categorization strategies of participants listening to band-filtered CV syllables generally showed listener sensitivity to these same components, although listeners were also sensitive to band-filtering outside the predicted frequency bands. Perceptual asymmetry is observed in both CV and isolated C contexts. In the fourth experiment, a Bayesian analysis was performed to help explain why perceptual asymmetry appears when listening to isolated Cs, and a follow-up perception experiment helped to evaluate the relevance of this analysis to human perception. For /k/-/t/, for example, whose confusions favor /t/, this analysis suggested that [t] and [k] both have the highest likelihood of being generated by /t/ (relative to the likelihood of /k/ generating each) in the context conditioning asymmetry. The follow-up study suggests listeners are more likely to categorize a [t] or [k] as /t/ if it has a higher likelihood of being generated by /t/ (relative to /k/). The final experiment used agent-based modeling to simulate the intergenerational transmission of phonetic categories. Its results suggest that perceptual asymmetry can affect the acquisition of categories under certain conditions. A lack of reliable access to non-phonetic information about the speaker’s intended category or a tendency not to store tokens with low discriminability can both contribute to the instability of phonetic categories over time, but primarily in the contexts conditioning asymmetry. This dissertation makes several contributions to research on perceptual asymmetry.
The articulatory experiment suggests that confusability can be mirrored by gestural ambiguity. The Bayesian analysis could also be used to build and test predictions about the confusability of other sounds by context. Finally, the model simulations offer predictions of the conditions under which perceptual asymmetry could condition sound change.
    Ph.D., Linguistics. University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/155085/1/iccallow_1.pd
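
    The Bayesian comparison described above can be illustrated with a toy posterior computation over a single acoustic cue. The Gaussian cue distributions, the cue itself and all parameter values below are invented for illustration; the dissertation's fitted models are not reproduced here.

```python
# Toy sketch of the Bayesian categorization idea: a token is assigned to
# the category with the higher posterior. The cue (a burst spectral peak)
# and the Gaussian parameters are invented, not the dissertation's models.
from math import exp, pi, sqrt

def gauss(x, mu, sigma):
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

# Hypothetical cue distributions for /t/ and /k/ bursts before /i/
likelihood = {
    "/t/": lambda x: gauss(x, 4200.0, 600.0),   # Hz, invented
    "/k/": lambda x: gauss(x, 3600.0, 900.0),
}
prior = {"/t/": 0.5, "/k/": 0.5}

def posterior(x):
    joint = {c: likelihood[c](x) * prior[c] for c in prior}
    z = sum(joint.values())
    return {c: p / z for c, p in joint.items()}

# A [k]-like token can still favour /t/ where /t/'s likelihood dominates
print(posterior(3900.0))
```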

    Integration of phonological information in obstruent consonant identification

    Get PDF
    Thesis (Ph.D.) - Indiana University, Linguistics, 2009.
    Speech perception requires the integration of information from multiple phonetic and phonological dimensions. Numerous studies have investigated the mapping between multiple acoustic-phonetic dimensions and single phonological dimensions (e.g., spectral and temporal properties of stop consonants in voicing contrasts). Many fewer studies have addressed relationships between phonological dimensions. Most such studies have focused on the perception of sequences of phones (e.g., 'bid', 'bed', 'bit', 'bet'), though some have focused on multiple phonological dimensions within phones (e.g., voicing and place of articulation in [p], [b], [t], and [d]). However, strong assumptions about relevant acoustic-phonetic dimensions and/or the nature of perceptual and decisional information integration limit previous findings in important ways. New methodological developments in the General Recognition Theory framework enable a number of these assumptions to be tested and provide a more complete model of distinct perceptual and decisional processes in speech sound identification. A Bayesian non-parametric analysis of data from four experiments probing identification of (two sets of) consonants in onset (syllable-initial) and coda (syllable-final) position indicates that integration of phonological information is partially independent in both perception and decision making for most subjects, and that patterns of independence and interaction vary with the set of phonological dimensions under consideration and with syllable position.
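
    A rough sketch of the General Recognition Theory setup referred to above: each stimulus produces a bivariate perceptual effect, and fixed decision criteria on each dimension map effects to one of four responses. The means, covariance and criteria below are invented; the thesis's Bayesian non-parametric analysis is far richer than this forward simulation.

```python
# Illustrative GRT-style forward simulation: bivariate perceptual effects
# plus independent decision bounds on (voicing, place). Values are invented.
import numpy as np

rng = np.random.default_rng(0)

# Perceptual distribution means for four stimuli on (voicing, place) axes
means = {"p": (-1, -1), "b": (1, -1), "t": (-1, 1), "d": (1, 1)}
cov = np.eye(2)          # uncorrelated dimensions: perceptual independence
criteria = (0.0, 0.0)    # decisional separability: one fixed bound per axis

def identify(stimulus, n=10_000):
    """Simulate n trials and return the four response proportions."""
    samples = rng.multivariate_normal(means[stimulus], cov, size=n)
    voiced = samples[:, 0] > criteria[0]
    coronal = samples[:, 1] > criteria[1]
    labels = np.array([["p", "t"], ["b", "d"]])
    responses = labels[voiced.astype(int), coronal.astype(int)]
    return {r: (responses == r).mean() for r in "pbtd"}

print(identify("b"))
```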

    Effects of forensically-relevant facial concealment on acoustic and perceptual properties of consonants

    Get PDF
    This thesis offers a thorough investigation into the effects of forensically-relevant facial concealment on speech acoustics and perception. Specifically, it explores the extent to which selected acoustic-phonetic and auditory-perceptual properties of consonants are affected when the talker is wearing ‘facewear’ while speaking. In this context, the term ‘facewear’ refers to the various types of face-concealing garments and headgear that are worn by people in common daily communication situations: for work and leisure, or as an expression of religious, social and cultural affiliation (e.g. surgical masks, motorcycle helmets, ski and cycling masks, or full-face veils such as the niqāb). It also denotes the face or head coverings that are typically used as deliberate (visual) disguises during the commission of crimes and in situations of public disorder (e.g. balaclavas, hooded sweatshirts, or scarves). The present research centres on the question: does facewear influence the way that consonants are produced, transmitted, and perceived? To examine the effects of facewear on the acoustic speech signal, various intensity, spectral, and temporal properties of spoken English consonants were measured. It was found that facewear can considerably alter the acoustic-phonetic characteristics of consonants. This was likely to be the result of both deliberate and involuntary changes to the talker’s speech productions, and of sound energy absorption by the facewear material. The perceptual consequences of the acoustic modifications to speech were assessed by way of a consonant identification study and a talker discrimination study. The results of these studies showed that auditory-only and auditory-visual consonant intelligibility, as well as the discrimination of unfamiliar talkers, may be greatly compromised when the observer’s judgements are based on ‘facewear speech’. The findings reported in this thesis contribute to our understanding of how auditory and visual information interact during natural speech processing. Furthermore, the results have important practical implications for legal cases in which speech produced through facewear is of pivotal importance. Forensic speech scientists are therefore advised to take the possible effects of facewear on speech into account when interpreting the outcome of their acoustic and auditory analyses of evidential speech recordings, and when evaluating the reliability of earwitness testimony.
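
    One example of the kind of spectral measurement such a study might use is the centre of gravity of a consonant's spectrum, sketched below. The file names, segment times and choice of measure are placeholders, not the thesis's actual protocol.

```python
# Sketch of one acoustic measure for facewear comparisons: the spectral
# centre of gravity of a consonant interval. Paths/times are placeholders.
import numpy as np
from scipy.io import wavfile

def spectral_cog(path, t0, t1):
    """Centre of gravity (Hz) of the magnitude spectrum between t0 and t1 s."""
    sr, x = wavfile.read(path)
    if x.ndim > 1:                       # mix stereo down to mono
        x = x.mean(axis=1)
    seg = x[int(t0 * sr):int(t1 * sr)].astype(float)
    spectrum = np.abs(np.fft.rfft(seg * np.hanning(len(seg))))
    freqs = np.fft.rfftfreq(len(seg), d=1 / sr)
    return (freqs * spectrum).sum() / spectrum.sum()

# Hypothetical comparison of the same /s/ token with and without a mask:
# cog_bare = spectral_cog("s_no_facewear.wav", 0.10, 0.25)
# cog_mask = spectral_cog("s_surgical_mask.wav", 0.10, 0.25)
```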

    Perception and production of English vowels by Chilean learners of English: effect of auditory and visual modalities on phonetic training

    Get PDF
    The aim of this thesis was to examine the perception of English vowels by L2 learners of English with Spanish as L1 (Chilean Spanish), and more specifically the degree to which they are able to take advantage of visual cues to vowel distinctions. Two main studies were conducted for this thesis. In study 1, data was collected from L2 beginners, L2 advanced learners and native speakers of Southern British English (ENS). Participants were tested on their perception of 11 English vowels in audio (A), audiovisual (AV) and video-alone (V) modes. ENS participants were tested to investigate whether visual cues are available to distinguish English vowels, while L2 participants were tested to see how sensitive they were to acoustic and visual cues for English vowels. Study 2 reports the outcome of a vowel training study. To compare the effect of different training modalities, three groups of L2 learners (beginner level) were given five sessions of high-variability vowel training in either A, AV or V mode. Perception and production of English vowels in isolated words and sentences was tested pre- and post-training, and the participants’ auditory frequency discrimination and visual bias were also evaluated. To examine the impact of perceptual training on L2 learners’ vowel production, recordings of key words embedded in read sentences were made pre- and post-training. Acoustic-phonetic analyses were carried out on the vowels in the keywords. Additionally, the vowels were presented to native listeners in a rating test to judge whether the perceptual training resulted in significant improvement in intelligibility. In summary, the study with native English listeners showed that there was visual information available to distinguish at least certain English vowel contrasts. L2 learners showed low sensitivity to visual information. Their vowel perception improved after training, regardless of the training mode used, and perceptual training also led to improved vowel production. However, no improvement was found in their overall sensitivity to visual information.
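
    For the acoustic-phonetic analyses mentioned above, vowel quality is typically summarised by formant frequencies. The sketch below estimates F1 and F2 via LPC; the file name, timings and analysis settings are placeholders rather than the procedure actually used in the thesis.

```python
# Rough LPC-based F1/F2 estimation for a vowel segment. Placeholder file
# and times; real studies would use carefully tuned analysis settings.
import numpy as np
import librosa

def vowel_formants(path, t0, t1, order=12):
    """Return rough F1/F2 estimates (Hz) for the vowel between t0 and t1 s."""
    y, sr = librosa.load(path, sr=10_000)        # low rate suits LPC formants
    seg = y[int(t0 * sr):int(t1 * sr)]
    seg = seg * np.hanning(len(seg))
    a = librosa.lpc(seg, order=order)
    roots = [r for r in np.roots(a) if np.imag(r) > 0]
    freqs = sorted(np.angle(r) * sr / (2 * np.pi) for r in roots)
    return [f for f in freqs if f > 90][:2]      # drop near-DC artefacts

# f1, f2 = vowel_formants("keyword_fleece.wav", 0.12, 0.20)  # placeholder
```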

    English as a lingua franca: mutual intelligibility of Chinese, Dutch and American speakers of English

    Get PDF
    English has become the language of international communication. As a result of this development, we are now confronted with a bewildering variety of ‘Englishes’, spoken with non-native accents. Research determining how intelligible non-native speakers of varying native-language backgrounds are to each other and to native speakers of English has only just started to receive attention. This thesis investigated to what extent Chinese, Dutch and American speakers of English are mutually intelligible. Intelligibility of vowels, simplex consonants and consonant clusters was tested in meaningless sound sequences, as well as in words in meaningless and meaningful short sentences. Speakers (one male, one female per language background) were selected so as to be optimally representative of their peer groups, which were made up of young academic users of English. Intelligibility was tested for all nine combinations of speaker and listener backgrounds. Results show that Chinese-accented English is less intelligible overall than Dutch-accented English, which is in turn less intelligible than American English. Generally, the native-language background of the speaker was less important for intelligibility than the background of the listener. Also, the results reveal a clear and consistent so-called interlanguage speech intelligibility benefit: speakers of English – whether foreign or native – are more intelligible to listeners with whom they share a native-language background than to listeners with a different native language.
    LEI Universiteit Leiden. China Scholarship Council; Leids Universiteits Fonds. Theoretical and Experimental Linguistics.
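
    The design can be pictured as a 3 x 3 speaker-by-listener intelligibility matrix; the sketch below uses invented scores to show how the matched- versus mismatched-background comparison behind the interlanguage speech intelligibility benefit might be computed.

```python
# Speaker-by-listener intelligibility matrix with invented scores, and the
# matched- vs mismatched-L1 comparison. Not the thesis's actual data.
import numpy as np
import pandas as pd

groups = ["Chinese", "Dutch", "American"]
# Hypothetical proportion-correct scores; rows = speaker, cols = listener
scores = pd.DataFrame(
    [[0.72, 0.55, 0.50],
     [0.60, 0.83, 0.78],
     [0.66, 0.82, 0.93]],
    index=groups, columns=groups)

matched = np.mean([scores.loc[g, g] for g in groups])      # shared L1
mismatched = scores.values[~np.eye(3, dtype=bool)].mean()  # different L1
print(f"shared-L1 mean: {matched:.2f}  vs  different-L1 mean: {mismatched:.2f}")
```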

    Perceptual compensation for reverberation in human listeners and machines

    Get PDF
    This thesis explores compensation for reverberation in human listeners and machines. Late reverberation is typically understood as a distortion which degrades intelligibility. Recent research, however, shows that late reverberation is not always detrimental to human speech perception. At times, prolonged exposure to reverberation can provide a helpful acoustic context which improves identification of reverberant speech sounds. The physiology underpinning our robustness to reverberation has not yet been elucidated, but is speculated in this thesis to include efferent processes which have previously been shown to improve discrimination of noisy speech. These efferent pathways descend from higher auditory centres, effectively recalibrating the encoding of sound in the cochlea. Moreover, this thesis proposes that efferent-inspired computational models based on psychoacoustic principles may also improve performance for machine listening systems in reverberant environments. A candidate model for perceptual compensation for reverberation is proposed in which efferent suppression derives from the level of reverberation detected in the simulated auditory nerve response. The model simulates human performance in a phoneme-continuum identification task under a range of reverberant conditions, where a synthetically controlled test-word and its surrounding context phrase are independently reverberated. Addressing questions which arose from the model, a series of perceptual experiments used naturally spoken speech materials to investigate aspects of the psychoacoustic mechanism underpinning compensation. These experiments demonstrate a monaural compensation mechanism that is influenced by both the preceding context (which need not be intelligible speech) and by the test-word itself, and which depends on the time-direction of reverberation. Compensation was shown to act rapidly (within a second or so), indicating a monaural mechanism that is likely to be effective in everyday listening. Finally, the implications of these findings for the future development of computational models of auditory perception are considered.
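
    A toy illustration of the idea that a model can estimate the level of reverberation and derive a suppressive gain from it appears below. The exponential-decay room response and the gain rule are invented stand-ins, not the efferent model proposed in the thesis.

```python
# Toy reverberation-level estimate driving a suppressive gain. The synthetic
# room impulse response and the gain rule are invented illustrations.
import numpy as np

rng = np.random.default_rng(1)
sr = 16_000

# Synthetic room impulse response: direct sound plus exponentially decaying tail
t = np.arange(int(0.4 * sr)) / sr
rir = np.zeros_like(t)
rir[0] = 1.0
rir[1:] = 0.3 * rng.standard_normal(len(t) - 1) * np.exp(-t[1:] / 0.1)

dry = rng.standard_normal(sr)                 # stand-in for 1 s of speech
wet = np.convolve(dry, rir)[: len(dry)]       # reverberant version

# Crude "reverberation level": tail energy relative to the direct path
tail_ratio = np.sum(rir[int(0.05 * sr):] ** 2) / np.sum(rir[: int(0.05 * sr)] ** 2)
suppression_db = -10 * np.log10(1 + tail_ratio)   # invented efferent gain rule
print(f"tail/direct energy: {tail_ratio:.2f}, gain: {suppression_db:.1f} dB")
```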