A microscopic analysis of consistent word misperceptions.
162 p.
Speech misperceptions have the potential to help us understand the mechanisms involved in human speech processing. Consistent misperceptions are especially helpful in this regard, eliminating the variability stemming from individual differences, which in turn makes it easier to analyse confusion patterns at higher levels of speech units such as the word. In this thesis, we have conducted an analysis of consistent word misperceptions from a "microscopic" perspective. Starting with a large-scale elicitation experiment, we collected over 3200 consistent misperceptions from over 170 listeners. We investigated the obtained misperceptions from a signal-independent and a signal-dependent perspective. In the former, we analysed error trends between the target and misperceived words across multiple levels of speech units. We showed that the observed error patterns are highly dependent on the eliciting masker type and contrasted our results with previous findings. In the latter, we attempted to explain misperceptions based on the underlying speech-noise interaction. Using tools from automatic speech recognition, we conducted an automatic classification of confusions based on their origin and quantified the role that misallocation of speech fragments played in the generation of misperceptions. Finally, we introduced modifications to the original confusion-eliciting stimuli in an attempt to recover the original utterance by providing release from either the masker's energetic or its informational component. Listeners' percepts were re-evaluated in response to the modified stimuli, which revealed the origin of many confusions in terms of energetic or informational masking.
Perceptual Asymmetry and Sound Change: An Articulatory, Acoustic/Perceptual, and Computational Analysis
Previous experimental study of the identification of stop and fricative consonants has shown that some consonant pairs are asymmetrically confused for one another, with listeners' percepts tending to favor one member of the pair in a conditioning context. Researchers have also suggested that this phenomenon may play a conditioning role in sound change, although the mechanism by which perceptual asymmetry facilitates language change is somewhat unclear. This dissertation uses articulatory, acoustic, and perceptual data to provide insight on why perceptual asymmetry is observed among certain consonants and in specific contexts. It also uses computational modeling to generate initial predictions about the contexts in which perceptual asymmetry could contribute to stability or change in phonetic categories. Six experiments were conducted, each addressing asymmetry in the consonant pairs /k/-/t/ (before /i/), /k/-/p/ (before /i u/), /p/-/t/ (before /i/), and /θ/-/f/ (possibly unconditioned).
In the articulatory experiment, vocal tract spatial parameters were extracted from real-time MRI video of speakers producing VCV disyllables in order to address the role of vocal tract shape in the target consonants' vowel-dependent spectral similarity. The results suggest that, for consonant pairs involving /k/, CV coarticulation creates, as expected, vocal tract shapes that are most similar to one another in the environment conditioning perceptual asymmetry. However, CV coarticulation was less informative for explaining the vocalic conditioning of the /p/-/t/ asymmetry.
In the second experiment, random forest (RF) models were trained on acoustic samples of the target consonants from a speech corpus. Their output, which was used to identify frequency components important to the discrimination of consonant pairs, aligned well with these consonants' spectral characteristics as predicted by acoustic models. A follow-up perception experiment that examined the categorization strategies of participants listening to band-filtered CV syllables generally showed listener sensitivity to these same components, although listeners were also sensitive to band-filtering outside the predicted frequency bands.
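The dissertation's corpus, features, and model settings are not reproduced here, but the general approach of reading discriminative frequency bands off a trained random-forest model can be sketched on synthetic data (all values below are invented for illustration):

```python
# Sketch (synthetic data): train a random forest on per-band spectral
# energies for two "consonant" classes, then use feature importances to
# find the frequency band that discriminates them.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_tokens, n_bands = 200, 20          # invented: 20 frequency bands per token

# Class 0 and class 1 differ only in band 5 (the "discriminative" band).
X0 = rng.normal(0.0, 1.0, (n_tokens, n_bands))
X1 = rng.normal(0.0, 1.0, (n_tokens, n_bands))
X1[:, 5] += 3.0

X = np.vstack([X0, X1])
y = np.array([0] * n_tokens + [1] * n_tokens)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# The band with the largest importance should be the one we shifted.
top_band = int(np.argmax(clf.feature_importances_))
print(top_band)  # 5
```

In a real analysis the feature vectors would be spectral measurements of recorded consonant tokens rather than Gaussian noise, but the importance-reading step is the same.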
Perceptual asymmetry is observed in CV and isolated C contexts. In the fourth experiment, a Bayesian analysis was performed to help explain why perceptual asymmetry appears when listening to isolated Cs, and a follow-up perception experiment helped to evaluate the relevance of this analysis to human perception. For /k/-/t/, for example, whose confusions favor /t/, this analysis suggested that [t] and [k] both have the highest likelihood of being generated by /t/ (relative to the likelihood of /k/ generating each) in the context conditioning asymmetry. The follow-up study suggests that listeners are more likely to categorize a [t] or a [k] as /t/ if it has a higher likelihood of being generated by /t/ than by /k/.
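The likelihood comparison described above can be illustrated with a minimal toy sketch; the cue dimension, means, and standard deviations below are invented, and the dissertation's actual Bayesian analysis is not reproduced:

```python
# Toy illustration (invented numbers): categorize a token as /t/ or /k/
# by comparing its likelihood under a Gaussian model of each category,
# assuming equal priors.
from math import exp, pi, sqrt

def gaussian_pdf(x, mean, sd):
    return exp(-((x - mean) ** 2) / (2 * sd ** 2)) / (sd * sqrt(2 * pi))

# Hypothetical one-dimensional acoustic cue (e.g. a spectral peak, in Hz).
T_MEAN, T_SD = 4000.0, 600.0   # invented /t/ distribution
K_MEAN, K_SD = 2500.0, 600.0   # invented /k/ distribution

def categorize(cue):
    # Choose whichever category gives the token the higher likelihood.
    p_t = gaussian_pdf(cue, T_MEAN, T_SD)
    p_k = gaussian_pdf(cue, K_MEAN, K_SD)
    return "/t/" if p_t > p_k else "/k/"

print(categorize(3800.0))  # -> /t/
print(categorize(2600.0))  # -> /k/
```

An asymmetry arises on this account when tokens of both categories, in a given context, fall in the region where one category's likelihood dominates.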
The final experiment used agent-based modeling to simulate the intergenerational transmission of phonetic categories. Its results suggest that perceptual asymmetry can affect the acquisition of categories under certain conditions. A lack of reliable access to non-phonetic information about the speaker's intended category or a tendency not to store tokens with low discriminability can both contribute to the instability of phonetic categories over time, but primarily in the contexts conditioning asymmetry.
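One ingredient of such a simulation, the failure to store tokens of low discriminability, can be sketched in toy form (all parameters invented; this is not the dissertation's model):

```python
# Toy sketch (invented parameters): a learner hears /k/ tokens but stores
# only those clearly attributable to /k/; tokens closer to the competing
# /t/ mean are discarded. Repeating this across "generations" makes the
# /k/ category drift, illustrating instability under asymmetry.
import random

random.seed(1)

def transmit(mean_t, mean_k, n_tokens=200, sd=1.0):
    heard = [random.gauss(mean_k, sd) for _ in range(n_tokens)]
    # Low-discriminability tokens (closer to /t/) are not stored under /k/.
    stored = [t for t in heard if abs(t - mean_k) < abs(t - mean_t)]
    return sum(stored) / len(stored)   # the learner's new /k/ mean

mean_t, mean_k = 0.0, 2.0              # invented category positions
for _ in range(10):                    # ten generations of learners
    mean_k = transmit(mean_t, mean_k)

print(mean_k > 2.0)  # True: /k/ has drifted away from /t/
```

Because the discarded tokens always lie on the /t/-facing side of the /k/ distribution, each generation's estimate is biased away from /t/, so the category position is not stable across transmission.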
This dissertation makes several contributions to research on perceptual asymmetry. The articulatory experiment suggests that confusability can be mirrored by gestural ambiguity. The Bayesian analysis could also be used to build and test predictions about the confusability of other sounds by context. Finally, the model simulations offer predictions of the conditions where perceptual asymmetry could condition sound change.
Ph.D., Linguistics, University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/155085/1/iccallow_1.pd
Integration of phonological information in obstruent consonant identification
Thesis (Ph.D.) - Indiana University, Linguistics, 2009
Speech perception requires the integration of information from multiple phonetic and phonological dimensions. Numerous studies have investigated the mapping between multiple acoustic-phonetic dimensions and single phonological dimensions (e.g., spectral and temporal properties of stop consonants in voicing contrasts). Many fewer studies have addressed relationships between phonological dimensions. Most such studies have focused on the perception of sequences of phones (e.g., 'bid', 'bed', 'bit', 'bet'), though some have focused on multiple phonological dimensions within phones (e.g., voicing and place of articulation in [p], [b], [t], and [d]). However, strong assumptions about relevant acoustic-phonetic dimensions and/or the nature of perceptual and decisional information integration limit previous findings in important ways. New methodological developments in the General Recognition Theory framework enable a number of these assumptions to be tested and provide a more complete model of distinct perceptual and decisional processes in speech sound identification. A Bayesian non-parametric analysis of data from four experiments probing identification of (two sets of) consonants in onset (syllable-initial) and coda (syllable-final) position indicates that integration of phonological information is partially independent in both perception and decision making for most subjects, and that patterns of independence and interaction vary with the set of phonological dimensions under consideration and with syllable position.
Effects of forensically-relevant facial concealment on acoustic and perceptual properties of consonants
This thesis offers a thorough investigation into the effects of forensically-relevant facial concealment on speech acoustics and perception. Specifically, it explores the extent to which selected acoustic-phonetic and auditory-perceptual properties of consonants are affected when the talker is wearing "facewear" while speaking. In this context, the term "facewear" refers to the various types of face-concealing garments and headgear that are worn by people in common daily communication situations; for work and leisure, or as an expression of religious, social and cultural affiliation (e.g. surgical masks, motorcycle helmets, ski and cycling masks, or full-face veils such as the niqāb). It also denotes the face or head coverings that are typically used as deliberate (visual) disguises during the commission of crimes and in situations of public disorder (e.g. balaclavas, hooded sweatshirts, or scarves). The present research centres on the question: does facewear influence the way that consonants are produced, transmitted, and perceived? To examine the effects of facewear on the acoustic speech signal, various intensity, spectral, and temporal properties of spoken English consonants were measured. It was found that facewear can considerably alter the acoustic-phonetic characteristics of consonants. This was likely to be the result of both deliberate and involuntary changes to the talker's speech productions, and of sound energy absorption by the facewear material. The perceptual consequences of the acoustic modifications to speech were assessed by way of a consonant identification study and a talker discrimination study. The results of these studies showed that auditory-only and auditory-visual consonant intelligibility, as well as the discrimination of unfamiliar talkers, may be greatly compromised when the observer's judgements are based on "facewear speech".
The findings reported in this thesis contribute to our understanding of how auditory and visual information interact during natural speech processing. Furthermore, the results have important practical implications for legal cases in which speech produced through facewear is of pivotal importance. Forensic speech scientists are therefore advised to take the possible effects of facewear on speech into account when interpreting the outcome of their acoustic and auditory analyses of evidential speech recordings, and when evaluating the reliability of earwitness testimony.
Perception and production of English vowels by Chilean learners of English: effect of auditory and visual modalities on phonetic training
The aim of this thesis was to examine the perception of English vowels by L2 learners of English with Spanish as L1 (Chilean-Spanish), and more specifically the degree to which they are able to take advantage of visual cues to vowel distinctions. Two main studies were conducted for this thesis. In study 1, data was collected from L2 beginners, L2 advanced learners and native speakers of Southern British English (ENS). Participants were tested on their perception of 11 English vowels in audio (A), audiovisual (AV) and video-alone (V) mode. ENS participants were tested to investigate whether visual cues are available to distinguish English vowels, while L2 participants were tested to see how sensitive they were to acoustic and visual cues for English vowels. Study 2 reports the outcome of a vowel training study. To compare the effect of different training modalities, three groups of L2 learners (beginner level) were given five sessions of high-variability vowel training in either A, AV or V mode. Perception and production of English vowels in isolated words and sentences was tested pre/post training, and the participants' auditory frequency discrimination and visual bias was also evaluated. To examine the impact of perceptual training on L2 learners' vowel production, recordings of key words embedded in read sentences were made pre and post-training. Acoustic-phonetic analyses were carried out on the vowels in the keywords. Additionally, the vowels were presented to native listeners in a rating test to judge whether the perceptual training resulted in significant improvement in intelligibility. In summary, the study with native English listeners showed that there was visual information available to distinguish at least certain English vowel contrasts. L2 learners showed low sensitivity to visual information. Their vowel perception improved after training, regardless of the training mode used, and perceptual training also led to improved vowel production.
However, no improvement was found in their overall sensitivity to visual information.
Plasticity in second language (L2) learning: perception of L2 phonemes by native Greek speakers of English
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.
Understanding the process of language acquisition is a challenge that many researchers spanning different disciplines (e.g. linguistics, psychology, neuroscience) have grappled with for centuries. One area which has attracted a lot of attention in recent years is non-native phoneme acquisition. Speech sounds that contain multiple phonetic cues are often difficult for foreign-language learners, especially if certain cues are weighted differently in the foreign and native languages. Greek adult and child speakers of English were studied to determine which cues (duration or spectral) they were using to make discrimination and identification judgments for an English vowel contrast pair. To this end, two forms of identification and discrimination tasks were used: one with natural (unedited) stimuli and another with "modified" stimuli in which vowel duration was edited so that there were no duration differences between the vowels. Results show that the Greek speakers were particularly impaired when they were unable to use the duration cue, as compared to the native English speakers. Similar results were also obtained in control experiments where there was no orthographic representation or where the stimuli were cross-spliced to modify the phonetic neighborhood. Further experiments used high-variability training sessions to enhance vowel perception. Following training, performance improved for both Greek adult and child groups, as revealed by post-training tests. However, the improvements were most pronounced for the child Greek speaker group. A further study examined the effect of different orthographic cues that might affect rhyme and homophony judgments. The results of that study showed that Greek speakers were in general more affected by orthography and regularity (particularly of the vowel) in making these judgments.
This would suggest that Greek speakers were more sensitive to irrelevant orthographic cues, mirroring the results in the auditory modality where they focused on irrelevant acoustic cues. The results are discussed in terms of current theories of language acquisition, with particular reference to acquisition of non-native phonemes.
School of Social Sciences, Brunel University
English as a lingua franca: mutual intelligibility of Chinese, Dutch and American speakers of English
English has become the language of international communication. As a result of this development, we are now confronted with a bewildering variety of "Englishes", spoken with non-native accents. Research determining how intelligible non-native speakers of varying native-language backgrounds are to each other and to native speakers of English has only just started to receive attention. This thesis investigated to what extent Chinese, Dutch and American speakers of English are mutually intelligible. Intelligibility of vowels, simplex consonants and consonant clusters was tested in meaningless sound sequences, as well as in words in meaningless and meaningful short sentences. Speakers (one male, one female per language background) were selected so as to be optimally representative of their peer groups, which were made up of young academic users of English. Intelligibility was tested for all nine combinations of speaker and listener backgrounds. Results show that Chinese-accented English is less intelligible overall than Dutch-accented English, which is less intelligible than American English. Generally, the native-language background of the speaker was less important for intelligibility than the background of the listener. Also, the results reveal a clear and consistent so-called interlanguage speech intelligibility benefit: speakers of English, whether foreign or native, are more intelligible to listeners with whom they share the native-language background than to listeners with a different native language.
LEI Universiteit Leiden; China Scholarship Council; Leids Universiteits Fonds; Theoretical and Experimental Linguistics
Perceptual compensation for reverberation in human listeners and machines
This thesis explores compensation for reverberation in human listeners and machines. Late reverberation is typically understood as a distortion which degrades intelligibility. Recent research, however, shows that late reverberation is not always detrimental to human speech perception. At times, prolonged exposure to reverberation can provide a helpful acoustic context which improves identification of reverberant speech sounds. The physiology underpinning our robustness to reverberation has not yet been elucidated, but is speculated in this thesis to include efferent processes which have previously been shown to improve discrimination of noisy speech. These efferent pathways descend from higher auditory centres, effectively recalibrating the encoding of sound in the cochlea. Moreover, this thesis proposes that efferent-inspired computational models based on psychoacoustic principles may also improve performance for machine listening systems in reverberant environments.
A candidate model for perceptual compensation for reverberation is proposed in which efferent suppression derives from the level of reverberation detected in the simulated auditory nerve response. The model simulates human performance in a phoneme-continuum identification task under a range of reverberant conditions, where a synthetically controlled test-word and its surrounding context phrase are independently reverberated. Addressing questions which arose from the model, a series of perceptual experiments used naturally spoken speech materials to investigate aspects of the psychoacoustic mechanism underpinning compensation. These experiments demonstrate a monaural compensation mechanism that is influenced by both the preceding context (which need not be intelligible speech) and by the test-word itself, and which depends on the time-direction of reverberation. Compensation was shown to act rapidly (within a second or so), indicating a monaural mechanism that is likely to be effective in everyday listening. Finally, the implications of these findings for the future development of computational models of auditory perception are considered.