
    On The Way To Linguistic Representation: Neuromagnetic Evidence of Early Auditory Abstraction in the Perception of Speech and Pitch

    The goal of this dissertation is to show that even at the earliest (non-invasive) recordable stages of auditory cortical processing, there is evidence that cortex computes abstract representations from the acoustic signal. Across two distinct domains (inferential pitch perception and vowel normalization), I present evidence that the M100, an automatic evoked neuromagnetic component that localizes to primary auditory cortex, is sensitive to abstract computations. The M100 typically responds to physical properties of the stimulus in auditory and speech perception and integrates only over the first 25 to 40 ms of stimulus onset, providing a reliable dependent measure for tapping into early stages of auditory cortical processing. In Chapter 2, I briefly present the episodicist position on speech perception and discuss research indicating that the strongest episodicist position is untenable. I then review findings from the mismatch negativity literature, where it has been proposed that the MMN gives access to linguistic representations supported by auditory cortex. Finally, I conclude the chapter with a discussion of previous findings on the M100/N1. In Chapter 3, I present neuromagnetic data showing that the response properties of the M100 are sensitive to the missing fundamental component, using well-controlled stimuli. These findings suggest that listeners reconstruct the inferred pitch by 100 ms after stimulus onset. In Chapter 4, I propose a novel formant ratio algorithm in which the third formant (F3) is the normalizing factor. The goal of formant ratio proposals is to provide an explicit algorithm that successfully "eliminates" speaker-dependent acoustic variation of auditory vowel tokens.
    Results from two MEG experiments suggest that auditory cortex is sensitive to formant ratios and that the perceptual system shows heightened sensitivity to tokens located in more densely populated regions of the vowel space. In Chapter 5, I report MEG results suggesting that early auditory cortical processing is sensitive to violations of a phonological constraint on sound sequencing: listeners make highly specific, knowledge-based predictions about rather abstract properties of the upcoming speech signal, and violations of these predictions are evident in early cortical processing.
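The F3-based formant ratio idea lends itself to a compact sketch. In the snippet below the formant values are illustrative textbook-style numbers, not data from the dissertation, and the function name is invented for exposition:

```python
def f3_normalize(f1, f2, f3):
    """Divide F1 and F2 (Hz) by F3, the proposed speaker-dependent
    normalizing factor, yielding dimensionless formant ratios."""
    return f1 / f3, f2 / f3

# Illustrative /i/ tokens from two speakers with different vocal tract lengths:
speaker_a = f3_normalize(270, 2290, 3010)   # longer vocal tract
speaker_b = f3_normalize(310, 2790, 3310)   # shorter vocal tract
```

The raw formant frequencies of the two tokens differ substantially in hertz, but the F1/F3 and F2/F3 ratios lie considerably closer together across the speakers, which is the sense in which such an algorithm "eliminates" speaker-dependent variation.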

    On the causes of compensation for coarticulation : evidence for phonological mediation

    This study examined whether compensation for coarticulation in fricative-vowel syllables is phonologically mediated or a consequence of auditory processes. Smits (2001a) had shown that compensation occurs for anticipatory lip rounding in a fricative caused by a following rounded vowel in Dutch. In a first experiment, the possibility that compensation is due to general auditory processing was investigated using nonspeech sounds. These did not cause context effects akin to compensation for coarticulation, although nonspeech sounds influenced speech sound identification in an integrative fashion. In a second experiment, a possible phonological basis for compensation for coarticulation was assessed using audiovisual speech. Visual displays that induced the perception of a rounded vowel also influenced compensation for anticipatory lip rounding in the fricative. These results indicate that compensation for anticipatory lip rounding in fricative-vowel syllables is phonologically mediated. This result is discussed in the light of other compensation-for-coarticulation findings and general theories of speech perception.

    Segmental Durations of Speech

    This dissertation considers the segmental durations of speech from the viewpoint of speech technology, especially speech synthesis. The idea is that better models of segmental durations lead to higher naturalness and better intelligibility, key factors in the usability and generality of synthesized speech. Although the studies are based on Finnish corpora, the approaches apply to other languages as well, largely because most of the studies in this dissertation concern universal effects that take place at utterance boundaries; the methods developed here are likewise suitable for studies of other languages. The study is based on two corpora of news-reading speech and sentences read aloud. One corpus is read by a 39-year-old male, while the other comprises several speakers in various situations. The use of two corpora serves two purposes: it allows a comparison between the corpora and gives a broader view of the matters of interest. The dissertation begins with an overview of the phonemes and the quantity system of the Finnish language. In particular, we cover the intrinsic durations of phonemes and phoneme categories, as well as the durational difference between short and long phonemes. The phoneme categories are introduced to manage the variability of speech segments. We then cover boundary-adjacent effects on segmental durations. In utterance-initial position we find that there seems to be initial shortening in Finnish, but the result depends on the level of detail and on the individual phoneme: at the phoneme level, the shortening or lengthening affects only the very first phonemes of an utterance, whereas at the word level the effect, on average, shortens the whole first word. We also establish the effect of final lengthening in Finnish.
    The presence of this effect in Finnish has long been an open question, and Finnish was the last missing piece needed to establish final lengthening as a universal phenomenon. Final lengthening is studied from various angles, and it is shown that it is not merely an effect of prominence or of a speech corpus with high inter- and intra-speaker variation. The effect of final lengthening appears to extend from the final word to the penultimate word, and at the phoneme level it covers a much wider span than the initial effect. We also present a normalization method suitable for corpus studies of segmental durations. The method normalizes at the utterance level to capture the pattern of segmental durations within each utterance, which prevents various problematic sources of variation within the corpora from affecting the results. The normalization is used in a study of final lengthening to show that the results are not caused by variation in the material. The dissertation then describes an implementation of speech synthesis on a mobile platform. We find that the rule-based synthesis method itself runs as real-time software, but the signal generation process slows the system down beyond real time. Future prospects for speech synthesis on limited platforms are discussed. Finally, the dissertation considers ethical issues in the development of speech technology. The main focus is on the development of speech synthesis with high naturalness, but the problems and solutions are applicable to other speech technology approaches as well.
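The abstract does not spell out the normalization formula; a z-score computed within each utterance is one standard way to realize utterance-level normalization, sketched here with invented durations:

```python
from statistics import mean, stdev

def normalize_utterance(durations_ms):
    """Z-score segment durations within a single utterance, so that
    differences in overall speech rate between utterances (and speakers)
    do not confound comparisons across a corpus."""
    m, s = mean(durations_ms), stdev(durations_ms)
    return [(d - m) / s for d in durations_ms]

# A fast and a slow rendition of the same segment sequence,
# both with a lengthened final segment:
fast = normalize_utterance([55, 60, 50, 95])
slow = normalize_utterance([110, 120, 100, 190])
```

After normalization the two renditions show the same durational pattern, so an effect such as final lengthening can be measured without being confounded by the rate difference between them.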

    The role of native-language knowledge in the perception of casual speech in a second language

    Casual speech processes, such as /t/-reduction, make word recognition harder. Word recognition is also harder in a second language (L2). Combining these challenges, we investigated whether L2 learners have recourse to knowledge from their native language (L1) when dealing with casual speech processes in their L2. In three experiments, the production and perception of /t/-reduction were investigated. An initial production experiment showed that /t/-reduction occurred in both languages and patterned similarly in proper nouns, but differed when /t/ was a verbal inflection. Two perception experiments compared the performance of German learners of Dutch with that of native speakers for nouns and verbs. Mirroring the production patterns, German learners’ performance strongly resembled that of native Dutch listeners when the reduced /t/ was part of a word stem, but deviated where /t/ was a verbal inflection. These results suggest that a casual speech process in a second language is problematic for learners when the process is not known from the learner’s native language, similar to what has been observed for phoneme contrasts.

    The use of acoustic cues in phonetic perception: Effects of spectral degradation, limited bandwidth and background noise

    Hearing impairment, cochlear implantation, background noise and other auditory degradations result in the loss or distortion of sound information thought to be critical to speech perception. In many cases, listeners can still identify speech sounds despite degradations, but understanding of how this is accomplished is incomplete. Experiments presented here tested the hypothesis that listeners would utilize acoustic-phonetic cues differently if one or more cues were degraded by hearing impairment or simulated hearing impairment. Results supported this hypothesis for various listening conditions that are directly relevant for clinical populations. Analysis included mixed-effects logistic modeling of contributions of individual acoustic cues for various contrasts. Listeners with cochlear implants (CIs) or normal-hearing (NH) listeners in CI simulations showed increased use of acoustic cues in the temporal domain and decreased use of cues in the spectral domain for the tense/lax vowel contrast and the word-final fricative voicing contrast. For the word-initial stop voicing contrast, NH listeners made less use of voice-onset time and greater use of voice pitch in conditions that simulated high-frequency hearing impairment and/or masking noise; influence of these cues was further modulated by consonant place of articulation. A pair of experiments measured phonetic context effects for the "s/sh" contrast, replicating previously observed effects for NH listeners and generalizing them to CI listeners as well, despite known deficiencies in spectral resolution for CI listeners. For NH listeners in CI simulations, these context effects were absent or negligible. Audio-visual delivery of this experiment revealed enhanced influence of visual lip-rounding cues for CI listeners and NH listeners in CI simulations. Additionally, CI listeners demonstrated that visual cues to gender influence phonetic perception in a manner consistent with gender-related voice acoustics. 
All of these results suggest that listeners accommodate challenging listening situations by capitalizing on the natural (multimodal) covariance in speech signals. Additionally, these results imply potential differences in speech perception between NH listeners and listeners with hearing impairment that would be overlooked by traditional word recognition or consonant confusion matrix analyses.
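The cue-weighting analysis described above rests on logistic models in which each acoustic cue's coefficient indexes its contribution to a phonetic decision. The sketch below fits a plain fixed-effects logistic regression by gradient ascent on synthetic data; the study itself used mixed-effects models, and the data and listener scenario here are invented:

```python
import math
import random

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit y ~ cues by stochastic gradient ascent on the log-likelihood;
    the learned weights index how strongly each cue drives the contrast."""
    w = [0.0] * (len(X[0]) + 1)               # bias + one weight per cue
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            p = 1.0 / (1.0 + math.exp(-z))    # predicted P(response = 1)
            err = yi - p
            w[0] += lr * err
            for j, xj in enumerate(xi):
                w[j + 1] += lr * err * xj
    return w

# Synthetic tense/lax responses: a duration cue separates the categories,
# while a spectral cue is pure noise -- mimicking a listener in a CI
# simulation who relies on temporal rather than spectral information.
random.seed(1)
X = [[random.gauss(1.0 if i % 2 else -1.0, 0.5), random.gauss(0.0, 1.0)]
     for i in range(200)]
y = [i % 2 for i in range(200)]
weights = fit_logistic(X, y)
```

In the fitted model the duration coefficient dominates the spectral one, which is the pattern the abstract reports for CI listeners and CI-simulation listeners on the tense/lax contrast.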

    An examination of oral articulation of vowel nasality in the light of the independent effects of nasalization on vowel quality

    In this paper, a summary is given of an experimental technique that addresses a known issue in research on the independent effects of nasalization on vowel acoustics: given that the separate transfer functions associated with the oral and nasal cavities are merged in the acoustic signal, teasing apart the respective effects of the two cavities appears to be an intractable problem. The results obtained with the method reveal that the independent effects of nasalization on the acoustic vowel space are F1-raising for high vowels, F1-lowering for non-high vowels, and F2-lowering for non-front vowels. Results from previous articulatory research by the author on the production of vowel nasality in French, Hindi, and English are discussed in the light of these independent effects of nasalization on vowel quality.

    The perceptual distance between vowels: the effects of prototypicality and extremity


    Phonetic Contrast in New York Hasidic Yiddish Vowels: Language Contact, Variation, and Change

    This study analyzes the acoustic correlates of the length contrast in New York Hasidic Yiddish (HY) peripheral vowels /i/, /u/, and /a/, and compares them across four generations of native speakers for evidence of change over time. HY vowel tokens are also compared to English vowels produced by the New York-born speakers to investigate the influence of language contact on observed changes. Additionally, the degree to which individual speakers orient towards or away from the Hasidic community is quantified via an ethnographically informed survey to examine its correlation with /u/-fronting, a sound change that is widespread in the non-Hasidic English-speaking community. The data for this study consist of audio segments extracted from sociolinguistic interviews with fifty-seven New York-born speakers representing three generations, and from recordings of Holocaust testimonies by thirteen survivors from the Transcarpathian region of Eastern Europe, the ancestral homeland of most contemporary Hasidim. The duration and first and second formant frequencies of the vowels were extracted and analyzed statistically. The results show that while the contrast among European-born (first generation) speakers is relatively weak overall, there is a significant increase in both the durational and qualitative distinctions of the long-short counterparts of the high vowel pairs (/i/ and /u/) between the first and second generations. These vowels continue to diverge in quality across subsequent generations, with the short vowels becoming lower and more centralized in phonetic space. Based on these findings, I hypothesize that the length contrast in the pre-war Yiddish of the Transcarpathian region was changing and possibly on the verge of collapse. In the high vowels, contact with English reversed or inhibited a merger, with a remapping of length differences on a quality plus quantity dimension parallel to American English /i/-/ɪ/ and /u/-/ʊ/.
However, contact did not have the same effect on the low vowels, since there was no parallel low vowel contrast with which inherited HY /aː/-/a/ could be associated. Furthermore, a cross-linguistic comparison of the HY vs. English vowel systems shows that while the short high vowels of second-generation speakers are more centralized relative to their HY counterparts, younger speakers exhibit increasing convergence of their HY and English vowels. These results are interpreted with reference to models of second language acquisition, emphasizing differences in language input that might result in the acquisition of different systems. Moreover, the patterns uncovered in the cross-linguistic analysis suggest that contact-induced phonetic drift may account for the changes observed in HY. Finally, there is evidence that /u/ is fronting in post-coronal contexts. However, unlike the changes in the short high vowels, this change is not correlated with generation. Rather, statistical modeling shows a significant effect of Hasidic orientation, with outwardly oriented individuals showing a greater tendency for /u/-fronting than those who are maximally oriented towards the Hasidic community. HY is an organically developing dialect caught between the opposing pressures of a traditionalist religio-cultural ideology that supports it and a majority language that competes with it. This study identifies some of the cognitive forces that may underlie sound change in a minority language under bilingual contact and uncovers locally significant factors that are implicated in the propagation of such change. It also highlights the dynamicity of Hasidic culture and provides linguistic evidence of its interaction with mainstream American culture, thereby presenting an expansive view of the Hasidic community that counters narratives portraying it as anti-progressive and static.

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) came into being in 1999 from a keenly felt need to share know-how, objectives, and results between areas that until then had seemed quite distinct, such as bioengineering, medicine, and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the neonate to the adult and elderly. Over the years the initial issues have grown and spread into other areas of research, such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years, always in Firenze, Italy. This edition celebrates twenty years of uninterrupted and successful research in the field of voice analysis.