    Asymmetric discrimination of non-speech tonal analogues of vowels

    Published in final edited form as: J Exp Psychol Hum Percept Perform. 2019 February; 45(2): 285–300. doi:10.1037/xhp0000603

    Directional asymmetries reveal a universal bias in vowel perception favoring extreme vocalic articulations, which lead to acoustic vowel signals with dynamic formant trajectories and well-defined spectral prominences due to the convergence of adjacent formants. The present experiments investigated whether this bias reflects speech-specific processes or general properties of spectral processing in the auditory system. Toward this end, we examined whether analogous asymmetries in perception arise with non-speech tonal analogues that approximate some of the dynamic and static spectral characteristics of naturally produced /u/ vowels executed with more versus less extreme lip gestures. We found a qualitatively similar but weaker directional effect with two-component tones varying in both the dynamic changes and proximity of their spectral energies. In subsequent experiments, we pinned down the phenomenon using tones that varied in one or both of these two acoustic characteristics. We found comparable asymmetries with tones that differed exclusively in their spectral dynamics, and no asymmetries with tones that differed exclusively in their spectral proximity or in both spectral features. We interpret these findings as evidence that dynamic spectral changes are a critical cue for eliciting asymmetries in non-speech tone perception, but that the potential contribution of general auditory processes to asymmetries in vowel perception is limited.
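
    A minimal sketch (assuming NumPy; not the authors' stimulus code) of how a two-component tonal analogue of a vowel might be synthesized: two sinusoids whose frequencies follow formant-like trajectories, so that "dynamic spectral change" and "spectral proximity" can be manipulated independently. All frequencies and durations below are illustrative placeholders, not the published stimulus parameters.

    import numpy as np

    def two_component_tone(f1_traj, f2_traj, dur=0.3, sr=44100):
        """Sum of two sinusoids with linearly varying frequencies (Hz)."""
        t = np.linspace(0, dur, int(sr * dur), endpoint=False)
        # Interpolate each component's frequency trajectory over the tone.
        f1 = np.interp(t, [0, dur], f1_traj)
        f2 = np.interp(t, [0, dur], f2_traj)
        # Integrate instantaneous frequency to obtain phase.
        phase1 = 2 * np.pi * np.cumsum(f1) / sr
        phase2 = 2 * np.pi * np.cumsum(f2) / sr
        tone = 0.5 * np.sin(phase1) + 0.5 * np.sin(phase2)
        return tone / np.abs(tone).max()

    # "More extreme" analogue: dynamic trajectories, components close together.
    extreme = two_component_tone(f1_traj=[300, 250], f2_traj=[750, 650])
    # "Less extreme" analogue: flat trajectories, components farther apart.
    moderate = two_component_tone(f1_traj=[350, 350], f2_traj=[950, 950])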

    The effect of coarticulatory resistance and aerodynamic requirements of consonants on syllable organization in Polish

    The Anatomy of Onomatopoeia

    Virtually every human faculty engages with imitation. One of the most natural and least explored objects for studying the mimetic elements in language is onomatopoeia, as it implies an imitation-driven transformation of a sound of nature into a word. Notably, simple sounds are transformed into complex strings of vowels and consonants, making it difficult to identify what is acoustically preserved in this operation. In this work we propose a definition of vocal imitation by which sounds are transformed into the speech elements that minimize their spectral difference within the constraints of the vocal system. To test this definition, we use a computational model that allows recovering anatomical features of the vocal system from experimental sound data. We explore the vocal configurations that best reproduce non-speech sounds, like striking blows on a door or the sharp sounds generated by pressing on light switches or computer mouse buttons. From the anatomical point of view, the configurations obtained are readily associated with co-articulated consonants, and we show perceptual evidence that these consonants are positively associated with the original sounds. Moreover, the vowel-consonant pairs that compose these co-articulations correspond to the most stable syllables found in the knock and click onomatopoeias across languages, suggesting a mechanism by which vocal imitation naturally embeds single sounds into more complex speech structures. Other mimetic forces have received extensive attention from the scientific community, such as cross-modal associations between speech and visual categories. The present approach helps build a global view of the mimetic forces acting on language and opens a new avenue for the quantitative study of word formation in terms of vocal imitation.
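
    The proposed definition amounts to an argmin over vocal-tract configurations. A schematic Python sketch, in which `synthesize` and the candidate set are hypothetical stand-ins for the authors' computational vocal-system model:

    import numpy as np

    def spectral_distance(spec_a, spec_b):
        """Euclidean distance between two magnitude spectra of equal length."""
        return np.linalg.norm(spec_a - spec_b)

    def best_imitation(target_spectrum, candidate_configs, synthesize):
        """Vocal imitation as defined above: pick the vocal configuration
        whose synthesized output minimizes the spectral difference to the
        target, within the constraints embodied by the candidate set."""
        return min(
            candidate_configs,
            key=lambda cfg: spectral_distance(synthesize(cfg), target_spectrum),
        )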

    Acoustic analysis of Sindhi speech - a precursor for an ASR system

    The functional and formative properties of speech sounds are usually referred to as acoustic-phonetics in linguistics. This research aims to demonstrate the acoustic-phonetic features of the elemental sounds of Sindhi, a member of the Indo-European family of languages mainly spoken in the Sindh province of Pakistan and in some parts of India. In addition to the available articulatory-phonetic knowledge, acoustic-phonetic knowledge has been classified for the identification and classification of Sindhi language sounds. Determining the acoustic features of the language sounds helps to bring together the sounds with similar acoustic characteristics under the name of one natural class of meaningful phonemes. The obtained acoustic features and corresponding statistical results for a particular natural class of phonemes provide a clear understanding of the meaningful phonemes of Sindhi and also help to eliminate redundant sounds present in the inventory. At present Sindhi includes nine redundant, three interchanging, three substituting, and three confused pairs of consonant sounds. Some of the unique acoustic-phonetic features of Sindhi highlighted in this study are the acoustic features of its large number of contrastive voiced implosives and the acoustic impact of the language's flexibility in terms of the insertion and deletion of short vowels in the utterance. In addition, the study addresses the question of whether Sindhi has an affricate class of sounds and diphthongs. The compilation of the meaningful language phoneme set by learning their acoustic-phonetic features serves one of the major goals of this study, because twelve sounds of Sindhi are studied that are not yet part of the language alphabet. The main acoustic features learned for the phonological structures of Sindhi are the fundamental frequency, the formants, and the duration, along with analysis of the acoustic waveforms, formant tracks, and computer-generated spectrograms. The impetus for this research comes from the fact that detailed knowledge of the sound characteristics of the language elements has a broad variety of applications, from developing accurate synthetic speech production systems to modeling robust speaker-independent speech recognizers. The major research achievements and contributions of this study include: the compilation and classification of the elemental sounds of Sindhi; comprehensive measurement of the acoustic features of the language sounds, suitable for incorporation into the design of a Sindhi ASR system; an understanding of the dialect-specific acoustic variation of the elemental sounds of Sindhi; a speech database comprising voice samples of native Sindhi speakers; identification of the language's redundant, substituting, and interchanging pairs of sounds; and identification of the sounds that can potentially lead to segmentation and recognition errors in a Sindhi ASR system design. These achievements create the fundamental building blocks for future work on a state-of-the-art prototype: a gender- and environment-independent, continuous, conversational ASR system for Sindhi.
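
    For readers wanting to reproduce this kind of measurement, a minimal sketch using the praat-parselmouth Python library (the file name is a placeholder, and the analysis parameters are Praat's defaults, not the thesis's settings):

    import parselmouth

    snd = parselmouth.Sound("sindhi_token.wav")  # hypothetical recording
    print("duration (s):", snd.duration)

    # Fundamental frequency track (frames at 0 Hz are unvoiced).
    pitch = snd.to_pitch()
    f0 = pitch.selected_array['frequency']
    print("mean F0 (Hz):", f0[f0 > 0].mean())

    # First two formants at the temporal midpoint of the token.
    formants = snd.to_formant_burg()
    mid = snd.duration / 2
    print("F1 (Hz):", formants.get_value_at_time(1, mid))
    print("F2 (Hz):", formants.get_value_at_time(2, mid))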

    Speech Communication

    Contains reports on seven research projects.
    Contract AF19(604)-2061 with Air Force Cambridge Research Center
    Contract N5ori-07861 with the Navy (Office of Naval Research)
    National Science Foundation

    Syllables without vowels: Phonetic and phonological evidence from Tashlhiyt Berber

    It has been proposed that Tashlhiyt is a language which allows any segment, including obstruents, to be a syllable nucleus. The most striking and controversial examples taken as arguments in favour of this analysis involve series of words claimed to contain only obstruents. This claim is disputed in some recent work, where it is argued that these consonant sequences contain schwas that can be syllable nuclei. This article presents arguments showing that vowelless syllables do exist in Tashlhiyt, both at the phonetic and phonological levels. Acoustic, fibrescopic and photoelectroglottographic examination of voiceless words (e.g. [tkkststt]) provides evidence that such items lack syllabic vocalic elements. In addition, two types of phonological data, metrics and a spirantisation process, are presented to show that in this language schwa is not a segment which can be independently manipulated by the phonological grammar or referred to by the syllable structure.

    Experimental study of nasality with particular reference to Brazilian Portuguese

    Learning to Pronounce First Words in Three Languages: An Investigation of Caregiver and Infant Behavior Using a Computational Model of an Infant

    Words are made up of speech sounds. Almost all accounts of child speech development assume that children learn the pronunciation of first-language (L1) speech sounds by imitation, most claiming that the child performs some kind of auditory matching to the elements of ambient speech. However, there is evidence to support an alternative account, and we investigate the non-imitative child behavior and well-attested caregiver behavior that this account posits using Elija, a computational model of an infant. Through unsupervised active learning, Elija began by discovering motor patterns, which produced sounds. In separate interaction experiments, native speakers of English, French and German then played the role of his caregiver. In their first interactions with Elija, they were allowed to respond to his sounds if they felt this was natural. We analyzed the interactions through phonemic transcriptions of the caregivers' utterances and found that they interpreted his output within the framework of their native languages. Their form of response was almost always a reformulation of Elija's utterance into well-formed sounds of L1. Elija retained those motor patterns to which a caregiver responded and formed associations between each motor pattern and the response it provoked. Thus, in a second phase of interaction, he was able to parse input utterances in terms of the caregiver responses he had heard previously, and to respond using his associated motor patterns. This capacity enabled the caregivers to teach Elija to pronounce some simple words in their native languages, through his serial imitation of the words' component speech sounds. Overall, our results demonstrate that the natural responses and behaviors of human subjects to infant-like vocalizations can take a computational model from a biologically plausible initial state through to word pronunciation. This provides support for an alternative to current auditory-matching hypotheses for how children learn to pronounce.
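
    A toy sketch of the two-phase association mechanism described above (all names are illustrative, not Elija's actual implementation): the model retains motor patterns that drew a caregiver response, then imitates by looking up remembered responses.

    from typing import Dict, List

    class InfantModel:
        def __init__(self) -> None:
            # Maps a caregiver's reformulated response (e.g. a phonemic
            # transcription) to the motor pattern that provoked it.
            self.associations: Dict[str, str] = {}

        def record_interaction(self, motor_pattern: str, response: str) -> None:
            """Phase 1: retain a motor pattern that drew a caregiver response."""
            self.associations[response] = motor_pattern

        def respond(self, utterance_segments: List[str]) -> List[str]:
            """Phase 2: parse the input in terms of previously heard caregiver
            responses and reply with the associated motor patterns, enabling
            serial imitation of a word's component sounds."""
            return [self.associations[seg]
                    for seg in utterance_segments if seg in self.associations]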

    Acoustic Characteristics of Word-Final American English Liquids Produced by L2 Adult Speakers

    In this study, the acoustic differences between native English (L1) and native Korean (L2) speakers' production of the American English liquids /ɹ/, /l/, and /ɹl/ were examined in 14 Korean speakers and 13 English speakers. Temporal measures included (1) the relative timing of maximum constriction and (2) the duration of the vocalic nucleus. Spectral measures included (1) the Euclidean distance between /ɹ/ and /l/ and (2) the frequency difference between F2 and F3. The results indicated a significant interaction between speaker group and phonetic stimuli: L2 speakers produced a similar degree of constriction across semivowels, whereas L1 speakers produced varying degrees of F2-F3 constriction across phonetic stimuli. In addition, the relative timing of maximum constriction occurred earliest in /ɹl/ and latest in /ɹ/ for L1 speakers; the opposite pattern was observed for L2 speakers. Furthermore, the two speaker groups differed significantly in the Euclidean distances between /ɹ/ and /l/: the distances were significantly smaller for L2 speakers than for L1 speakers, indicating a reduced acoustic distinction between the two liquids in L2 speech. The same pattern held at both measurement points, the temporal midpoint and the point of maximum F2-F3 constriction, although the group difference was more apparent at the point of maximum constriction. The findings provide acoustic data on liquid production in L2 speakers and support the use of these measures in a clinical setting.
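
    A minimal sketch of the two spectral measures (the formant values are illustrative placeholders, not the study's data): the F3-F2 difference indexes the degree of constriction, and the Euclidean distance in F2-F3 space indexes the /ɹ/-/l/ contrast.

    import numpy as np

    def f3_f2_difference(f2: float, f3: float) -> float:
        """Smaller F3-F2 differences indicate a tighter, more /ɹ/-like
        constriction."""
        return f3 - f2

    def liquid_distance(r_formants, l_formants) -> float:
        """Euclidean distance between /ɹ/ and /l/ in (F2, F3) space, in Hz."""
        return float(np.linalg.norm(np.array(r_formants) - np.array(l_formants)))

    # Hypothetical midpoint formant values (Hz) for one speaker:
    r_token = (1100, 1700)  # /ɹ/: F2 and F3 close together
    l_token = (1200, 2600)  # /l/: wider F2-F3 spacing
    print(liquid_distance(r_token, l_token))  # larger = clearer contrast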