
    Physiologically-Motivated Feature Extraction Methods for Speaker Recognition

    Speaker recognition has received a great deal of attention from the speech community, and significant gains in robustness and accuracy have been obtained over the past decade. However, the features used for identification are still primarily representations of overall spectral characteristics, and thus the models are primarily phonetic in nature, differentiating speakers based on overall pronunciation patterns. This creates difficulties in terms of the amount of enrollment data and the complexity of the models required to cover the phonetic space, especially in tasks such as identification where enrollment and testing data may not have similar phonetic coverage. This dissertation introduces new features based on vocal source characteristics intended to capture physiological information related to the laryngeal excitation energy of a speaker. These features, including RPCC, GLFCC and TPCC, represent unique characteristics of speech production not captured by current state-of-the-art speaker identification systems. The proposed features are evaluated through three experimental paradigms: cross-lingual speaker identification, cross-song-type avian speaker identification and monolingual speaker identification. The experimental results show that the proposed features provide information about speaker characteristics that is significantly different in nature from the phonetically focused information present in traditional spectral features. The incorporation of the proposed glottal source features offers significant overall improvement to the robustness and accuracy of speaker identification tasks.
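    The abstract names RPCC, GLFCC and TPCC but does not define how they are computed. As a rough illustration of what a vocal-source (glottal) feature can look like, the sketch below derives cepstral coefficients from a per-frame linear-prediction residual, a common stand-in for the excitation signal; the pipeline, frame sizes, LP order and coefficient count are illustrative assumptions, not the dissertation's actual RPCC/GLFCC/TPCC definitions.

```python
# Illustrative sketch only: a generic "residual cepstral" feature, computed by
# (1) fitting a linear-prediction (LP) model per frame, (2) inverse-filtering the
# frame to obtain the LP residual (a rough proxy for the glottal excitation), and
# (3) taking a DCT of the residual's log magnitude spectrum. All parameter values
# are arbitrary choices, not values taken from the dissertation.
import numpy as np
from scipy.fft import dct
from scipy.signal import lfilter


def lpc_coeffs(frame, order):
    """Levinson-Durbin recursion on the frame's autocorrelation; returns A(z) with a[0] = 1."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0] + 1e-12
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a_prev = a.copy()
        a[i] = k
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        err *= (1.0 - k * k)
    return a


def residual_cepstrum(signal, sr, frame_len=0.025, hop=0.010, order=12, n_coeffs=12):
    """Per-frame cepstral coefficients of the LP residual (an RPCC-like sketch)."""
    n, h = int(frame_len * sr), int(hop * sr)
    window = np.hamming(n)
    feats = []
    for start in range(0, len(signal) - n, h):
        frame = signal[start:start + n] * window
        a = lpc_coeffs(frame, order)
        residual = lfilter(a, [1.0], frame)           # inverse filter the frame with A(z)
        spectrum = np.abs(np.fft.rfft(residual)) + 1e-12
        feats.append(dct(np.log(spectrum), type=2, norm="ortho")[:n_coeffs])
    return np.array(feats)                            # shape: (num_frames, n_coeffs)
```

    In a full system, frame-level features of this kind would typically be modeled alongside, not instead of, conventional spectral features, which is how the abstract describes their contribution.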

    Language-independent talker-specificity in first-language and second-language speech production by bilingual talkers: L1 speaking rate predicts L2 speaking rate

    Second-language (L2) speech is consistently slower than first-language (L1) speech, and L1 speaking rate varies within and across talkers depending on many individual, situational, linguistic, and sociolinguistic factors. We ask whether speaking rate is also determined by a language-independent talker-specific trait, such that, across a group of bilinguals, L1 speaking rate significantly predicts L2 speaking rate. Two measures of speaking rate were automatically extracted from recordings of read and spontaneous speech by English monolinguals (n = 27) and bilinguals from ten L1 backgrounds (n = 86): speech rate (syllables/second) and articulation rate (syllables/second excluding silent pauses). Replicating prior work, L2 speaking rates were significantly slower than L1 speaking rates both across groups (monolinguals' L1 English vs bilinguals' L2 English) and across L1 and L2 within bilinguals. Critically, within the bilingual group, L1 speaking rate significantly predicted L2 speaking rate, suggesting that a significant portion of inter-talker variation in L2 speech derives from inter-talker variation in L1 speech, and that individual variability in L2 spoken language production may be best understood within the context of individual variability in L1 spoken language production.
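    The two rate measures are defined explicitly in the abstract, and the central test is whether L1 rate predicts L2 rate across talkers. The sketch below implements those two definitions and a plain least-squares fit; syllable counts and silent-pause durations are assumed to be available per recording (for example from a forced aligner), since the abstract does not name the extraction tool, and the toy numbers are invented.

```python
# Sketch of the two rate measures defined in the abstract and the across-talker
# L1 -> L2 prediction. Syllable counts and silent-pause durations are assumed given;
# the detection step itself is not shown because the abstract does not specify it.
import numpy as np


def speech_rate(n_syllables, total_dur_s):
    """Speech rate: syllables per second over the whole recording, pauses included."""
    return n_syllables / total_dur_s


def articulation_rate(n_syllables, total_dur_s, silent_pause_dur_s):
    """Articulation rate: syllables per second excluding silent pauses."""
    return n_syllables / (total_dur_s - silent_pause_dur_s)


def l1_predicts_l2(l1_rates, l2_rates):
    """Least-squares fit of L2 rate on L1 rate across talkers; returns slope,
    intercept, and Pearson r. A hypothetical analysis, not the paper's exact model."""
    slope, intercept = np.polyfit(l1_rates, l2_rates, deg=1)
    r = np.corrcoef(l1_rates, l2_rates)[0, 1]
    return slope, intercept, r


# Toy example with made-up numbers for three bilingual talkers (100 s recordings).
l1 = np.array([speech_rate(420, 100.0), speech_rate(380, 100.0), speech_rate(450, 100.0)])
l2 = np.array([speech_rate(350, 100.0), speech_rate(330, 100.0), speech_rate(390, 100.0)])
print(l1_predicts_l2(l1, l2))
```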

    MORPHOPHONOLOGICAL INTERFERENCE IN MINANGKABAU’S LANGUAGE

    This study, "Morphophonological Interference in Minangkabau's Language", focuses on morphemes and phonetics. It aims to identify the forms of word structure and vowels, along with the changes produced by interference between Minangkabau and Indonesian. The study draws on Chaer and Agustina's theory (1994:146; 2004:122), which identifies four types of language interference; the writer focuses on two of them: morphological interference and phonological interference. The methodology is a descriptive qualitative method applied to a Minangkabau-language data source, traditional poetry. The results show eight instances of morphemic interference with a variety of phonetic realizations. The analysis shows interference from Minangkabau into Indonesian in which either words or sounds are changed, and which can be affected overall by accent; the sound changes involve rhyme, intonation, and stress, while the word changes involve affixation. In addition, regional languages should be appreciated and learned in order to avoid misunderstandings among the world's many languages.

    Acquiring a new second language contrast: an analysis of the English laryngeal system of native speakers of Dutch

    This study examines the acquisition of the English laryngeal system by native speakers of (Belgian) Dutch. Both languages have a two-way laryngeal system, but while Dutch contrasts prevoiced with short-lag stops, English has a contrast between short-lag and long-lag stops. The primary aim of the article is to test two hypotheses on the acquisition process based on first language acquisition research: (1) native speakers of a voicing language will succeed in producing short-lag stops in the target aspirating language, since short-lag stops occur early in first language acquisition and can be considered unmarked and since one member of the contrast is formed by short-lag stops in both voicing and aspirating languages, and (2) native speakers of a voicing language will succeed in acquiring long-lag stops in the target language, because aspiration is an acoustically salient realization. The analysis is based on an examination of natural speech data (conversations between dyads of informants), combined with the results of a controlled reading task. Both types of data were gathered in Dutch as well as in Eng(Dutch) (i.e. the English speech of native speakers of Dutch). The analysis revealed an interesting pattern: while the first language (L1) Dutch speakers were successful in acquiring long-lag aspirated stops (confirming hypothesis 2), they did not acquire English short-lag stops (rejecting hypothesis 1). Instead of the target short-lag stops, the L1 Dutch speakers produced prevoiced stops and frequently transferred regressive voice assimilation with voiced stops as triggers from Dutch into English. Various explanations for this pattern in terms of acoustic salience, perceptual cues and training will be considered.
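    The three stop categories at issue are conventionally distinguished by voice onset time (VOT): prevoiced stops have negative VOT, short-lag stops a small positive VOT, and long-lag (aspirated) stops a long positive VOT. The abstract does not report the cut-offs used in the study, so the thresholds in the sketch below are illustrative textbook-style values, not the study's measurement criteria.

```python
# Hedged sketch: classify stop tokens into the three laryngeal categories discussed
# in the abstract using voice onset time (VOT) in milliseconds. The boundary values
# are assumptions (roughly the conventional 0 ms and ~30 ms cut-offs), not thresholds
# reported by the study.
def laryngeal_category(vot_ms, short_lag_max_ms=30.0):
    if vot_ms < 0:
        return "prevoiced"   # voicing starts before the release (Dutch /b d/)
    if vot_ms <= short_lag_max_ms:
        return "short-lag"   # voicing starts shortly after release (Dutch /p t k/, English /b d g/)
    return "long-lag"        # aspirated: long voicing delay (English /p t k/)


# Example tokens, echoing the finding that L1 Dutch speakers often realized target
# English short-lag stops as prevoiced ones. VOT values are invented.
for vot in (-85.0, 12.0, 64.0):
    print(vot, "->", laryngeal_category(vot))
```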

    Sentence Comprehension in Monolingual and Bilingual Children

    Bilingual children outperform monolingual children on non-linguistic tasks that tap executive function. It is still unknown whether the enhancement of executive functioning found for bilingual children improves complex linguistic comprehension. The present study examined possible differences between monolingual and bilingual children's sentence comprehension in the presence of different sources of information that conflicted with a correct interpretation. 100 children (33 monolinguals and 67 bilinguals) between the ages of 4 and 5 years old were examined on two complex linguistic tasks. The findings showed that bilingual children were more accurate than monolingual children in understanding the meaning of spoken sentences in the presence of distraction. Bilingual children's advanced attentional control has been proposed as a possible explanation, allowing them to focus their attention effectively on the relevant information while ignoring other sources of information that interfered with the correct interpretation. Keywords: bilingual children, monolingual children, executive function, attentional control, sentence comprehension.

    Directional adposition use in English, Swedish and Finnish

    Directional adpositions such as to the left of describe where a Figure is in relation to a Ground. English and Swedish directional adpositions refer to the location of a Figure in relation to a Ground, whether both are static or in motion. In contrast, the Finnish directional adpositions edellä (in front of) and jäljessä (behind) solely describe the location of a moving Figure in relation to a moving Ground (Nikanne, 2003). When using directional adpositions, a frame of reference must be assumed for interpreting their meaning. For example, the meaning of to the left of in English can be based on a relative (speaker- or listener-based) reference frame or an intrinsic (object-based) reference frame (Levinson, 1996). When a Figure and a Ground are both in motion, it is possible for a Figure to be described as being behind or in front of the Ground even if neither has intrinsic features. As shown by Walker (in preparation), there are good reasons to assume that in the latter case a motion-based reference frame is involved. This means that if Finnish speakers were to use edellä (in front of) and jäljessä (behind) more frequently in situations where both the Figure and Ground are in motion, a difference in reference frame use between Finnish on the one hand and English and Swedish on the other could be expected. We asked native English, Swedish and Finnish speakers to select adpositions from a language-specific list to describe the location of a Figure relative to a Ground when both were shown to be moving on a computer screen. We were interested in any differences between Finnish, English and Swedish speakers. All languages showed a predominant use of directional spatial adpositions referring to the lexical concepts TO THE LEFT OF, TO THE RIGHT OF, ABOVE and BELOW. There were no differences between the languages in directional adposition use or reference frame use, including reference frame use based on motion. We conclude that despite differences in the grammars of the languages involved, and potential differences in reference frame system use, the three languages investigated encode Figure location in relation to Ground location in a similar way when both are in motion. Levinson, S. C. (1996). Frames of reference and Molyneux's question: Crosslinguistic evidence. In P. Bloom, M. A. Peterson, L. Nadel & M. F. Garrett (Eds.), Language and space (pp. 109-170). Cambridge, MA: MIT Press. Nikanne, U. (2003). How Finnish postpositions see the axis system. In E. van der Zee & J. Slack (Eds.), Representing direction in language and space. Oxford, UK: Oxford University Press. Walker, C. (in preparation). Motion encoding in language: the use of spatial locatives in a motion context. Unpublished doctoral dissertation, University of Lincoln, Lincoln, United Kingdom.

    Asymmetric discrimination of non-speech tonal analogues of vowels

    Published in final edited form as: J Exp Psychol Hum Percept Perform. 2019 February; 45(2): 285–300. doi:10.1037/xhp0000603. Directional asymmetries reveal a universal bias in vowel perception favoring extreme vocalic articulations, which lead to acoustic vowel signals with dynamic formant trajectories and well-defined spectral prominences due to the convergence of adjacent formants. The present experiments investigated whether this bias reflects speech-specific processes or general properties of spectral processing in the auditory system. Toward this end, we examined whether analogous asymmetries in perception arise with non-speech tonal analogues that approximate some of the dynamic and static spectral characteristics of naturally produced /u/ vowels executed with more versus less extreme lip gestures. We found a qualitatively similar but weaker directional effect with two-component tones varying in both the dynamic changes and proximity of their spectral energies. In subsequent experiments, we pinned down the phenomenon using tones that varied in one or both of these two acoustic characteristics. We found comparable asymmetries with tones that differed exclusively in their spectral dynamics, and no asymmetries with tones that differed exclusively in their spectral proximity or in both spectral features. We interpret these findings as evidence that dynamic spectral changes are a critical cue for eliciting asymmetries in non-speech tone perception, but that the potential contribution of general auditory processes to asymmetries in vowel perception is limited. Accepted manuscript.
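    The stimuli are described only at this level of detail, so the following is a sketch of the stimulus logic rather than a reconstruction: two sinusoidal components whose frequencies either stay flat or glide, and whose spacing is wider or narrower, loosely modeled on the low formant region of /u/. All frequency values, durations and the sample rate below are placeholder assumptions.

```python
# Sketch of a two-component tone of the kind described in the abstract: two sinusoids
# whose frequencies can stay flat (static) or glide (dynamic), and whose spacing can
# be wide or narrow (spectral proximity). Every numeric value here is an invented
# placeholder, not a stimulus parameter from the experiments.
import numpy as np


def two_component_tone(f1_start, f1_end, f2_start, f2_end, dur=0.3, sr=44100):
    """Sum of two sinusoids with linearly gliding frequencies (flat if start == end)."""
    t = np.arange(int(dur * sr)) / sr
    # Instantaneous phase = 2*pi * integral of the linearly changing frequency.
    phase1 = 2 * np.pi * (f1_start * t + (f1_end - f1_start) * t**2 / (2 * dur))
    phase2 = 2 * np.pi * (f2_start * t + (f2_end - f2_start) * t**2 / (2 * dur))
    tone = 0.5 * np.sin(phase1) + 0.5 * np.sin(phase2)
    return tone / np.max(np.abs(tone))


# A converging, gliding ("dynamic, proximal") variant versus a static, well-separated
# one; the specific values are invented for illustration.
dynamic_proximal = two_component_tone(300, 250, 900, 750)
static_distal = two_component_tone(300, 300, 1200, 1200)
```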

    Exploring the prosodic and syntactic aspects of Mandarin-English Code switching

    Code-switching (CS) is one of the most common natural behaviors among bilinguals. Linguists have been exploring the constraints behind CS to explain this behavior, and while syntactic constraints have been the focus for decades, prosodic constraints have only recently been studied in depth. As a less common language pair in CS research, Mandarin-English is underrepresented in studies of both kinds of constraint. This study therefore explores the prosodic constraints and syntactic patterns of this language pair using a natural CS database. Prosodically, the study applies the information-based approach and its fundamental unit, the Intonation Unit (IU), to conduct the analysis. The result of 10.6% bilingual IUs (BIUs) proves to be reliable and offers solid evidence that bilinguals tend to code-switch at IU boundaries. This supports the pioneering work of Shenk (2006) from the previously unexplored Mandarin-English language pair. In addition, the study develops solutions to the subjectivity problem and the database-appropriateness problem in this approach in order to strengthen the validity of the results. Syntactically, the study investigates the syntactic patterns at switching points in the Mandarin-English language pair using data collected from a rarely investigated bilingual community. Based on the results, a syntactic pattern specific to this language pair was observed, and the study suggests that it disrupted the final results. The study then analyzes the prosodic and syntactic results together: when the interfering results are eliminated, a more solid outcome emerges that provides greater support for the prosodic constraint argument.
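    The 10.6% figure is a proportion of bilingual Intonation Units (IUs), i.e. IUs containing material from both languages; a low proportion is read as evidence that switches fall at IU boundaries rather than inside units. The sketch below computes that proportion over a hypothetical annotation format (each IU as a list of per-word language tags); the format, tags and toy data are assumptions, not the study's transcription scheme.

```python
# Sketch of the bilingual-IU (BIU) measure discussed in the abstract. The input format
# is hypothetical: each Intonation Unit is a list of per-word language tags
# ("zh" or "en"); the study's actual transcription and segmentation conventions are
# not given in the abstract.
def biu_proportion(intonation_units):
    """Share of IUs containing words from more than one language."""
    bilingual = sum(1 for iu in intonation_units if len(set(iu)) > 1)
    return bilingual / len(intonation_units)


# Toy corpus: four IUs, one of which mixes Mandarin and English word tags.
corpus = [
    ["zh", "zh", "zh"],
    ["en", "en"],
    ["zh", "zh", "en"],   # a bilingual IU: the switch happens inside the unit
    ["en", "en", "en"],
]
print(biu_proportion(corpus))   # prints 0.25; the study reports 10.6% BIU over its database
```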

    Speaker recognition across languages

    This is a draft of a chapter/article that has been accepted for publication by Oxford University Press in the forthcoming book The Oxford Handbook of Voice Perception, edited by S. Frühholz & P. Belin (in press). Listeners identify voices more accurately in their native language than in an unknown, foreign language, a phenomenon known as the language-familiarity effect in talker identification. This effect has been reliably observed for a wide range of different language pairings and using a variety of different methodologies, including voice line-ups, talker identification training, and talker discrimination. What do listeners know about their native language that helps them recognize voices more accurately? Do listeners gain access to this knowledge when they learn a second language? Is linguistic competence necessary, or can mere exposure to a foreign language help listeners identify voices more accurately? In this chapter, I review the more than three decades of research on the language-familiarity effect in talker identification, with an emphasis on how attention to this phenomenon can inform not only better psychological and neural models of memory for voices, but also better models of speech processing. This work was supported in part by NIH R03 DC014045.