280 research outputs found

    Rhythmic unit extraction and modelling for automatic language identification

    This paper deals with an approach to Automatic Language Identification based on rhythmic modelling. Besides phonetics and phonotactics, rhythm is one of the most promising features for language identification, even if its extraction and modelling are not straightforward. One of the main problems to address is what to model. In this paper, an algorithm for rhythm extraction is described: using a vowel detection algorithm, rhythmic units related to syllables are segmented. Several parameters are extracted (consonantal and vowel duration, cluster complexity) and modelled with a Gaussian mixture. Experiments are performed on read speech for 7 languages (English, French, German, Italian, Japanese, Mandarin and Spanish), and results reach up to 86 ± 6% correct discrimination between stress-timed, mora-timed and syllable-timed classes of languages, and up to 67 ± 8% correct language identification on average for the 7 languages with utterances of 21 seconds. These results are discussed and compared with those obtained with a standard acoustic Gaussian mixture modelling approach (88 ± 5% correct identification for the 7-language identification task).
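The pipeline described above (segment rhythmic units, extract duration and complexity features, score them against per-language Gaussian models) can be sketched roughly as follows. This is a simplification: a single diagonal Gaussian stands in for the paper's Gaussian mixture, and all feature values and language labels are invented for illustration.

```python
import math

def fit_diag_gaussian(samples):
    """Fit a diagonal Gaussian (per-dimension mean and variance) to
    rhythmic-unit feature vectors, e.g. (consonant duration,
    vowel duration, cluster complexity)."""
    n, dims = len(samples), len(samples[0])
    means = [sum(s[d] for s in samples) / n for d in range(dims)]
    variances = [max(sum((s[d] - means[d]) ** 2 for s in samples) / n, 1e-6)
                 for d in range(dims)]
    return means, variances

def log_likelihood(model, sample):
    """Log-density of one feature vector under a diagonal Gaussian."""
    means, variances = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(sample, means, variances))

def identify(models, units):
    """Sum unit log-likelihoods per language; return the best language."""
    scores = {lang: sum(log_likelihood(m, u) for u in units)
              for lang, m in models.items()}
    return max(scores, key=scores.get)
```

In the paper, one such model per language (or per rhythm class) is trained, and an utterance is assigned to the model giving its rhythmic units the highest total likelihood.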

    The effect of speech rhythm and speaking rate on assessment of pronunciation in a second language

    Published online: 24 April 2019. The study explores the effect of deviations from native speech rhythm and rate norms on the assessment of pronunciation mastery of a second language (L2) when the native language of the learner is either rhythmically similar to or different from the target language. Using concatenative speech synthesis, different versions of the same sentence were created to produce segmentally and intonationally identical utterances that differed only in rhythmic patterns and/or speaking rate. Speech rhythm and tempo patterns modelled those from the speech of French or German native learners of English at different proficiency levels. Native British English speakers rated the original sentences and the synthesized utterances for accentedness. The analysis shows that (a) differences in speech rhythm and speaking tempo influence the perception of accentedness; (b) idiosyncratic differences in speech rhythm and speech rate are sufficient to differentiate between the proficiency levels of L2 learners; (c) the relative salience of rhythm and rate for perceived accentedness in L2 speech is modulated by the native language of the learners; and (d) intonation facilitates the perception of finer differences in speech rhythm between otherwise identical utterances. These results emphasize the importance of prosodic timing patterns for the perception of speech delivered by L2 learners. L.P. was supported by the Spanish Ministry of Economy and Competitiveness (MINECO) via a Juan de la Cierva fellowship. M.O. was supported by IKERBASQUE, the Basque Foundation for Science. The research institution was supported through the "Severo Ochoa" Programme for Centres/Units of Excellence in R&D (SEV-2015-490).
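The stimulus-construction idea above (keep segments identical, impose another speaker's rhythm and a global rate factor) can be sketched as a simple duration mapping. Segment labels and durations here are invented, and real manipulation would operate on waveforms, not labels:

```python
def transplant_rhythm(segments, target_durations, rate=1.0):
    """Keep the segment sequence intact but impose a target rhythmic
    pattern, then apply a global speaking-rate factor (rate > 1 = faster).
    segments: list of (label, duration_seconds) pairs.
    Returns (label, new_duration) pairs."""
    if len(segments) != len(target_durations):
        raise ValueError("one target duration per segment is required")
    return [(label, target / rate)
            for (label, _original), target in zip(segments, target_durations)]
```

Crossing native segments with learner-derived duration patterns and rate factors in this way yields the factorial set of otherwise-identical stimuli that the study's raters judged for accentedness.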

    Predicting choices and response times in a phoneme classification task using a decision-making model

    Master's thesis, Department of Brain and Cognitive Sciences, Seoul National University Graduate School, February 2013 (advisor: Sang-Hun Lee). Despite the crucial roles of pre-lexical units in speech perception, modelling efforts so far have been heavily focused on information processing at lexical or post-lexical stages, impeding the mechanistic investigation of speech perception. Given this dearth of frameworks for studying pre-lexical units, the current study proposes a system-level neural model for phoneme classification. A linchpin idea behind the proposed model is that the brain represents phonemes as probabilistic quantities: likelihoods. With this idea, our model bridges three well-known canonical computations in the brain (sensory encoding, likelihood decoding and evidence accumulation) along a cascade hierarchy of neural processing towards generating inputs to the next stage of speech perception. At the initial stage, sensory neurons with different tuning curves for physical properties relevant to phoneme discrimination compute individual likelihoods for the presence of those properties. Phoneme neurons at the following stage compute likelihoods for specific phonemes by summing the outputs of those sensory encoding neurons with weighting curves tuned for their preferred phonemes. At the final stage, evidence-accumulation neurons accumulate evidence over time toward a discrete phoneme classification by integrating the outputs of phoneme neurons in a task-optimal manner. The accumulation-to-bound mechanism operating at this stage translates the probabilistic information represented in the phoneme neurons' output into a concrete choice at a certain time. This translation allowed us to test the empirical viability of our model by assessing its capability to predict the actual patterns of choice fractions and reaction times exhibited by human listeners engaging in phoneme classification under various listening conditions.
Using a small number of parameters, the model predicted not only the static, categorical structure of phoneme classification as a function of physical stimulus properties, but also the adaptation-induced, dynamic changes in classification of an identical stimulus. Furthermore, the model was flexible enough to cover the wide range of individual differences in phoneme classification behavior. With these behavioral constraints, in conjunction with the neural and computational constraints exercised in model construction, our model provides a framework for studying the neural mechanisms underlying the initial stages of speech processing by generating hypotheses and predictions that are testable in neurophysiological and behavioral experiments.
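A minimal sketch of the three-stage architecture described above, assuming Gaussian tuning curves, two hypothetical phonemes "A" and "B", and a simple log-likelihood-ratio accumulator with a fixed bound. The centers, weights and bound are illustrative, not the thesis's fitted parameters:

```python
import math

def tuning(x, center, width=1.0):
    """Stage 1: sensory neuron with a Gaussian tuning curve over one
    stimulus feature (e.g. a frequency-modulation parameter)."""
    return math.exp(-0.5 * ((x - center) / width) ** 2)

def phoneme_likelihoods(stimulus, weights, centers):
    """Stage 2: each phoneme unit sums sensory responses with its own
    weighting curve, yielding a likelihood-like score per phoneme."""
    sensory = [tuning(stimulus, c) for c in centers]
    return {ph: sum(w * r for w, r in zip(ws, sensory))
            for ph, ws in weights.items()}

def accumulate_to_bound(stimuli, weights, centers, bound=3.0):
    """Stage 3: accumulate the log-likelihood ratio for A vs. B over
    time steps; choice and reaction time are read out when the
    evidence crosses the bound."""
    evidence = 0.0
    for t, x in enumerate(stimuli, start=1):
        lik = phoneme_likelihoods(x, weights, centers)
        evidence += math.log((lik["A"] + 1e-9) / (lik["B"] + 1e-9))
        if abs(evidence) >= bound:
            return ("A" if evidence > 0 else "B", t)
    return ("A" if evidence > 0 else "B", len(stimuli))
```

Noisier or more ambiguous stimuli move the per-step evidence toward zero, so the bound is reached later: the same mechanism that produces choice fractions also produces reaction-time predictions.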

    Speech Recognition

    Chapters in the first part of the book cover all the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals and methods for speech-feature extraction, acoustic and language modelling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modelling in emotion-detection systems, and in other speech processing applications able to operate in real-world environments, such as mobile communication services and smart homes.
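As one concrete instance of "searching the hypothesis space", a toy beam search over per-frame symbol log-probabilities might look like the sketch below. The symbols and scores are invented; real recognizers search lattices that combine acoustic and language-model scores:

```python
def beam_search(frame_logprobs, beam_width=2):
    """Keep only the `beam_width` best partial hypotheses at each frame.
    frame_logprobs: list of {symbol: log_probability} dicts, one per frame.
    Returns the best (hypothesis, cumulative_logprob) pair."""
    beams = [("", 0.0)]  # (partial hypothesis, cumulative log-prob)
    for frame in frame_logprobs:
        candidates = [(hyp + sym, score + lp)
                      for hyp, score in beams
                      for sym, lp in frame.items()]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]  # prune to the beam width
    return beams[0]
```

Pruning keeps the search tractable: the number of live hypotheses stays constant per frame instead of growing exponentially with utterance length.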