1,188 research outputs found

    Assessing the Prosody of Non-Native Speakers of English: Measures and Feature Sets

    In this paper, we describe a new database of audio recordings of non-native (L2) speakers of English, and the perceptual evaluation experiment conducted with native English speakers to assess the prosody of each recording. These annotations are then used to compute the gold standard using different methods, and a series of regression experiments is conducted to evaluate their impact on the performance of a regression model that predicts the degree of naturalness of L2 speech. Further, we compare the relevance of different feature groups: features modelling prosody in general (excluding speech tempo), speech rate and pauses (modelling speech tempo, i.e., fluency), voice quality, and a variety of spectral features. We also discuss the impact of various fusion strategies on performance. Overall, our results demonstrate that the prosody of non-native speakers of English can be reliably assessed using suprasegmental audio features; prosodic features appear to be the most important ones.

    Pauses and the temporal structure of speech

    Natural-sounding speech synthesis requires close control over the temporal structure of the speech flow. This includes a full predictive scheme for the durational structure, in particular the prolongation of lexeme-final syllables, as well as for the pausal structure of the utterance. In this chapter, the temporal structure is described and the numerous factors that modify it are summarised. In the second part, predictive schemes for the temporal structure of speech ("performance structures") are introduced, and their potential for characterising the overall prosodic structure of speech is demonstrated.

    Phonetic Temporal Neural Model for Language Identification

    Deep neural models, particularly the LSTM-RNN model, have shown great potential for language identification (LID). However, most existing neural LID methods have largely overlooked phonetic information, although this information has been used very successfully in conventional phonetic LID systems. We present a phonetic temporal neural model for LID: an LSTM-RNN LID system that takes as input phonetic features produced by a phone-discriminative DNN, rather than raw acoustic features. This new model is similar to traditional phonetic LID methods, but the phonetic knowledge here is much richer: it is at the frame level and involves compact information about all phones. Our experiments on the Babel and AP16-OLR databases demonstrate that the phonetic temporal neural approach is very effective and significantly outperforms existing acoustic neural models. It also outperforms the conventional i-vector approach on short utterances and in noisy conditions. Comment: Submitted to TASL

    The perception of multiethnolectal Zurich German: a continuum rather than a clear-cut categorisation

    Since about 2000, the emergence of so-called ‘multiethnolects’ has been observed among adolescents in German-speaking Switzerland; however, a systematic description of these varieties is still lacking. The few existing perception studies of multiethnolects in other European countries usually compare two or more predetermined groups. This paper investigates which labels are used for multiethnolectal Zurich German and how adolescents perceive this way of speaking; we adopt a perceptual sociolinguistics approach, which focuses on the conceptualisations of lay people rather than on those of linguists. In a rating experiment, 40 adolescents listened to short speech samples from 48 pupils recorded at two different schools in the city of Zurich and rated the speakers on a 7-point Likert scale according to how multiethnolectal they sounded (from "not at all" to "very strongly"). The results revealed a perceptual continuum rather than a clear-cut binary categorisation [±multiethnolectal]. A smaller follow-up experiment with 12 adult raters (using the same stimuli) yielded a highly significant correlation between the mean rating scores of the two groups of raters. These results suggest that, in the perception of multiethnolectal Zurich German, there appears to be no difference between an etic and an emic perspective.

    Unique Finite Element Modelling of Human Body Inside Accelerating Car to Predict Accelerations and Frequencies at Different Human Segments

    The comfort level of a human occupant inside a moving vehicle depends on the level of vibration at the different segments of the human body. Some technologies have been developed to predict the final level of vibration in a seated occupant of an automobile, but they consider only specific body segments. In the present work, a unique and comprehensive finite element simulation model was proposed to predict the final level of vibration at different segments of a seated human driver inside a moving car. The main aim of this simulation methodology was to replace time-consuming and expensive real-life vibration testing of a car-seated human body with a non-robust and correctly postured virtual human model in a finite element environment. The outputs of interest were vertical accelerations, vertical displacements, and frequencies, and the results were validated by comparison with real-life test data and with information reported in similar studies. The validation study showed that this simulation methodology can successfully be used to predict accelerations and frequencies at different points of a car-seated human body in order to optimise human health, comfort, and safety.
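Validating simulated responses against test data, as described above, usually means reducing each acceleration time series to a few comparable numbers. The sketch below is a generic post-processing step, not the authors' finite element method: it assumes an evenly sampled vertical acceleration trace at a known sampling rate and computes its RMS value and dominant frequency with a naive DFT. The signal and sampling rate are hypothetical.

```python
import cmath
import math


def rms(signal):
    """Root-mean-square value of a sampled signal (e.g. acceleration in m/s^2)."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def dominant_frequency(signal, fs):
    """Frequency (Hz) of the largest-magnitude bin of a naive O(n^2) DFT,
    skipping the DC bin; fs is the sampling rate in Hz."""
    n = len(signal)
    avg = sum(signal) / n
    centered = [x - avg for x in signal]  # remove any static offset
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * fs / n


# Hypothetical trace: 2 s of a 5 Hz vertical vibration sampled at 100 Hz.
fs = 100.0
trace = [0.8 * math.sin(2 * math.pi * 5.0 * i / fs) for i in range(200)]
print(dominant_frequency(trace, fs), round(rms(trace), 3))  # prints 5.0 0.566
```

Frequency resolution is fs / n (0.5 Hz here), so longer traces are needed to distinguish closely spaced body-segment resonances.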

    Generalization of French and Portuguese plural alternations and initial syllable protection

    We present two cases of morphophonological alternations in the plural of nouns, one from French and one from Brazilian Portuguese. In both of them, monosyllabic items are protected from right-edge alternations more strongly than polysyllabic items are, an asymmetry we attribute to privileged protection of initial syllables. We implement the analyses of the two languages using constraint-based grammars that take trends learned across the lexicon and predict the treatment of nonce words. Five large-scale nonce-word tasks confirm the productivity of the trend in both languages.
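The abstract does not specify the grammar framework. Assuming a Maximum Entropy (MaxEnt) constraint grammar, one widely used constraint-based model for predicting variable outcomes on nonce words, the sketch below shows how initial-syllable protection could make monosyllables alternate less often. The constraint names and weights are hypothetical; in a real analysis the weights would be learned from the lexicon.

```python
import math


def maxent_probs(candidates, weights):
    """MaxEnt grammar: P(c) = exp(-H(c)) / Z, where the penalty
    H(c) = sum_k weights[k] * violations_k(c) and Z sums exp(-H) over candidates."""
    penalty = {c: sum(weights[k] * v for k, v in viol.items())
               for c, viol in candidates.items()}
    z = sum(math.exp(-h) for h in penalty.values())
    return {c: math.exp(-h) / z for c, h in penalty.items()}


# Hypothetical constraints and weights:
#   MARK       - markedness constraint favouring the alternating plural
#   IDENT      - general faithfulness, violated by any alternation
#   IDENT-INIT - faithfulness to the initial syllable
weights = {"MARK": 2.0, "IDENT": 1.0, "IDENT-INIT": 2.0}

# In a monosyllable the right edge is also the initial syllable,
# so the alternating plural violates IDENT-INIT as well.
mono = {"alternating": {"IDENT": 1, "IDENT-INIT": 1}, "faithful": {"MARK": 1}}
poly = {"alternating": {"IDENT": 1}, "faithful": {"MARK": 1}}

print(round(maxent_probs(mono, weights)["alternating"], 3))  # 0.269
print(round(maxent_probs(poly, weights)["alternating"], 3))  # 0.731
```

Because the model outputs probabilities rather than a single winner, its predictions can be compared directly with the proportions of alternating responses in nonce-word tasks.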