3 research outputs found

    Automatic Pronunciation Assessment -- A Review

    Pronunciation assessment and its application in computer-aided pronunciation training (CAPT) have seen impressive progress in recent years. With the rapid growth in language processing and deep learning over the past few years, there is a need for an updated review. In this paper, we review methods employed in pronunciation assessment for both phonemic and prosodic aspects. We categorize the main challenges observed in prominent research trends and highlight existing limitations and available resources. This is followed by a discussion of the remaining challenges and possible directions for future work. Comment: 9 pages, accepted to EMNLP Findings

    Perceptual chunking of spontaneous speech : Validating a new method with non-native listeners

    Human perception relies on chunking up an incoming information stream into smaller units to make sense of it. Evidence of chunking has been found across different domains, including visual events, music, and dance movement. It is largely uncontested that language processing must also proceed in smaller chunks of some kind. What these online chunks consist in is much less understood. In this paper, we propose that cognitively relevant chunks can be identified by crowdsourcing listener perceptions of chunk boundaries in real-time speech, even if the listeners are non-native speakers of the language. We present a paradigm in which experiment participants simultaneously listen to short extracts of authentic speech and mark chunk boundaries using a custom-built tablet application. We then test the internal validity of the method by measuring the extent to which fluent L2 listeners agree on chunk boundaries. To do this, we use three datasets collected within the paradigm and a suite of different statistical methods. The external validity of the method is studied in a separate paper and is briefly discussed at the end. Peer reviewed

    Sound, structure and meaning : The bases of prominence ratings in English, French and Spanish

    This study tests the influence of acoustic cues and non-acoustic contextual factors on listeners’ perception of prominence in three languages whose prominence systems differ in the phonological patterning of prominence and in the association of prominence with information structure: English, French and Spanish. Native speakers of each language performed an auditory rating task to mark prominent words in samples of conversational speech under two instructions: with prominence defined in terms of acoustic or meaning-related criteria. Logistic regression models tested the role of task instruction, acoustic cues and non-acoustic contextual factors in predicting binary prominence ratings of individual listeners. In all three languages we find similar effects of prosodic phrase structure and acoustic cues (F0, intensity, phone-rate) on prominence ratings, and differences in the effect of word frequency and instruction. In English, where phrasal prominence is used to convey meaning related to information structure, acoustic and meaning criteria converge on very similar prominence ratings. In French and Spanish, where prominence plays a lesser role in signaling information structure, phrasal prominence is perceived more narrowly on structural and acoustic grounds. Prominence ratings from untrained listeners correspond with ToBI pitch accent labels for each language. Distinctions in ToBI pitch accent status (nuclear, prenuclear, unaccented) are reflected in empirical and model-predicted prominence ratings. In addition, words with a ToBI pitch accent type that is typically associated with contrastive focus are more likely to be rated as prominent in Spanish and English, but no such effect is found for French. These findings are discussed in relation to probabilistic models of prominence production and perception. Peer reviewed