
    Fluency Estimation and Prosodic Error Analysis of Japanese Speakers' English Using Multi-Resolution Posteriorgrams

    Degree type: Master's. University of Tokyo (東京大学)

    Leveraging phone-level linguistic-acoustic similarity for utterance-level pronunciation scoring

    Recent studies on pronunciation scoring have explored introducing phone embeddings as a reference pronunciation, but mostly in an implicit manner, i.e., by adding or concatenating the reference phone embedding and the actual pronunciation of the target phone to form the phone-level pronunciation quality representation. In this paper, we propose using linguistic-acoustic similarity to explicitly measure the deviation of non-native production from its native reference for pronunciation assessment. Specifically, the deviation is first estimated as the cosine similarity between the reference phone embedding and the corresponding acoustic embedding. Next, a phone-level goodness of pronunciation (GOP) pre-training stage is introduced to guide this similarity-based learning and to provide a better initialization of the two embeddings. Finally, a transformer-based hierarchical pronunciation scorer maps the sequence of phone embeddings and acoustic embeddings, together with their similarity measures, to the final utterance-level score. Experimental results on non-native speech databases suggest that the proposed system significantly outperforms baselines in which the acoustic and phone embeddings are simply added or concatenated. Further examination shows that the phone embeddings learned by the proposed approach capture the linguistic-acoustic attributes of native pronunciation that serve as the reference.
    Comment: Accepted by ICASSP 202
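    As a rough illustration of the similarity step described in this abstract, the sketch below computes a per-phone cosine similarity between a reference phone embedding and the corresponding acoustic embedding, then appends it to the concatenated embeddings as an explicit deviation feature. It is a minimal sketch assuming pre-computed, phone-aligned acoustic embeddings and a lookup table of reference phone embeddings; the function `phone_similarity_features` and its parameters are hypothetical, and neither the GOP pre-training stage nor the transformer scorer is reproduced.

```python
import numpy as np


def phone_similarity_features(phone_ids, acoustic_emb, phone_table, eps=1e-8):
    """Hypothetical helper: build per-phone features
    [reference embedding | acoustic embedding | cosine similarity].

    phone_ids:    (T,) integer indices of the canonical phones
    acoustic_emb: (T, d) one acoustic embedding per aligned phone segment
    phone_table:  (V, d) reference phone embedding table
    """
    ref = phone_table[phone_ids]                   # (T, d) reference embeddings
    dot = np.sum(ref * acoustic_emb, axis=1)       # (T,) dot products
    norms = np.linalg.norm(ref, axis=1) * np.linalg.norm(acoustic_emb, axis=1)
    sim = dot / np.maximum(norms, eps)             # (T,) cosine similarity in [-1, 1]
    # The implicit baseline would stop at the concatenation; the similarity
    # column adds an explicit measure of deviation from the reference.
    feats = np.concatenate([ref, acoustic_emb, sim[:, None]], axis=1)
    return feats, sim


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    table = rng.normal(size=(40, 8))                        # toy 40-phone inventory
    ids = np.array([3, 17, 5])                              # toy canonical phone sequence
    acoustic = table[ids] + 0.1 * rng.normal(size=(3, 8))   # near-reference production
    feats, sim = phone_similarity_features(ids, acoustic, table)
    print(feats.shape)  # (3, 17)
    print(np.round(sim, 3))
```

    In a full system, a per-phone feature sequence like this would be fed to an utterance-level scorer; here it only shows how an explicit similarity term differs from plain addition or concatenation.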

    A Study on Automatic Measurement of the Intelligibility of Non-Native Speech Based on Native Speakers' Shadowing

    Degree type: Master's. University of Tokyo (東京大学)

    An Acoustic Study of Emotional Speech Produced by Italian Learners of Japanese

    Pioneering research on L2 emotional speech provides evidence for crosslinguistic similarities and differences, but research at the production level remains limited. This study examines the production of emotional speech by Italian learners of Japanese in both their L1 and L2, alongside L1 Japanese speakers. We sought to identify the acoustic characteristics of emotional speech for this L2 type and to investigate transfer effects. We analyzed single-word utterances produced with five emotions (neutral, happy, angry, sad, and surprised) for six acoustic parameters (F0mean, F0max, F0min, F0range, intensity, and utterance duration). The patterns differ asymmetrically across emotions. For happy and surprised, all speech types show higher pitch and a larger pitch range relative to neutral, i.e., positive transfer. In contrast, for angry, L1 Japanese shows only a larger pitch range, whereas L1 Italian shows higher F0mean and F0max and a larger pitch range; for sad, L1 Japanese shows a lower F0 level and a smaller pitch range, whereas L1 Italian shows higher F0mean, F0max, and F0min and lower intensity. For both angry and sad, L2 Japanese shows F0 patterns similar to those of L1 Italian, i.e., negative transfer. The findings are discussed in light of the valence and arousal features of emotions, and other directions for future research are suggested.
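    The six acoustic parameters named in this abstract can be summarized per utterance roughly as follows, with each emotion then compared against the same speaker's neutral production. This is a minimal sketch, not the authors' procedure: it assumes frame-level F0 (Hz, with unvoiced frames marked as 0) and intensity (dB) tracks have already been extracted by a pitch tracker, takes F0range as F0max minus F0min in Hz, approximates utterance duration as frame count times frame step, and uses hypothetical names (`ProsodyParams`, `prosody_params`, `diff_from_neutral`).

```python
from dataclasses import asdict, dataclass
from statistics import mean
from typing import Dict, Sequence


@dataclass
class ProsodyParams:
    """Hypothetical container for the six per-utterance parameters."""
    f0_mean: float
    f0_max: float
    f0_min: float
    f0_range: float
    intensity: float
    duration: float


def prosody_params(f0_hz: Sequence[float], intensity_db: Sequence[float],
                   frame_step_s: float) -> ProsodyParams:
    # Assumption: unvoiced frames carry F0 = 0 and are excluded from F0 statistics.
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        raise ValueError("utterance has no voiced frames")
    return ProsodyParams(
        f0_mean=mean(voiced),
        f0_max=max(voiced),
        f0_min=min(voiced),
        f0_range=max(voiced) - min(voiced),      # assumption: range in Hz, not semitones
        intensity=mean(intensity_db),            # mean frame intensity in dB
        duration=len(f0_hz) * frame_step_s,      # assumption: frames span the utterance
    )


def diff_from_neutral(emotional: ProsodyParams,
                      neutral: ProsodyParams) -> Dict[str, float]:
    """Positive values mean the emotional utterance exceeds neutral."""
    e, n = asdict(emotional), asdict(neutral)
    return {name: e[name] - n[name] for name in e}


if __name__ == "__main__":
    happy = prosody_params([0, 200, 220, 0, 180], [60, 62, 64, 66, 68], 0.01)
    neutral = prosody_params([0, 150, 160, 0, 140], [60] * 5, 0.01)
    print(diff_from_neutral(happy, neutral))
```

    Expressing each emotion as a difference from neutral matches the way the abstract reports "higher pitch and a larger pitch range relative to neutral"; whether the original study used Hz or semitones, or per-speaker normalization, is not stated here.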