Fluency estimation and prosodic error analysis of Japanese-accented English using multi-resolution posteriorgrams
Degree type: Master's, University of Tokyo (東京大学)
Leveraging phone-level linguistic-acoustic similarity for utterance-level pronunciation scoring
Recent studies on pronunciation scoring have explored the effect of
introducing phone embeddings as reference pronunciation, but mostly in an
implicit manner, i.e., addition or concatenation of reference phone embedding
and actual pronunciation of the target phone as the phone-level pronunciation
quality representation. In this paper, we propose to use linguistic-acoustic
similarity to explicitly measure the deviation of non-native production from
its native reference for pronunciation assessment. Specifically, the deviation
is first estimated by the cosine similarity between reference phone embedding
and corresponding acoustic embedding. Next, a phone-level Goodness of
pronunciation (GOP) pre-training stage is introduced to guide this
similarity-based learning for better initialization of the aforementioned two
embeddings. Finally, a transformer-based hierarchical pronunciation scorer is
used to map a sequence of phone embeddings and acoustic embeddings, along with
their similarity measures, to the final utterance-level score.
Experimental results on the non-native databases suggest that the proposed
system significantly outperforms the baselines, where the acoustic and phone
embeddings are simply added or concatenated. A further examination shows that
the phone embeddings learned in the proposed approach are able to capture
linguistic-acoustic attributes of native pronunciation as reference. Comment: Accepted by ICASSP 202
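The similarity step described in the abstract above can be illustrated with a minimal sketch. It assumes each phone is represented by one reference phone embedding and one acoustic embedding of equal dimension, given as plain lists of floats. The function names `cosine_similarity` and `phone_similarities` are hypothetical, and the sketch covers only the cosine measure, not the GOP pre-training stage or the transformer-based scorer.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption: a zero vector is treated as having no similarity.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def phone_similarities(phone_embs, acoustic_embs):
    """One similarity value per phone in the utterance (hypothetical helper)."""
    if len(phone_embs) != len(acoustic_embs):
        raise ValueError("need one acoustic embedding per phone embedding")
    return [cosine_similarity(p, a) for p, a in zip(phone_embs, acoustic_embs)]
```

A value near 1 would indicate that the acoustic embedding points in the same direction as the reference phone embedding, and lower values a larger deviation. How the paper feeds these values into the scorer is not specified here.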
A study on automatic measurement of the comprehensibility of non-native speech based on native-speaker shadowing
Degree type: Master's, University of Tokyo (東京大学)
An Acoustic Study of Emotional Speech Produced by Italian Learners of Japanese
Pioneering research on L2 emotional speech provides evidence for crosslinguistic similarities and differences, but there is limited research at the production level. This study examines the production of emotional speech in both L1 and L2 of Italian learners of Japanese and L1 Japanese. We sought to find the acoustic characteristics of emotional speech for this L2-type and investigate transfer effects. We analyzed single-word utterances with five emotions (neutral, happy, angry, sad, and surprised) for six acoustic parameters (F0mean, F0max, F0min, F0range, intensity, and utterance duration). Asymmetric patterns are found for different emotions. For happy and surprised, all speech types show higher pitch and larger pitch range with respect to neutral: i.e., positive transfer. In contrast, for angry, we found only larger pitch range in L1 Japanese, but higher F0mean and F0max and larger pitch range in L1 Italian; for sad, lower F0 level and smaller pitch range in L1 Japanese, but higher F0mean, F0max, and F0min and smaller intensity in L1 Italian. For both angry and sad, L2 Japanese shows F0 patterns similar to those of L1 Italian: i.e., negative transfer. The findings are discussed in light of valence and arousal features of emotions and other directions for future research.
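The four F0 parameters named in the abstract above can be sketched as simple statistics over a pitch contour. This is only an illustration under stated assumptions: the contour is a list of per-frame F0 values in Hz, unvoiced frames are marked as `None` or `0`, and F0range is taken as F0max minus F0min in Hz. The study may define the range differently (for example in semitones), and the function name `f0_statistics` is hypothetical.

```python
def f0_statistics(f0_hz):
    """Summarise a per-frame F0 contour (Hz) over its voiced frames only."""
    # Assumption: unvoiced frames are encoded as None or non-positive values.
    voiced = [f for f in f0_hz if f is not None and f > 0]
    if not voiced:
        raise ValueError("contour contains no voiced frames")
    f0_min = min(voiced)
    f0_max = max(voiced)
    return {
        "F0mean": sum(voiced) / len(voiced),
        "F0max": f0_max,
        "F0min": f0_min,
        # Assumption: range is the max-min difference in Hz.
        "F0range": f0_max - f0_min,
    }
```

Intensity and utterance duration, the other two parameters, would need the waveform itself and are left out of this sketch.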