2 research outputs found
Tonal identification in whispered speech
This project aims to examine whether, and how, non-F0 cues facilitate the identification of lexical tones. A perception experiment is designed to explicitly test the impact of duration cues on the identification of Mandarin lexical tones when F0 is absent. We take a novel approach in which the secondary cue of interest is held constant, effectively controlling the type of information listeners receive. Future studies can potentially extend this methodology to examine other relevant cues, such as temporal envelope and intensity. The contribution of this paper is twofold: first, to propose an explanation for the inconsistent conclusions drawn in the literature on tonal identification in whispered speech; second, to devise a better-controlled study that sheds light on the nature of tonal perception.
Clinical BERTScore: An Improved Measure of Automatic Speech Recognition Performance in Clinical Settings
Automatic Speech Recognition (ASR) in medical contexts has the potential to
save time, cut costs, increase report accuracy, and reduce physician burnout.
However, the healthcare industry has been slower to adopt this technology, in
part due to the importance of avoiding medically-relevant transcription
mistakes. In this work, we present the Clinical BERTScore (CBERTScore), an ASR
metric that penalizes clinically-relevant mistakes more than others. We
demonstrate that this metric more closely aligns with clinician preferences on
medical sentences as compared to other metrics (WER, BLEU, METEOR, etc.),
sometimes by wide margins. We collect a benchmark of 13 clinician preferences
on 149 realistic medical sentences called the Clinician Transcript Preference
benchmark (CTP), demonstrate that CBERTScore more closely matches what
clinicians prefer, and release the benchmark for the community to further
develop clinically-aware ASR metrics.