Linguistically Aided Speaker Diarization Using Speaker Role Information
Speaker diarization relies on the assumption that speech segments
corresponding to a particular speaker are concentrated in a specific region of
the speaker space; a region which represents that speaker's identity. These
identities are not known a priori, so a clustering algorithm is typically
employed, which is traditionally based solely on audio. Under noisy conditions,
however, such an approach poses the risk of generating unreliable speaker
clusters. In this work we aim to utilize linguistic information as a
supplemental modality to identify the various speakers in a more robust way. We
are focused on conversational scenarios where the speakers assume distinct
roles and are expected to follow different linguistic patterns. This distinct
linguistic variability can be exploited to help us construct the speaker
identities. That way, we are able to boost the diarization performance by
converting the clustering task to a classification one. The proposed method is
applied in real-world dyadic psychotherapy interactions between a provider and
a patient and demonstrated to show improved results.
Comment: from v1: restructured Introduction and Background, added experimental
results with ASR text and language-only baseline