Short Utterance Dialogue Act Classification Using a Transformer Ensemble
The growing adoption of and reliance on digital assistants demonstrates the need for reliable and robust dialogue act classification techniques. The literature over-represents purely lexical dialogue act classification methods, a weakness of which is the lack of context when classifying short utterances. We improve upon a purely lexical approach by incorporating a state-of-the-art acoustic model in a lexical-acoustic transformer ensemble, obtaining improved results when classifying dialogue acts in the MRDA corpus. Additionally, we investigate performance on an utterance word-count basis, showing that classification accuracy increases with utterance word count. Furthermore, the lexical model's performance increases with utterance length while the acoustic model's performance decreases, showing that the two models complement each other across different utterance lengths.
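The complementary behaviour described above suggests a late-fusion scheme in which the acoustic model is weighted more heavily for short utterances and the lexical model for long ones. The following is a minimal sketch of such an ensemble; the length-based weighting rule, class count, and logit values are illustrative assumptions, not the paper's actual fusion method.

```python
# Hedged sketch: late-fusion of a lexical and an acoustic classifier,
# weighted by utterance word count. All numbers are illustrative.
import math

def softmax(scores):
    # Convert raw logits to a probability distribution.
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(lexical_logits, acoustic_logits, n_words, max_words=20):
    # Lexical accuracy grows with utterance length and acoustic accuracy
    # shrinks, so weight the lexical model more for longer utterances.
    w = min(n_words, max_words) / max_words
    lex = softmax(lexical_logits)
    aco = softmax(acoustic_logits)
    return [w * l + (1 - w) * a for l, a in zip(lex, aco)]

# A one-word utterance: the acoustic model's prediction dominates.
probs = fuse([2.0, 0.5], [0.2, 1.8], n_words=1)
```

For `n_words=1` the lexical weight is only 0.05, so the fused distribution follows the acoustic model; for a 20-word utterance the weights invert.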
Oh, Jeez! or Uh-huh? A Listener-aware Backchannel Predictor on ASR Transcriptions
This paper presents our latest investigation on modeling backchannel in
conversations. Motivated by a proactive backchanneling theory, we aim at
developing a system which acts as a proactive listener by inserting
backchannels, such as continuers and assessments, to influence speakers. Our
model takes into account not only lexical and acoustic cues, but also
introduces the simple and novel idea of using listener embeddings to mimic
different backchanneling behaviours. Our experimental results on the
Switchboard benchmark dataset reveal that acoustic cues are more important than
lexical cues in this task and that their combination with listener embeddings
works best on both manual and automatically generated transcriptions.
Comment: Published in ICASSP 202
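The listener-embedding idea above can be sketched as a per-listener lookup table whose vector is concatenated with the lexical and acoustic cues before classification. The feature dimensions, listener names, and the bare concatenation step below are illustrative assumptions, not the paper's architecture.

```python
# Hedged sketch: listener-aware feature construction for backchannel
# prediction. A real system would feed the resulting vector to a trained
# classifier; here we only build the input representation.
import random

random.seed(0)
EMB_DIM = 4

# One embedding per listener identity, intended to mimic individual
# backchanneling styles (e.g. preferring continuers vs. assessments).
listener_embeddings = {
    "listener_A": [random.gauss(0, 1) for _ in range(EMB_DIM)],
    "listener_B": [random.gauss(0, 1) for _ in range(EMB_DIM)],
}

def build_features(lexical_feats, acoustic_feats, listener_id):
    # Concatenate lexical cues, acoustic cues, and the listener embedding.
    return lexical_feats + acoustic_feats + listener_embeddings[listener_id]

vec = build_features([0.1, 0.9], [0.4, 0.4, 0.2], "listener_A")
```

Swapping `listener_id` changes only the trailing embedding slice, which is how one model can produce different backchanneling behaviours per listener.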