Short Utterance Dialogue Act Classification Using a Transformer Ensemble
The growing adoption of and reliance on digital assistants demonstrates the need for reliable and robust dialogue act classification techniques. The literature over-represents purely lexical dialogue act classification methods, a weakness of which is the lack of context when classifying short utterances. We improve upon a purely lexical approach by incorporating a state-of-the-art acoustic model in a lexical-acoustic transformer ensemble, obtaining improved results when classifying dialogue acts in the MRDA corpus. Additionally, we investigate performance on an utterance word-count basis, showing that classification accuracy increases with utterance word count. Furthermore, the lexical model's performance increases with utterance length while the acoustic model's performance decreases, showing that the two models complement each other across different utterance lengths.
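The complementary behaviour described above suggests a late-fusion scheme in which the acoustic model is weighted more heavily for short utterances and the lexical model for long ones. The following is a minimal sketch of such an ensemble; the length-based weighting rule, class count, and logit values are illustrative assumptions, not the paper's actual fusion method.

```python
# Hedged sketch: late-fusion of a lexical and an acoustic classifier,
# weighted by utterance word count. All numbers are illustrative.
import math

def softmax(scores):
    # Convert raw logits to a probability distribution.
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(lexical_logits, acoustic_logits, n_words, max_words=20):
    # Lexical accuracy grows with utterance length and acoustic accuracy
    # shrinks, so weight the lexical model more for longer utterances.
    w = min(n_words, max_words) / max_words
    lex = softmax(lexical_logits)
    aco = softmax(acoustic_logits)
    return [w * l + (1 - w) * a for l, a in zip(lex, aco)]

# A one-word utterance: the acoustic model's prediction dominates.
probs = fuse([2.0, 0.5], [0.2, 1.8], n_words=1)
```

For `n_words=1` the lexical weight is only 0.05, so the fused distribution follows the acoustic model; for a 20-word utterance the weights invert.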
Oh, Jeez! or Uh-huh? A Listener-aware Backchannel Predictor on ASR Transcriptions
This paper presents our latest investigation on modeling backchannel in
conversations. Motivated by a proactive backchanneling theory, we aim at
developing a system which acts as a proactive listener by inserting
backchannels, such as continuers and assessments, to influence speakers. Our
model takes into account not only lexical and acoustic cues, but also
introduces the simple and novel idea of using listener embeddings to mimic
different backchanneling behaviours. Our experimental results on the
Switchboard benchmark dataset reveal that acoustic cues are more important than
lexical cues in this task and that their combination with listener embeddings
works best on both manual and automatically generated transcriptions.
Comment: Published in ICASSP 202
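The listener-embedding idea above can be sketched as a per-listener lookup table whose vector is concatenated with the lexical and acoustic cues before classification. The feature dimensions, listener names, and the bare concatenation step below are illustrative assumptions, not the paper's architecture.

```python
# Hedged sketch: listener-aware feature construction for backchannel
# prediction. A real system would feed the resulting vector to a trained
# classifier; here we only build the input representation.
import random

random.seed(0)
EMB_DIM = 4

# One embedding per listener identity, intended to mimic individual
# backchanneling styles (e.g. preferring continuers vs. assessments).
listener_embeddings = {
    "listener_A": [random.gauss(0, 1) for _ in range(EMB_DIM)],
    "listener_B": [random.gauss(0, 1) for _ in range(EMB_DIM)],
}

def build_features(lexical_feats, acoustic_feats, listener_id):
    # Concatenate lexical cues, acoustic cues, and the listener embedding.
    return lexical_feats + acoustic_feats + listener_embeddings[listener_id]

vec = build_features([0.1, 0.9], [0.4, 0.4, 0.2], "listener_A")
```

Swapping `listener_id` changes only the trailing embedding slice, which is how one model can produce different backchanneling behaviours per listener.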