SDIF-DA: A Shallow-to-Deep Interaction Framework with Data Augmentation
  for Multi-modal Intent Detection

Huang, Shijue; Qin, Libo; Tu, Geng; Wang, Bingbing; Xu, Ruifeng

Authors: Shijue Huang, Libo Qin, Geng Tu, Bingbing Wang, Ruifeng Xu
Publication date: 31 December 2023

Abstract

Multi-modal intent detection aims to use multiple modalities to understand the user's intentions, which is essential for deploying dialogue systems in real-world scenarios. It faces two core challenges: (1) how to effectively align and fuse features from different modalities, and (2) the limited amount of labeled multi-modal intent training data. In this work, we introduce a shallow-to-deep interaction framework with data augmentation (SDIF-DA) to address these challenges. First, SDIF-DA uses a shallow-to-deep interaction module to progressively align and fuse features across the text, video, and audio modalities. Second, we propose a ChatGPT-based data augmentation approach that automatically generates additional training data. Experimental results show that SDIF-DA achieves state-of-the-art performance, indicating that it aligns and fuses multi-modal features effectively. In addition, extensive analyses show that the proposed data augmentation approach can successfully distill knowledge from the large language model.

Comment: Accepted by ICASSP 202

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2401.00424

Last time updated on 14/08/2024