Search CORE

5 research outputs found

Improving automated segmentation of radio shows with audio embeddings

Author: Berlage Oberon
Graus David
Lux Klaus-Michael
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 12/02/2020
Field of study

Audio features have been proven useful for increasing the performance of automated topic segmentation systems. This study explores the novel task of using audio embeddings for automated, topically coherent segmentation of radio shows. We created three different audio embedding generators using multi-class classification tasks on three datasets from different domains. We evaluate topic segmentation performance of the audio embeddings and compare it against a text-only baseline. We find that a set-up including audio embeddings generated through a non-speech sound event classification task significantly outperforms our text-only baseline by 32.3% in F1-measure. In addition, we find that different classification tasks yield audio embeddings that vary in segmentation performance.Comment: 5 pages, 2 figures, submitted to ICASSP202

arXiv.org e-Print Archive

Crossref

Topic Break Detection in Interview Dialogues Using Sentence Embedding of Utterance and Speech Intention Based on Multitask Neural Networks

Author: Kirihara Taiga
Matsumoto Kazuyuki
Sasayama Manabu
Publication venue: 'MDPI AG'
Publication date: 17/01/2022
Field of study

Currently, task-oriented dialogue systems that perform specific tasks based on dialogue are widely used. Moreover, research and development of non-task-oriented dialogue systems are also actively conducted. One of the problems with these systems is that it is difficult to switch topics naturally. In this study, we focus on interview dialogue systems. In an interview dialogue, the dialogue system can take the initiative as an interviewer. The main task of an interview dialogue system is to obtain information about the interviewee via dialogue and to assist this individual in understanding his or her personality and strengths. In order to accomplish this task, the system needs to be flexible and appropriate for detecting topic switching and topic breaks. Given that topic switching tends to be more ambiguous in interview dialogues than in task-oriented dialogues, existing topic modeling methods that determine topic breaks based only on relationships and similarities between words are likely to fail. In this study, we propose a method for detecting topic breaks in dialogue to achieve flexible topic switching in interview dialogue systems. The proposed method is based on multi-task learning neural network that uses embedded representations of sentences to understand the context of the text and utilizes the intention of an utterance as a feature. In multi-task learning, not only topic breaks but also the intention associated with the utterance and the speaker are targets of prediction. The results of our evaluation experiments show that using utterance intentions as features improves the accuracy of topic separation estimation compared to the baseline model

PubMed Central

Tokushima University Institutional Repository

Topic segmentation in ASR transcripts using bidirectional rnns for change detection

Author: Fohr Dominique
Illina Irina
Sheikh Imran
Publication venue: HAL CCSD
Publication date: 16/12/2017
Field of study

International audienceTopic segmentation methods are mostly based on the idea of lexical cohesion, in which lexical distributions are analysed across the document and segment boundaries are marked in areas of low cohesion. We propose a novel approach for topic segmentation in speech recognition transcripts by measuring lexical cohesion using bidirectional Recurrent Neural Networks (RNN). The bidirectional RNNs capture context in the past and the following set of words. The past and following contexts are compared to perform topic change detection. In contrast to existing works based on sequence and discrim-inative models for topic segmentation, our approach does not use a segmented corpus nor (pseudo) topic labels for training. Our model is trained using news articles obtained from the internet. Evaluation on ASR transcripts of French TV broadcast news programs demonstrates the effectiveness of our proposed approach

INRIA a CCSD electronic archive server