Hierarchical RNN with Static Sentence-Level Attention for Text-Based Speaker Change Detection
Speaker change detection (SCD) is an important task in dialog modeling. Our
paper addresses the problem of text-based SCD, which differs from existing
audio-based studies and is useful in various scenarios, for example, processing
dialog transcripts where speaker identities are missing (e.g., OpenSubtitle),
and enhancing audio SCD with textual information. We formulate text-based SCD
as a matching problem of utterances before and after a certain decision point;
we propose a hierarchical recurrent neural network (RNN) with static
sentence-level attention. Experimental results show that neural networks
consistently achieve better performance than feature-based approaches, and that
our attention-based model significantly outperforms non-attention neural
networks.
Comment: In Proceedings of the ACM on Conference on Information and Knowledge
Management (CIKM), 201
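
The matching formulation in this abstract can be illustrated with a toy sketch. It is not the authors' model: the sentence vectors would normally come from a trained hierarchical RNN, and here they are hand-written lists. The sketch assumes that "static" attention means each sentence's weight is computed from that sentence alone against a fixed query vector, which the abstract does not spell out. The names `attention_pool`, `speaker_change_score`, and `query` are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(sentence_vecs, query):
    """Weight each sentence vector by softmax(v . query) and sum them.

    Assumption: the weight of a sentence depends only on that sentence
    and a fixed query vector (our reading of "static" attention).
    """
    weights = softmax([dot(v, query) for v in sentence_vecs])
    dim = len(sentence_vecs[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, sentence_vecs))
              for i in range(dim)]
    return pooled, weights


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def speaker_change_score(before, after, query):
    """Higher score means the two sides of the decision point match less."""
    pooled_before, _ = attention_pool(before, query)
    pooled_after, _ = attention_pool(after, query)
    return 1.0 - cosine(pooled_before, pooled_after)
```

In a trained system the final score would come from a learned classifier over the two pooled representations; the cosine distance here is a stand-in chosen only to keep the example self-contained.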
Conversational Analysis using Utterance-level Attention-based Bidirectional Recurrent Neural Networks
Recent approaches for dialogue act recognition have shown that context from
preceding utterances is important to classify the subsequent one. It was shown
that the performance improves rapidly when the context is taken into account.
We propose an utterance-level attention-based bidirectional recurrent neural
network (Utt-Att-BiRNN) model to analyze the importance of preceding utterances
to classify the current one. In our setup, the BiRNN is given the input set of
current and preceding utterances. Our model outperforms previous models that
use only preceding utterances as context on the corpus used. Another
contribution of the article is to discover the amount of information in each
utterance to classify the subsequent one and to show that context-based
learning not only improves the performance but also achieves higher confidence
in the classification. We use character- and word-level features to represent
the utterances. The results are presented for character and word feature
representations and as an ensemble model of both representations. We found that
when classifying short utterances, the closest preceding utterances contribute
to a higher degree.
Comment: Proceedings of INTERSPEECH 201
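
The abstract reports results for an ensemble of the character-level and word-level models but does not say how the two are combined. A minimal sketch, assuming the simplest option of averaging the two models' class probabilities, is shown below. The function names and the dialogue act labels in the usage note are hypothetical.

```python
def ensemble_average(char_probs, word_probs):
    """Average two per-class probability lists of equal length.

    Assumption: the ensemble is an unweighted mean of the character-level
    and word-level model outputs; the abstract does not specify this.
    """
    if len(char_probs) != len(word_probs):
        raise ValueError("both models must score the same set of classes")
    return [(c + w) / 2.0 for c, w in zip(char_probs, word_probs)]


def predict(labels, probs):
    """Return the most probable label and its probability."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

For example, with hypothetical labels `["statement", "question", "backchannel"]`, character-level probabilities `[0.5, 0.4, 0.1]` and word-level probabilities `[0.2, 0.7, 0.1]`, the averaged distribution favors `"question"` even though the character-level model alone would pick `"statement"`.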
Multimodal Short Video Rumor Detection System Based on Contrastive Learning
With short video platforms becoming one of the important channels for news
sharing, major short video platforms in China have gradually become new
breeding grounds for fake news. However, it is not easy to distinguish short
video rumors due to the great amount of information and features contained in
short videos, as well as the serious homogenization and similarity of features
among videos. To mitigate the spread of short video rumors, our group
detects them by constructing a multimodal feature fusion model and introducing
external knowledge, after weighing the advantages and disadvantages of each
algorithm. The detection approach is as follows: (1) dataset creation: we build
a short video dataset with multiple features; (2) multimodal rumor detection
model: first, we use the TSN (Temporal Segment Networks) video coding model to
extract video features; then, we use OCR (Optical Character Recognition) and
ASR (Automatic Speech Recognition) together to extract text, and use the BERT
model to fuse the text features with the video features; (3) finally, we use
contrastive learning to achieve distinction: we first crawl external knowledge,
then use a vector database to introduce this external knowledge and produce the
final classification output. Our research process is always oriented to
practical needs, and the resulting knowledge will play an important role in
many practical scenarios such as short video rumor identification and public
opinion control.
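
The contrastive learning step in this abstract can be illustrated with a small sketch. The abstract does not name the objective, so the code assumes an InfoNCE-style loss over cosine similarities, which is one common choice and not necessarily what the authors used. The function name `info_nce`, the `temperature` value, and the toy vectors in the usage note are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    num = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return num / (na * nb) if na and nb else 0.0


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor (assumed objective).

    The loss is -log of the softmax probability assigned to the positive
    among the positive and all negatives, using temperature-scaled
    cosine similarity as the logit.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)
    # log-sum-exp with the max subtracted for numerical stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]
```

As a usage example, `info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])` gives a loss close to zero because the positive matches the anchor, while swapping the positive and the negative gives a loss of roughly 10. Minimizing such a loss pulls matching representations together and pushes mismatched ones apart.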