Deep Cross-Modal Correlation Learning for Audio and Lyrics in Music Retrieval
Deep cross-modal learning has demonstrated excellent performance in cross-modal multimedia retrieval, with the aim of learning joint representations across different data modalities. Unfortunately, little research focuses on cross-modal correlation learning where the temporal structures of different data modalities, such as audio and lyrics, are taken into account. Motivated by the inherently temporal structure of music, we aim to learn the deep sequential correlation between audio and lyrics. In this work, we propose a deep cross-modal correlation learning architecture involving two-branch deep neural networks for the audio modality and the text modality (lyrics). Data in different modalities are mapped to the same canonical space, where inter-modal canonical correlation analysis is utilized as an objective function to measure the similarity of temporal structures. This is the first study that uses deep architectures for learning the temporal correlation between audio and lyrics. A pre-trained Doc2Vec model followed by fully-connected layers is used to represent lyrics. Two significant contributions are made in the audio branch, as follows: i) We propose an end-to-end network to learn the cross-modal correlation between audio and lyrics, where feature extraction and correlation learning are performed simultaneously and the joint representation is learned by considering temporal structures. ii) For feature extraction, we represent an audio signal by a short sequence of local summaries (VGG16 features) and apply a recurrent neural network to compute a compact feature that better captures the temporal structure of music audio. Experimental results, using audio to retrieve lyrics and lyrics to retrieve audio, verify the effectiveness of the proposed deep correlation learning architectures in cross-modal music retrieval.
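The inter-modal CCA objective described above can be sketched in numpy: the top canonical correlation between two feature matrices (one per modality) is the largest singular value of the whitened cross-covariance. This is a minimal illustration of the objective only; the paper's deep two-branch networks, Doc2Vec/VGG16 features, and recurrent layers are omitted, and the regularization constant `reg` is an assumption for numerical stability, not the authors' setting.

```python
import numpy as np

def cca_correlation(X, Y, reg=1e-4):
    """Top canonical correlation between two views X (n x dx) and Y (n x dy).

    Sketch of the inter-modal CCA objective: center each view, whiten via
    Cholesky factors of the (regularized) within-view covariances, then take
    the singular values of the whitened cross-covariance.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)
    # Whiten each view, then read off canonical correlations as singular values
    ix = np.linalg.inv(np.linalg.cholesky(Sxx))
    iy = np.linalg.inv(np.linalg.cholesky(Syy))
    s = np.linalg.svd(ix @ Sxy @ iy.T, compute_uv=False)
    return float(s[0])  # top canonical correlation, close to 1 for aligned views
```

In a deep CCA setting, `X` and `Y` would be the outputs of the audio and lyrics branches, and (minus) this correlation would serve as the training loss.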
TieFake: Title-Text Similarity and Emotion-Aware Fake News Detection
Fake news detection aims to detect fake news widely spreading on social media
platforms, which can negatively influence the public and the government. Many
approaches have been developed to exploit relevant information from news
images, text, or videos. However, these methods may suffer from the following
limitations: (1) ignore the inherent emotional information of the news, which
could be beneficial since it contains the subjective intentions of the authors;
(2) pay little attention to the relation (similarity) between the title and
the textual information in news articles, which often use irrelevant titles
to attract readers' attention. To this end, we propose a novel Title-Text
similarity and emotion-aware Fake news detection (TieFake) method by jointly
modeling the multi-modal context information and the author sentiment in a
unified framework. Specifically, we respectively employ BERT and ResNeSt to
learn the representations for text and images, and utilize publisher emotion
extractor to capture the author's subjective emotion in the news content. We
also propose a scaled dot-product attention mechanism to capture the similarity
between title features and textual features. Experiments are conducted on two
publicly available multi-modal datasets, and the results demonstrate that our
proposed method can significantly improve the performance of fake news
detection. Our code is available at https://github.com/UESTC-GQJ/TieFake.
Comment: Appears at IJCNN 202
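The title-text similarity module above relies on standard scaled dot-product attention, which can be sketched as follows. Here title features act as queries and body-text features as keys/values; the shapes and the single-head, numpy-only form are illustrative assumptions, not the paper's BERT/ResNeSt pipeline.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Q: (tq x d) query features (e.g. title tokens),
    K: (tk x d) key features, V: (tk x dv) value features (e.g. body text).
    Returns the attended values and the attention weights.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # title-to-text similarity
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights
```

Each row of `weights` sums to 1, so the output is a similarity-weighted summary of the text features from the title's point of view.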
Combination of Domain Knowledge and Deep Learning for Sentiment Analysis of Short and Informal Messages on Social Media
Sentiment analysis has been emerging recently as one of the major natural
language processing (NLP) tasks in many applications. Especially, as social
media channels (e.g. social networks or forums) have become significant sources
for brands to observe user opinions about their products, this task is thus
increasingly crucial. However, when applied with real data obtained from social
media, we notice that there is a high volume of short and informal messages
posted by users on those channels. Such data are difficult for existing
approaches to handle, especially those based on deep learning. In this
paper, we propose an approach to address this problem. This
work is extended from our previous work, in which we proposed to combine the
typical deep learning technique of Convolutional Neural Networks with domain
knowledge. The combination is used for acquiring additional training data
augmentation and a more reasonable loss function. In this work, we further
improve our architecture by various substantial enhancements, including
negation-based data augmentation, transfer learning for word embeddings, the
combination of word-level embeddings and character-level embeddings, and using
multitask learning technique for attaching domain knowledge rules in the
learning process. Those enhancements, specifically aimed at handling short and
informal messages, yield significant performance improvements in experiments
on real datasets.
Comment: A preprint of an article accepted for publication by Inderscience in IJCVR, September 201
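The negation-based data augmentation mentioned above can be sketched as generating a polarity-flipped copy of each training sentence. The cue list and the single prepend rule here are hypothetical simplifications for illustration; the paper's actual negation rules and domain-knowledge integration are richer.

```python
# Illustrative negation cues; the authors' actual rule set is not reproduced here.
NEGATION_CUES = ("not", "never", "no")

def augment_with_negation(data):
    """Negation-based data augmentation sketch.

    data: list of (sentence, label) pairs with binary sentiment labels
    (0 = negative, 1 = positive). For each sample that contains no negation
    cue, append a negated copy with the flipped label; samples that already
    contain a cue are left unchanged to avoid double negation.
    """
    out = list(data)
    for sent, label in data:
        tokens = sent.lower().split()
        if not any(cue in tokens for cue in NEGATION_CUES):
            out.append(("not " + sent, 1 - label))
    return out
```

Augmented pairs like these enlarge the training set with examples that teach the model how negation inverts sentiment, which is especially useful for short, informal messages.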