Sarcasm detection on Twitter
State-of-the-art approaches for sarcasm detection in social media combine lexical clues with contextual information surrounding the potentially sarcastic posting, including author information. This article presents detailed methods for performing contextualized sarcasm detection on Twitter, including data extraction, feature engineering, and classification model settings. I reproduce the state-of-the-art results reported by Bamman and Smith (2015).
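The combination of lexical clues and author context described above can be sketched as a feature-extraction step. This is a minimal illustration in the spirit of Bamman and Smith (2015); the specific feature names, cue lists, and thresholds here are assumptions for demonstration, not the paper's exact feature set.

```python
# Sketch: lexical + author-context features for contextualized sarcasm
# detection. Cue lists and feature names are illustrative assumptions.
import re

INTENSIFIERS = {"totally", "really", "so", "absolutely"}  # assumed cue list

def lexical_features(tweet: str) -> dict:
    """Surface cues frequently associated with sarcastic postings."""
    tokens = tweet.lower().split()
    return {
        "has_exclamation": "!" in tweet,
        "has_ellipsis": "..." in tweet,
        # All-caps words (longer than one character) often mark ironic emphasis.
        "all_caps_tokens": sum(t.isupper() and len(t) > 1 for t in tweet.split()),
        "intensifier_count": sum(t in INTENSIFIERS for t in tokens),
        "has_sarcasm_hashtag": bool(re.search(r"#sarcas\w*", tweet.lower())),
    }

def author_features(author_history: list) -> dict:
    """Context drawn from the author's previous postings."""
    joined = " ".join(author_history).lower()
    return {
        "author_prior_sarcasm": joined.count("#sarcasm"),
        "author_tweet_count": len(author_history),
    }

def featurize(tweet: str, author_history: list) -> dict:
    """Merge lexical and author-context features into one feature vector."""
    feats = lexical_features(tweet)
    feats.update(author_features(author_history))
    return feats
```

The resulting feature dictionary would typically be vectorized and fed to a linear classifier; the point here is only the split between message-level and author-level signals.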
Researchers eye-view of sarcasm detection in social media textual content
The enormous use of sarcastic text in all forms of communication in social media can affect target users. Every user has a different approach to producing and recognising sarcasm. Sarcasm detection is difficult even for people, since it depends on many factors such as perspective, context, and special symbols, which makes distinguishing sarcastic from non-sarcastic sentences a challenging task for machines. In the current situation there are no exact rules on which a model can rely to accurately detect sarcasm across many text corpora, so one needs to focus on optimistic and forthcoming approaches in the sarcasm detection domain. This paper discusses various sarcasm detection techniques and concludes with promising approaches, related datasets with optimal features, and the challenges researchers face. Comment: 8 pages
Computational Sarcasm Analysis on Social Media: A Systematic Review
Sarcasm can be defined as saying or writing the opposite of what one truly
wants to express, usually to insult, irritate, or amuse someone. Because of the
obscure nature of sarcasm in textual data, detecting it is difficult and of
great interest to the sentiment analysis research community. Though the
research in sarcasm detection spans more than a decade, some significant
advancements have been made recently, including employing unsupervised
pre-trained transformers in multimodal environments and integrating context to
identify sarcasm. In this study, we aim to provide a brief overview of recent
advancements and trends in computational sarcasm research for the English
language. We describe relevant datasets, methodologies, trends, issues,
challenges, and tasks relating to sarcasm that are beyond detection. Our study
provides well-summarized tables of sarcasm datasets, sarcastic features and
their extraction methods, and performance analysis of various approaches which
can help researchers in related domains understand current state-of-the-art
practices in sarcasm detection. Comment: 50 pages, 3 tables. Submitted to 'Data Mining and Knowledge Discovery' for possible publication.
"Did you really mean what you said?" : Sarcasm Detection in Hindi-English Code-Mixed Data using Bilingual Word Embeddings
With the increased use of social media platforms by people across the world, many new and interesting NLP problems have come into existence. One such problem is the detection of sarcasm in social media texts. We present a corpus of tweets for training custom word embeddings and a Hinglish dataset labelled for sarcasm detection. We propose a deep learning based approach to sarcasm detection in Hindi-English code-mixed tweets using bilingual word embeddings derived from FastText and Word2Vec approaches. We experimented with various deep learning models, including CNNs, LSTMs, and Bi-directional LSTMs (with and without attention). Our deep learning models outperformed all reported state-of-the-art results, with attention-based Bi-directional LSTMs giving the best performance at an accuracy of 78.49%.
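The attention mechanism that distinguishes the best-performing model above can be sketched independently of the LSTM itself: each hidden state is scored against a learned query vector, the scores are softmax-normalized, and the states are combined by their weights. This is a pure-Python illustration with toy vectors; in the paper's setup the states would come from a Bi-directional LSTM over the bilingual FastText/Word2Vec embeddings, and the query would be a trained parameter.

```python
# Sketch: dot-product attention pooling over a sequence of hidden states.
# "states" stand in for BiLSTM outputs; values here are toy examples.
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def attention_pool(states, query):
    """Score each hidden state against the query vector, normalize the
    scores, and return the weighted sum (the sentence representation)."""
    scores = [sum(h * q for h, q in zip(state, query)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    pooled = [sum(w * state[i] for w, state in zip(weights, states))
              for i in range(dim)]
    return pooled, weights
```

The attention weights also give a per-token importance profile, which is one reason attention-based models are popular for inspecting what a sarcasm classifier attends to.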
Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm
NLP tasks are often limited by scarcity of manually annotated data. In social
media sentiment analysis and related tasks, researchers have therefore used
binarized emoticons and specific hashtags as forms of distant supervision. Our
paper shows that by extending the distant supervision to a more diverse set of
noisy labels, the models can learn richer representations. Through emoji
prediction on a dataset of 1246 million tweets containing one of 64 common
emojis we obtain state-of-the-art performance on 8 benchmark datasets within
sentiment, emotion and sarcasm detection using a single pretrained model. Our
analyses confirm that the diversity of our emotional labels yields a performance improvement over previous distant-supervision approaches. Comment: Accepted at EMNLP 2017; please include EMNLP in any citations. Minor changes from the EMNLP camera-ready version. 9 pages + references and supplementary material.
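The distant-supervision step described above turns unlabeled tweets into (text, emoji) training pairs by treating each occurring target emoji as a noisy label. The sketch below illustrates that labeling step with a tiny emoji inventory; the paper uses 64 common emojis, and the four chosen here are an assumption for demonstration.

```python
# Sketch: distant supervision from emoji occurrences. Each tweet yields one
# training example per target emoji it contains, with the emoji removed
# from the text. The tiny EMOJI_VOCAB is illustrative, not the paper's list.
EMOJI_VOCAB = ["😂", "😭", "😡", "🙃"]

def distant_label(tweet: str):
    """Return (stripped-text, emoji-index) pairs for each target emoji
    occurring in the tweet; tweets without target emojis yield nothing."""
    present = [e for e in EMOJI_VOCAB if e in tweet]
    if not present:
        return []  # this tweet contributes no distant-supervision example
    stripped = tweet
    for e in EMOJI_VOCAB:
        stripped = stripped.replace(e, "")
    return [(stripped.strip(), EMOJI_VOCAB.index(e)) for e in present]
```

A model pretrained to predict these noisy emoji labels can then be fine-tuned on the much smaller manually annotated sentiment, emotion, or sarcasm datasets.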
A novel Auto-ML Framework for Sarcasm Detection
Many domains, including reviews, tweets, comments, and dialog discussions, contain sarcasm or verbal irony in their text. The purpose of this research is to classify sarcasm across multiple domains using a deep learning based AutoML framework. The proposed AutoML framework has five models in its model-search pipeline, built from combinations of a convolutional neural network (CNN), Long Short-Term Memory (LSTM), deep neural network (DNN), and Bidirectional Long Short-Term Memory (BiLSTM): the hybrid combinations are presented as CNN-LSTM-DNN, LSTM-DNN, BiLSTM-DNN, and CNN-BiLSTM-DNN. This work proposes algorithms that contrast polarities between terms and phrases, categorized into implicit and explicit incongruity. These incongruity features, together with pragmatic features such as punctuation and exclamation marks, are integrated into the AutoML DeepConcat framework models when the framework initiates its model-search pipeline over the five models. Conceptually, DeepConcat means that each model is integrated with generalized features. The pretrained BiLSTM model achieved the best performance of 0.98 F1 compared with the other models, and the AutoML-based BiLSTM-DNN model likewise achieved 0.98 F1, better than core approaches and the existing state of the art on a Twitter tweet dataset, Amazon reviews, and dialog discussion comments. Comparing the F1 and AUC performance metrics, the framework found F1 the more informative measure. Integrating all feature categories achieved better performance than the individual pragmatic and incongruity feature categories alone.
This research also evaluated the dropout-layer hyperparameter, which, when tuned by the AutoML framework's Bayesian optimization, achieved better performance than a fixed dropout rate such as 10%. The proposed DeepConcat AutoML framework used its best pretrained models, BiLSTM-DNN and CNN-CNN-DNN, to transfer knowledge across domains such as Amazon reviews and dialog discussion comments using last-layer, full-layer, and our fade-out freezing strategies. In transfer learning, the fade-out strategy outperformed the existing state-of-the-art BiLSTM-DNN model, reaching 0.98 F1 on tweets, 0.85 F1 on Amazon reviews, and 0.87 F1 on the SCV2-Gen dialog discussion dataset. Further, all strategies can be compared across domains for best-model selection.
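The model-search pipeline at the heart of the AutoML framework above amounts to training each candidate architecture and keeping the one with the best validation score. The skeleton below sketches that loop; the candidate names follow the abstract, but the evaluation function is a stub standing in for actual training, and the scores in the usage example are placeholders, not the paper's reported results.

```python
# Sketch: the model-search loop of an AutoML framework over hybrid
# architectures. evaluate() stands in for train-and-validate; the candidate
# names follow the abstract, but any scores supplied are illustrative.
CANDIDATES = [
    "CNN-LSTM-DNN",
    "LSTM-DNN",
    "BiLSTM-DNN",
    "CNN-BiLSTM-DNN",
    "CNN-CNN-DNN",
]

def search(candidates, evaluate):
    """Evaluate each candidate architecture and return the name and
    validation F1 of the best one."""
    best_name, best_f1 = None, -1.0
    for name in candidates:
        f1 = evaluate(name)  # in practice: build, train, and validate
        if f1 > best_f1:
            best_name, best_f1 = name, f1
    return best_name, best_f1

# Usage with stubbed validation scores (illustrative values only):
stub_scores = {"CNN-LSTM-DNN": 0.91, "LSTM-DNN": 0.90, "BiLSTM-DNN": 0.98,
               "CNN-BiLSTM-DNN": 0.95, "CNN-CNN-DNN": 0.93}
best, best_f1 = search(CANDIDATES, lambda name: stub_scores[name])
```

A Bayesian-optimization layer, as the abstract describes for the dropout hyperparameter, would replace the exhaustive loop with a guided search over both architectures and their hyperparameters.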