134 research outputs found
"Did you really mean what you said?" : Sarcasm Detection in Hindi-English Code-Mixed Data using Bilingual Word Embeddings
With the increased use of social media platforms by people across the world,
many new interesting NLP problems have come into existence. One such being the
detection of sarcasm in the social media texts. We present a corpus of tweets
for training custom word embeddings and a Hinglish dataset labelled for sarcasm
detection. We propose a deep learning based approach to address the issue of
sarcasm detection in Hindi-English code mixed tweets using bilingual word
embeddings derived from FastText and Word2Vec approaches. We experimented with
various deep learning models, including CNNs, LSTMs, Bi-directional LSTMs (with
and without attention). We were able to outperform all state-of-the-art
performances with our deep learning models, with attention based Bi-directional
LSTMs giving the best performance exhibiting an accuracy of 78.49%
A Survey of Sentiment Analysis and Sarcasm Detection: Challenges, Techniques, and Trends
In recent years, more people have been using the internet and social media to express their opinions on various subjects, such as institutions, services, or specific ideas. This increase highlights the importance of developing automated tools for accurate sentiment analysis. Moreover, addressing sarcasm in text is crucial, as it can significantly impact the efficacy of sentiment analysis models. This paper aims to provide a comprehensive overview of the conducted research on sentiment analysis and sarcasm detection, focusing on the time from 2018 to 2023. It explores the challenges faced and the methods used to address them. It conducts a comparison of these methods. It also aims to identify emerging trends that will likely influence the future of sentiment analysis and sarcasm detection, ensuring their continued effectiveness. This paper enhances the existing knowledge by offering a comprehensive analysis of 40 research works, evaluating performance, addressing multilingual challenges, and highlighting future trends in sarcasm detection and sentiment analysis. It is a valuable resource for researchers and experts interested in the field, facilitating further advancements in sentiment analysis techniques and applications. It categorizes sentiment analysis methods into ML, lexical, and hybrid approaches, highlighting deep learning, especially Recurrent Neural Networks (RNNs), for effective textual classification with labeled or unlabeled data
A Comprehensive Review of Sentiment Analysis on Indian Regional Languages: Techniques, Challenges, and Trends
Sentiment analysis (SA) is the process of understanding emotion within a text. It helps identify the opinion, attitude, and tone of a text categorizing it into positive, negative, or neutral. SA is frequently used today as more and more people get a chance to put out their thoughts due to the advent of social media. Sentiment analysis benefits industries around the globe, like finance, advertising, marketing, travel, hospitality, etc. Although the majority of work done in this field is on global languages like English, in recent years, the importance of SA in local languages has also been widely recognized. This has led to considerable research in the analysis of Indian regional languages. This paper comprehensively reviews SA in the following major Indian Regional languages: Marathi, Hindi, Tamil, Telugu, Malayalam, Bengali, Gujarati, and Urdu. Furthermore, this paper presents techniques, challenges, findings, recent research trends, and future scope for enhancing results accuracy
QutNocturnal@HASOC'19: CNN for Hate Speech and Offensive Content Identification in Hindi Language
We describe our top-team solution to Task 1 for Hindi in the HASOC contest
organised by FIRE 2019. The task is to identify hate speech and offensive
language in Hindi. More specifically, it is a binary classification problem
where a system is required to classify tweets into two classes: (a) \emph{Hate
and Offensive (HOF)} and (b) \emph{Not Hate or Offensive (NOT)}. In contrast to
the popular idea of pretraining word vectors (a.k.a. word embedding) with a
large corpus from a general domain such as Wikipedia, we used a relatively
small collection of relevant tweets (i.e. random and sarcasm tweets in Hindi
and Hinglish) for pretraining. We trained a Convolutional Neural Network (CNN)
on top of the pretrained word vectors. This approach allowed us to be ranked
first for this task out of all teams. Our approach could easily be adapted to
other applications where the goal is to predict class of a text when the
provided context is limited
- ā¦