Overview of Memotion 3: Sentiment and Emotion Analysis of Codemixed Hinglish Memes
Analyzing memes on the internet has emerged as a crucial endeavor due to the
impact this multi-modal form of content wields in shaping online discourse.
Memes have become a powerful tool for expressing emotions and sentiments,
possibly even spreading hate and misinformation, through humor and sarcasm. In
this paper, we present the overview of the Memotion 3 shared task, as part of
the DeFactify 2 workshop at AAAI-23. The task released an annotated dataset of
Hindi-English code-mixed memes based on their Sentiment (Task A), Emotion (Task
B), and Emotion intensity (Task C). Each of these is defined as an individual
task and the participants are ranked separately for each task. Over 50 teams
registered for the shared task and 5 made final submissions to the test set of
the Memotion 3 dataset. CLIP, BERT modifications, and ViT were the most
popular models among the participants, along with approaches such as the
Student-Teacher model, fusion, and ensembling. The best final F1 scores are
34.41 for Task A, 79.77 for Task B, and 59.82 for Task C.
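The rankings above rest on the F1 score, which balances precision and recall per class. As a minimal stdlib-only sketch (the macro-averaging shown here is one common variant; the exact averaging used by the task is defined in the task paper):

```python
def f1_per_class(y_true, y_pred, cls):
    # Precision/recall/F1 for a single class, treating it as the positive label.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred):
    # Unweighted mean of the per-class F1 scores.
    classes = sorted(set(y_true))
    return sum(f1_per_class(y_true, y_pred, c) for c in classes) / len(classes)
```

For example, `macro_f1([0, 0, 1, 1], [0, 1, 1, 1])` averages an F1 of 2/3 on class 0 and 4/5 on class 1, giving 11/15.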
Fully Connected Neural Network with Advance Preprocessor to Identify Aggression over Facebook and Twitter
Aggression identification and hate speech detection have become essential in
combating cyberharassment and cyberbullying, and automatic aggression
identification can enable the interception of such trolling. With this
motivation, the vista.ue team participated in the workshop's shared task on
'Aggression Identification'. A dataset of 15,000 aggression-annotated Facebook
posts and comments written in Hindi (in both Roman and Devanagari script) and
English was made available, and different classification models were designed.
This paper presents a model that outperforms Facebook FastText (Joulin et al.,
2016a) and deep learning models on this dataset. In particular, the system
developed for English, when used to classify Twitter text, outperforms all
systems submitted to the shared task.
Overview of Abusive and Threatening Language Detection in Urdu at FIRE 2021
With the growth of social media platform influence, the effect of their
misuse becomes more and more impactful. The importance of automatic detection
of threatening and abusive language cannot be overstated. However, most of
the existing studies and state-of-the-art methods focus on English as the
target language, with limited work on low- and medium-resource languages. In
this paper, we present two shared tasks of abusive and threatening language
detection for the Urdu language which has more than 170 million speakers
worldwide. Both are posed as binary classification tasks where participating
systems are required to classify tweets in Urdu into two classes, namely: (i)
Abusive and Non-Abusive for the first task, and (ii) Threatening and
Non-Threatening for the second. We present two manually annotated datasets
containing tweets labelled as (i) Abusive and Non-Abusive, and (ii) Threatening
and Non-Threatening. The abusive dataset contains 2400 annotated tweets in the
train part and 1100 annotated tweets in the test part. The threatening dataset
contains 6000 annotated tweets in the train part and 3950 annotated tweets in
the test part. We also provide logistic regression and BERT-based baseline
classifiers for both tasks. In this shared task, 21 teams from six countries
(India, Pakistan, China, Malaysia, United Arab Emirates, and Taiwan)
registered for participation; 10 teams submitted runs for Subtask A (Abusive
Language Detection), 9 teams submitted runs for Subtask B (Threatening
Language Detection), and 7 teams submitted technical reports. The best
performing system achieved an F1 score of 0.880 for Subtask A and 0.545 for
Subtask B. For both subtasks, an mBERT-based transformer model showed the
best performance.
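The logistic regression baseline mentioned above can be sketched in a minimal stdlib-only form on invented toy data; the tokenizer, bag-of-words features, and hyperparameters here are illustrative assumptions, not the task's actual baseline:

```python
import math
from collections import defaultdict

def tokenize(text):
    return text.lower().split()

def featurize(text, vocab):
    # Bag-of-words counts over the known vocabulary; unseen tokens are ignored.
    vec = defaultdict(float)
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec

def train_logreg(texts, labels, epochs=200, lr=0.5):
    # Build the vocabulary from the training texts.
    vocab = {}
    for t in texts:
        for tok in tokenize(t):
            vocab.setdefault(tok, len(vocab))
    w = [0.0] * len(vocab)
    b = 0.0
    # Plain stochastic gradient descent on the log-loss.
    for _ in range(epochs):
        for t, y in zip(texts, labels):
            x = featurize(t, vocab)
            z = b + sum(w[i] * v for i, v in x.items())
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log-loss w.r.t. z
            for i, v in x.items():
                w[i] -= lr * g * v
            b -= lr * g
    return vocab, w, b

def predict(text, vocab, w, b):
    x = featurize(text, vocab)
    z = b + sum(w[i] * v for i, v in x.items())
    return 1 if z > 0 else 0  # 1 = abusive, 0 = non-abusive

# Toy illustrative examples (invented, not from the shared-task data).
texts = ["you are awful trash", "have a nice day",
         "awful stupid trash", "nice kind friend"]
labels = [1, 0, 1, 0]
vocab, w, b = train_logreg(texts, labels)
```

A real baseline would use TF-IDF or character n-gram features and a regularised solver, but the decision rule is the same: a weighted sum over token features pushed through a sigmoid.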
A Unified System for Aggression Identification in English Code-Mixed and Uni-Lingual Texts
Wide usage of social media platforms has increased the risk of aggression,
which results in mental stress and affects people's lives negatively through
psychological agony, fighting behaviour, and disrespect towards others. The
majority of such conversations contain code-mixed language [28]. Additionally,
the way thoughts are expressed and the communication style change from one
social media platform to another (e.g., communication styles differ between
Twitter and Facebook). All of this increases the complexity of the problem. To
address these problems, we introduce a unified and robust multi-modal deep
learning architecture that works for both an English code-mixed dataset and a
uni-lingual English dataset. The devised system uses psycho-linguistic
features and very basic linguistic features. Our multi-modal deep learning
architecture comprises a Deep Pyramid CNN, a Pooled BiLSTM, and a Disconnected
RNN (with both GloVe and FastText embeddings). Finally, the system makes its
decision based on model averaging. We evaluated our system on the English
code-mixed TRAC 2018 dataset and a uni-lingual English dataset obtained from
Kaggle. Experimental results show that our proposed system outperforms all
previous approaches on both datasets.
Comment: 10 pages, 5 figures, 6 tables; accepted at CoDS-COMAD 202
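The model-averaging step described in this abstract is a soft-voting ensemble: each model emits class probabilities, the probabilities are averaged, and the argmax class is taken. A minimal sketch, assuming each model's output is a list of per-sample probability vectors:

```python
def average_ensemble(model_probs):
    # model_probs[m][i] is model m's class-probability vector for sample i.
    # Average the vectors across models, then take the argmax per sample.
    n_models = len(model_probs)
    n_samples = len(model_probs[0])
    preds = []
    for i in range(n_samples):
        n_classes = len(model_probs[0][i])
        avg = [sum(m[i][c] for m in model_probs) / n_models
               for c in range(n_classes)]
        preds.append(max(range(n_classes), key=avg.__getitem__))
    return preds

# Three hypothetical models voting on two samples with three classes.
m1 = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
m2 = [[0.4, 0.4, 0.2], [0.1, 0.6, 0.3]]
m3 = [[0.5, 0.2, 0.3], [0.3, 0.3, 0.4]]
```

Averaging probabilities (rather than taking a majority vote over hard labels) lets a confident model outweigh two uncertain ones, which is typically why ensembles of heterogeneous architectures like those above combine this way.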
Cross-lingual Offensive Language Detection: A Systematic Review of Datasets, Transfer Approaches and Challenges
The growing prevalence and rapid evolution of offensive language in social
media amplify the complexities of detection, particularly highlighting the
challenges in identifying such content across diverse languages. This survey
presents a systematic and comprehensive exploration of Cross-Lingual Transfer
Learning (CLTL) techniques in offensive language detection in social media. Our
study stands as the first holistic overview to focus exclusively on the
cross-lingual scenario in this domain. We analyse 67 relevant papers and
categorise these studies across various dimensions, including the
characteristics of multilingual datasets used, the cross-lingual resources
employed, and the specific CLTL strategies implemented. According to "what to
transfer", we also summarise three main CLTL transfer approaches: instance,
feature, and parameter transfer. Additionally, we shed light on the current
challenges and future research opportunities in this field. Furthermore, we
have made our survey resources available online, including two comprehensive
tables that provide accessible references to the multilingual datasets and
CLTL methods used in the reviewed literature.
Comment: 35 pages, 7 figures
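Of the three transfer approaches the survey names, instance transfer is the simplest to picture: labelled source-language examples (often machine-translated into the target language) are pooled with the scarce target-language data, usually down-weighted. A hypothetical sketch (the function name, triple format, and weighting scheme are illustrative assumptions, not the survey's notation):

```python
def instance_transfer(source_data, target_data, source_weight=0.5):
    """Pool source-language instances (e.g., machine-translated into the
    target language) with target-language instances, down-weighting the
    source examples. Returns (text, label, weight) triples suitable for
    any weight-aware trainer."""
    combined = [(text, label, source_weight) for text, label in source_data]
    combined += [(text, label, 1.0) for text, label in target_data]
    return combined

# Hypothetical pooling: two translated source tweets, one target tweet.
pool = instance_transfer(
    [("translated offensive tweet", 1), ("translated benign tweet", 0)],
    [("native target-language tweet", 0)],
)
```

Feature transfer would instead share a multilingual representation (e.g., aligned embeddings), and parameter transfer would fine-tune a multilingual model's weights on the source language before adapting to the target.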