Overview of Memotion 3: Sentiment and Emotion Analysis of Codemixed Hinglish Memes
Analyzing memes on the internet has emerged as a crucial endeavor due to the
impact this multi-modal form of content wields in shaping online discourse.
Memes have become a powerful tool for expressing emotions and sentiments,
possibly even spreading hate and misinformation, through humor and sarcasm. In
this paper, we present the overview of the Memotion 3 shared task, as part of
the DeFactify 2 workshop at AAAI-23. The task released an annotated dataset of
Hindi-English code-mixed memes based on their Sentiment (Task A), Emotion (Task
B), and Emotion intensity (Task C). Each of these is defined as an individual
task and the participants are ranked separately for each task. Over 50 teams
registered for the shared task and 5 made final submissions to the test set of
the Memotion 3 dataset. CLIP, BERT modifications, and ViT were the most
popular models among the participants, along with approaches such as the
Student-Teacher model, fusion, and ensembling. The best final F1 scores are
34.41 for Task A, 79.77 for Task B, and 59.82 for Task C.
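The rankings above rest on the F1 score, which balances precision and recall per class. As a minimal stdlib-only sketch (the macro-averaging shown here is one common variant; the exact averaging used by the task is defined in the task paper):

```python
def f1_per_class(y_true, y_pred, cls):
    # Precision/recall/F1 for a single class, treating it as the positive label.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred):
    # Unweighted mean of the per-class F1 scores.
    classes = sorted(set(y_true))
    return sum(f1_per_class(y_true, y_pred, c) for c in classes) / len(classes)
```

For example, `macro_f1([0, 0, 1, 1], [0, 1, 1, 1])` averages an F1 of 2/3 on class 0 and 4/5 on class 1, giving 11/15.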
Fully Connected Neural Network with Advance Preprocessor to Identify Aggression over Facebook and Twitter
Aggression identification and hate speech detection have become essential in
combating cyberharassment and cyberbullying, and automatic aggression
identification can enable the interception of such trolling. With this
motivation, the vista.ue team participated in the workshop's shared task on
'Aggression Identification'. A dataset of 15,000 aggression-annotated Facebook
posts and comments written in Hindi (in both Roman and Devanagari script) and
English was made available, and different classification models were designed.
This paper presents a model that outperforms Facebook FastText (Joulin et al.,
2016a) and deep learning models on this dataset. In particular, the system
developed for English, when used to classify Twitter text, outperforms all
systems submitted to the shared task.
Overview of Abusive and Threatening Language Detection in Urdu at FIRE 2021
With the growth of social media platform influence, the effect of their
misuse becomes more and more impactful. The importance of automatic detection
of threatening and abusive language cannot be overstated. However, most of
the existing studies and state-of-the-art methods focus on English as the
target language, with limited work on low- and medium-resource languages. In
this paper, we present two shared tasks of abusive and threatening language
detection for the Urdu language which has more than 170 million speakers
worldwide. Both are posed as binary classification tasks where participating
systems are required to classify tweets in Urdu into two classes, namely: (i)
Abusive and Non-Abusive for the first task, and (ii) Threatening and
Non-Threatening for the second. We present two manually annotated datasets
containing tweets labelled as (i) Abusive and Non-Abusive, and (ii) Threatening
and Non-Threatening. The abusive dataset contains 2400 annotated tweets in the
train part and 1100 annotated tweets in the test part. The threatening dataset
contains 6000 annotated tweets in the train part and 3950 annotated tweets in
the test part. We also provide logistic regression and BERT-based baseline
classifiers for both tasks. In this shared task, 21 teams from six countries
(India, Pakistan, China, Malaysia, United Arab Emirates, and Taiwan)
registered for participation; 10 teams submitted runs for Subtask A (Abusive
Language Detection), 9 teams submitted runs for Subtask B (Threatening
Language Detection), and 7 teams submitted technical reports. The best
performing system achieved an F1 score of 0.880 for Subtask A and 0.545 for
Subtask B. For both subtasks, an mBERT-based transformer model showed the
best performance.
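The logistic regression baseline mentioned above can be sketched in a minimal stdlib-only form on invented toy data; the tokenizer, bag-of-words features, and hyperparameters here are illustrative assumptions, not the task's actual baseline:

```python
import math
from collections import defaultdict

def tokenize(text):
    return text.lower().split()

def featurize(text, vocab):
    # Bag-of-words counts over the known vocabulary; unseen tokens are ignored.
    vec = defaultdict(float)
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec

def train_logreg(texts, labels, epochs=200, lr=0.5):
    # Build the vocabulary from the training texts.
    vocab = {}
    for t in texts:
        for tok in tokenize(t):
            vocab.setdefault(tok, len(vocab))
    w = [0.0] * len(vocab)
    b = 0.0
    # Plain stochastic gradient descent on the log-loss.
    for _ in range(epochs):
        for t, y in zip(texts, labels):
            x = featurize(t, vocab)
            z = b + sum(w[i] * v for i, v in x.items())
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log-loss w.r.t. z
            for i, v in x.items():
                w[i] -= lr * g * v
            b -= lr * g
    return vocab, w, b

def predict(text, vocab, w, b):
    x = featurize(text, vocab)
    z = b + sum(w[i] * v for i, v in x.items())
    return 1 if z > 0 else 0  # 1 = abusive, 0 = non-abusive

# Toy illustrative examples (invented, not from the shared-task data).
texts = ["you are awful trash", "have a nice day",
         "awful stupid trash", "nice kind friend"]
labels = [1, 0, 1, 0]
vocab, w, b = train_logreg(texts, labels)
```

A real baseline would use TF-IDF or character n-gram features and a regularised solver, but the decision rule is the same: a weighted sum over token features pushed through a sigmoid.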
A Unified System for Aggression Identification in English Code-Mixed and Uni-Lingual Texts
Wide usage of social media platforms has increased the risk of aggression,
which results in mental stress and affects people's lives negatively through
psychological agony, fighting behaviour, and disrespect towards others. The
majority of such conversations contain code-mixed language [28]. Additionally,
the way thoughts are expressed and the communication style change from one
social media platform to another (e.g., communication styles differ between
Twitter and Facebook). All of this increases the complexity of the problem. To
address these problems, we introduce a unified and robust multi-modal deep
learning architecture that works for both an English code-mixed dataset and a
uni-lingual English dataset. The devised system uses psycho-linguistic
features and very basic linguistic features. Our multi-modal deep learning
architecture comprises a Deep Pyramid CNN, a Pooled BiLSTM, and a Disconnected
RNN (with both GloVe and FastText embeddings). Finally, the system makes its
decision based on model averaging. We evaluated our system on the English
code-mixed TRAC 2018 dataset and a uni-lingual English dataset obtained from
Kaggle. Experimental results show that our proposed system outperforms all
previous approaches on both datasets.
Comment: 10 pages, 5 figures, 6 tables; accepted at CoDS-COMAD 202
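The model-averaging step described in this abstract is a soft-voting ensemble: each model emits class probabilities, the probabilities are averaged, and the argmax class is taken. A minimal sketch, assuming each model's output is a list of per-sample probability vectors:

```python
def average_ensemble(model_probs):
    # model_probs[m][i] is model m's class-probability vector for sample i.
    # Average the vectors across models, then take the argmax per sample.
    n_models = len(model_probs)
    n_samples = len(model_probs[0])
    preds = []
    for i in range(n_samples):
        n_classes = len(model_probs[0][i])
        avg = [sum(m[i][c] for m in model_probs) / n_models
               for c in range(n_classes)]
        preds.append(max(range(n_classes), key=avg.__getitem__))
    return preds

# Three hypothetical models voting on two samples with three classes.
m1 = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
m2 = [[0.4, 0.4, 0.2], [0.1, 0.6, 0.3]]
m3 = [[0.5, 0.2, 0.3], [0.3, 0.3, 0.4]]
```

Averaging probabilities (rather than taking a majority vote over hard labels) lets a confident model outweigh two uncertain ones, which is typically why ensembles of heterogeneous architectures like those above combine this way.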
Cross-lingual Offensive Language Detection: A Systematic Review of Datasets, Transfer Approaches and Challenges
The growing prevalence and rapid evolution of offensive language in social
media amplify the complexities of detection, particularly highlighting the
challenges in identifying such content across diverse languages. This survey
presents a systematic and comprehensive exploration of Cross-Lingual Transfer
Learning (CLTL) techniques in offensive language detection in social media. Our
study stands as the first holistic overview to focus exclusively on the
cross-lingual scenario in this domain. We analyse 67 relevant papers and
categorise these studies across various dimensions, including the
characteristics of multilingual datasets used, the cross-lingual resources
employed, and the specific CLTL strategies implemented. According to "what to
transfer", we also summarise three main CLTL transfer approaches: instance,
feature, and parameter transfer. Additionally, we shed light on the current
challenges and future research opportunities in this field. Furthermore, we
have made our survey resources available online, including two comprehensive
tables that provide accessible references to the multilingual datasets and
CLTL methods used in the reviewed literature.
Comment: 35 pages, 7 figures
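Of the three transfer approaches the survey names, instance transfer is the simplest to picture: labelled source-language examples (often machine-translated into the target language) are pooled with the scarce target-language data, usually down-weighted. A hypothetical sketch (the function name, triple format, and weighting scheme are illustrative assumptions, not the survey's notation):

```python
def instance_transfer(source_data, target_data, source_weight=0.5):
    """Pool source-language instances (e.g., machine-translated into the
    target language) with target-language instances, down-weighting the
    source examples. Returns (text, label, weight) triples suitable for
    any weight-aware trainer."""
    combined = [(text, label, source_weight) for text, label in source_data]
    combined += [(text, label, 1.0) for text, label in target_data]
    return combined

# Hypothetical pooling: two translated source tweets, one target tweet.
pool = instance_transfer(
    [("translated offensive tweet", 1), ("translated benign tweet", 0)],
    [("native target-language tweet", 0)],
)
```

Feature transfer would instead share a multilingual representation (e.g., aligned embeddings), and parameter transfer would fine-tune a multilingual model's weights on the source language before adapting to the target.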