42 research outputs found

Identification of Multilingual Offense and Troll from Social Media Memes Using Weighted Ensemble of Multimodal Features

Author: Akber Dewan M. Ali
Hoque Mohammed Moshiul
Hossain Eftekhar
Hossain Md. Azad
Sharif Omar
Siddique Nazmul
Publication venue: 'Elsevier BV'
Publication date: 23/06/2022

An Evaluation of Multilingual Offensive Language Identification Methods for the Languages of India

Author: Ranasinghe Tharindu
Zampieri Marcos
Publication venue: 'MDPI AG'
Publication date: 01/07/2021

The pervasiveness of offensive content in social media has become a major concern for online platforms. With the aim of improving online safety, a large number of studies applying computational models to identify such content have been published in the last few years, with promising results. The majority of these studies, however, deal with high-resource languages such as English due to the availability of datasets in these languages. Recent work has addressed offensive language identification from a low-resource perspective, exploring data augmentation strategies and trying to take advantage of existing multilingual pretrained models to cope with data scarcity in low-resource scenarios. In this work, we revisit the problem of low-resource offensive language identification by evaluating the performance of multilingual transformers in offensive language identification for languages spoken in India. We investigate languages from different families such as Indo-Aryan (e.g., Bengali, Hindi, and Urdu) and Dravidian (e.g., Tamil, Malayalam, and Kannada), creating important new technology for these languages. The results show that multilingual offensive language identification models perform better than monolingual models and that cross-lingual transformers show strong zero-shot and few-shot performance across languages.

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

Aston Publications Explorer

Identification of Online Harassment Using Ensemble Fine-Tuned Pre-Trained Bert

Author: Dadvandipour Samad
Ganie Aadil Gani
Publication venue: 'Akademiai Kiado Zrt.'
Publication date: 01/01/2022

Repository of the Academy's Library

LCT-1 at SemEval-2023 Task 10: Pre-training and Multi-task Learning for Sexism Detection and Classification

Author: Bayram Duygu
Chernyshev Konstantin
Edman Lukas
Garanina Ekaterina
Zheng Qiankun
Publication venue
Publication date: 08/06/2023

Misogyny and sexism are growing problems in social media. Advances have been made in online sexism detection, but the systems are often uninterpretable. SemEval-2023 Task 10 on Explainable Detection of Online Sexism aims at increasing the explainability of sexism detection, and our team participated in all the proposed subtasks. Our system is based on further domain-adaptive pre-training (Gururangan et al., 2020). Building on Transformer-based models with domain adaptation, we compare fine-tuning with multi-task learning and show that each subtask requires a different system configuration. In our experiments, multi-task learning performs on par with standard fine-tuning for sexism detection and noticeably better for coarse-grained sexism classification, while fine-tuning is preferable for fine-grained classification.

arXiv.org e-Print Archive

An Empirical Study of Offensive Language in Online Interactions

Author: Sarkar Diptanu
Publication venue: RIT Scholar Works
Publication date: 01/05/2021

In the past decade, usage of social media platforms has increased significantly. People use these platforms to connect with friends and family and to share information, news and opinions. Platforms such as Facebook and Twitter are often used to propagate offensive and hateful content online. The open nature and anonymity of the internet fuel aggressive and inflamed conversations. Companies and federal institutions are striving to make social media cleaner, more welcoming and unbiased. In this study, we first explore the underlying topics in popular offensive language datasets using statistical and neural topic modeling. The current state-of-the-art models for aggression detection only present a toxicity score based on the entire post, so content moderators often have to deal with lengthy texts without any word-level indicators. We propose a neural transformer approach for detecting the tokens that make a particular post aggressive. The pre-trained BERT model has achieved state-of-the-art results in various natural language processing tasks. However, the model is trained on general-purpose corpora and lacks aggressive social media linguistic features. We propose fBERT, a BERT model retrained on over 1.4 million offensive tweets from the SOLID dataset. We demonstrate the effectiveness and portability of fBERT over BERT in various shared offensive language detection tasks. We further propose a new multi-task aggression detection (MAD) framework for post- and token-level aggression detection using neural transformers. The experiments confirm the effectiveness of the multi-task learning model over individual models, particularly when the amount of training data is limited.

RIT Scholar Works
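A multi-task model for post- and token-level aggression detection can be trained on a single combined objective. The abstract does not give the MAD formulation, so the Python sketch below assumes a simple weighted sum of a post-level binary cross-entropy and the mean token-level binary cross-entropy; the weight `alpha` and all function names are hypothetical illustrations, not the author's method.

```python
# Hypothetical sketch of a joint post-level + token-level objective.
# ASSUMPTION: a weighted sum of binary cross-entropies; `alpha` and all
# names are illustrative and not taken from the MAD framework itself.
import math
from typing import Sequence


def binary_cross_entropy(p: float, y: int, eps: float = 1e-12) -> float:
    """Negative log-likelihood of label y in {0, 1} under probability p."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def multitask_loss(post_prob: float, post_label: int,
                   token_probs: Sequence[float], token_labels: Sequence[int],
                   alpha: float = 0.5) -> float:
    """Weighted sum of the post-level loss and the mean token-level loss."""
    if len(token_probs) != len(token_labels):
        raise ValueError("token_probs and token_labels must have equal length")
    post_loss = binary_cross_entropy(post_prob, post_label)
    if token_probs:
        token_loss = sum(binary_cross_entropy(p, y)
                         for p, y in zip(token_probs, token_labels)) / len(token_probs)
    else:
        token_loss = 0.0  # no tokens: only the post-level term contributes
    return alpha * post_loss + (1.0 - alpha) * token_loss


# Usage: a post judged aggressive (p=0.8) with three tokens, one aggressive.
print(multitask_loss(0.8, 1, [0.1, 0.9, 0.2], [0, 1, 0], alpha=0.5))
```

With `alpha = 1.0` the token-level signal is ignored and the objective reduces to ordinary post-level classification; intermediate values let the token labels act as an auxiliary training signal.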

Towards multidomain and multilingual abusive language detection: a survey

Author: Basile V.
Pamungkas E. W.
Patti V.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2021

Institutional Research Information System University of Turin

SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020)

Author: Atanasova Pepa
Coltekin Cagri
Derczynski Leon
Karadzhov Georgi
Mubarak Hamdy
Nakov Preslav
Pitenis Zeses
Rosenthal Sara
Zampieri Marcos
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 17/07/2020

We present the results and main findings of SemEval-2020 Task 12 on Multilingual Offensive Language Identification in Social Media (OffensEval 2020). The task involves three subtasks corresponding to the hierarchical taxonomy of the OLID schema (Zampieri et al., 2019a) from OffensEval 2019. The task featured five languages for Subtask A: English, Arabic, Danish, Greek, and Turkish. In addition, English also featured Subtasks B and C. OffensEval 2020 was one of the most popular tasks at SemEval-2020, attracting a large number of participants across all subtasks and across all languages. A total of 528 teams signed up to participate in the task, 145 teams submitted systems during the evaluation period, and 70 submitted system description papers. Comment: Proceedings of the International Workshop on Semantic Evaluation (SemEval-2020).

arXiv.org e-Print Archive

The IT University of Copenhagen's Repository
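The OLID schema referenced in this entry is hierarchical: Level A separates offensive (OFF) from non-offensive (NOT) posts, Level B marks offensive posts as targeted (TIN) or untargeted (UNT), and Level C gives the target type of a targeted offense (IND individual, GRP group, OTH other). The Python sketch below checks that a label triple respects this hierarchy. The function name and argument layout are hypothetical, and lower levels are allowed to be missing because only English carried Subtask B and C labels in OffensEval 2020.

```python
# Minimal sketch: validate a label triple against the three-level OLID hierarchy.
# The function name and argument layout are hypothetical illustrations.
from typing import Optional

LEVEL_A = {"OFF", "NOT"}          # offensive vs. not offensive
LEVEL_B = {"TIN", "UNT"}          # targeted vs. untargeted insult
LEVEL_C = {"IND", "GRP", "OTH"}   # individual, group, other target


def is_valid_olid_label(a: str, b: Optional[str] = None,
                        c: Optional[str] = None) -> bool:
    """Return True if (a, b, c) is consistent with the OLID hierarchy."""
    if a not in LEVEL_A:
        return False
    if a == "NOT":
        # Non-offensive posts receive no lower-level labels.
        return b is None and c is None
    if b is None:
        # Subtask A-only data: an offensive post without a Level B label
        # must not carry a Level C label either.
        return c is None
    if b not in LEVEL_B:
        return False
    if b == "UNT":
        # Untargeted offenses have no target type.
        return c is None
    # Targeted offenses may carry a Level C target type.
    return c is None or c in LEVEL_C


print(is_valid_olid_label("OFF", "TIN", "GRP"))  # a targeted offense against a group
```

Encoding the hierarchy as explicit checks makes it easy to reject impossible combinations such as an untargeted insult with a target type before training a model on Subtasks B and C.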

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY