FBK-DH at SemEval-2020 Task 12: Using Multi-channel BERT for Multilingual Offensive Language Detection
In this paper we present our submission to sub-task A at SemEval 2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval2). For Danish, Turkish, Arabic and Greek, we develop an architecture based on transfer learning and relying on a two-channel BERT model, in which the English BERT and the multilingual one are combined after creating a machine-translated parallel corpus for each language in the task. For English, instead, we adopt a more standard, single-channel approach. We find that, in a multilingual scenario with some languages having small training data, using parallel BERT models with machine-translated data can give systems more stability, especially when dealing with noisy data. The fact that machine translation on social media data may not be perfect does not hurt the overall classification performance.
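As a rough illustration of the two-channel idea, the sketch below encodes each tweet twice (the original text and its machine translation), concatenates the two pooled vectors, and feeds them to a linear scorer. Fusion by concatenation and the toy encoders are assumptions for illustration only; the paper's actual channels are pre-trained BERT models.

```python
# Two-channel fusion sketch: channel 1 encodes the original tweet (stand-in
# for multilingual BERT), channel 2 encodes its machine translation (stand-in
# for English BERT). The pooled vectors are concatenated and scored linearly.
# The deterministic toy encoder below is NOT a real model.
import random

DIM = 4  # toy embedding size per channel


def toy_encoder(text, seed):
    """Stand-in for a BERT pooled [CLS] vector: deterministic pseudo-embedding."""
    rng = random.Random(seed + sum(ord(c) for c in text))
    return [rng.uniform(-1, 1) for _ in range(DIM)]


def two_channel_features(original, translation):
    # Channel 1: "multilingual" encoder on the original text.
    # Channel 2: "English" encoder on the machine-translated text.
    return toy_encoder(original, seed=1) + toy_encoder(translation, seed=2)


def linear_score(features, weights, bias=0.0):
    return bias + sum(f * w for f, w in zip(features, weights))


feats = two_channel_features("dette er stødende", "this is offensive")
print(len(feats))  # → 8 (two DIM-sized channels concatenated)
```

In a real system the linear scorer would be a trained classification head on top of the two frozen or fine-tuned encoders.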
Multilayer Perceptron and TF-IDF in the Classification of Hate Speech on Twitter in Indonesian
Twitter is currently one of the most popular social media platforms, with over 300 million accounts, and a rich source for studying people's opinions and for sentiment analysis. However, it also brings new problems, such as the practice of hate speech. This research classifies hate speech on social media. We evaluate on the dataset from the previous research of Ibrohim & Budi (2019), using a Multilayer Perceptron classifier combined with feature extraction able to detect negations and with Term Frequency–Inverse Document Frequency (TF-IDF) weighting. Results show an F1 score of up to 74.51%. Considering this F1 score, combining the TF-IDF and Multilayer Perceptron methods is reasonably effective.
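The TF-IDF weighting used as input to the MLP can be computed from scratch. The sketch below uses the common tf × log(N/df) scheme; the exact TF-IDF variant and the MLP architecture are not specified in the abstract, and the toy tweets are invented.

```python
# Minimal TF-IDF weighting from scratch. In practice these vectors would be
# fed to an MLP classifier (e.g. sklearn.neural_network.MLPClassifier);
# the classifier itself is omitted here.
import math
from collections import Counter

docs = [
    "kamu bodoh sekali",        # toy tweets, invented for illustration
    "hari ini cerah sekali",
    "dasar bodoh",
]

N = len(docs)
tokenized = [d.split() for d in docs]
# Document frequency: in how many documents does each term occur?
df = Counter(term for toks in tokenized for term in set(toks))


def tfidf(tokens):
    tf = Counter(tokens)
    return {t: tf[t] * math.log(N / df[t]) for t in tf}


vec = tfidf(tokenized[0])
# "kamu" occurs in 1 of 3 docs -> idf = log(3); "bodoh" in 2 of 3 -> log(3/2)
print(round(vec["kamu"], 3), round(vec["bodoh"], 3))  # → 1.099 0.405
```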
Merging datasets for emotion analysis
Context. Applying sentiment analysis is generally a laborious task. If we add the task of obtaining a good-quality dataset with a balanced distribution and enough samples, the job becomes even more complicated.
Objective. We want to find out whether merging compatible datasets improves emotion analysis based on machine learning (ML) techniques, compared to the original, individual datasets.
Method. We obtained two datasets of Covid-19-related tweets written in Spanish, and then built from them two new datasets that combine the original ones with different balancing strategies. We analyzed the results in terms of precision, recall, F1-score and accuracy.
Results. The results show that merging two datasets can improve the performance of ML models, particularly the F1-score, when the merging process follows a strategy that optimizes the balance of the resulting dataset.
Conclusions. Merging two datasets can improve the performance of ML models for emotion analysis while saving resources for labeling training data. This might be especially useful for several software engineering activities that leverage ML-based emotion analysis techniques. This paper has been funded by the Spanish Ministerio de Ciencia e Innovación under project/funding scheme PID2020-117191RB.
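One plausible reading of a balance-optimizing merge is to pool both datasets and downsample every class to the size of the rarest one. This is a sketch under that assumption; the paper's exact consolidation procedure may differ, and the datasets below are invented.

```python
# Merge two labeled datasets, then downsample so every label contributes
# equally (to the size of the rarest class in the pooled data).
import random
from collections import defaultdict

random.seed(42)

# Toy stand-ins for two Spanish Covid-19 tweet datasets (invented).
ds_a = [("tweet a%d" % i, "joy") for i in range(30)] + \
       [("tweet a%d" % i, "fear") for i in range(30, 40)]
ds_b = [("tweet b%d" % i, "fear") for i in range(25)] + \
       [("tweet b%d" % i, "joy") for i in range(25, 30)]


def merge_balanced(*datasets):
    by_label = defaultdict(list)
    for ds in datasets:
        for text, label in ds:
            by_label[label].append((text, label))
    k = min(len(items) for items in by_label.values())  # rarest class size
    merged = []
    for label, items in sorted(by_label.items()):
        merged.extend(random.sample(items, k))  # downsample each class to k
    random.shuffle(merged)
    return merged


merged = merge_balanced(ds_a, ds_b)
labels = [label for _, label in merged]
print(len(merged), labels.count("joy"), labels.count("fear"))  # → 70 35 35
```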
Strategies to exploit XAI to improve classification systems
Explainable Artificial Intelligence (XAI) aims to provide insights into the decision-making process of AI models, allowing users to understand their results beyond their decisions. A significant goal of XAI is to improve the performance of AI models by providing explanations for their decision-making processes. However, most XAI literature focuses on how to explain an AI system, while less attention has been given to how XAI methods can be exploited to improve an AI system. In this work, a set of well-known XAI methods typically used with Machine Learning (ML) classification tasks are investigated to verify whether they can be exploited not just to provide explanations but also to improve the performance of the model itself. To this aim, two strategies to use the explanation to improve a classification system are reported and empirically evaluated on three datasets: Fashion-MNIST, CIFAR10, and STL10. Results suggest that explanations built by Integrated Gradients highlight input features that can be effectively used to improve classification performance.
Comment: This work has been accepted for presentation at The 1st World Conference on eXplainable Artificial Intelligence (xAI 2023), July 26-28, 2023, Lisbon, Portugal.
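Integrated Gradients attributes a prediction to input features by averaging gradients along the straight path from a baseline x′ to the input x: IG_i = (x_i − x′_i) · mean over α of ∂F/∂x_i at x′ + α(x − x′), approximated with a Riemann sum. The toy linear model below is an invented stand-in for the image classifiers in the paper; for a linear model the completeness axiom (attributions summing to F(x) − F(x′)) holds exactly.

```python
# Integrated Gradients for a toy differentiable model (invented for
# illustration; real use would target a neural classifier's gradients).
W = (2.0, -1.0, 0.5)  # toy linear model weights


def model(x):
    return sum(wi * xi for wi, xi in zip(W, x))


def grad(x):
    return list(W)  # the gradient of a linear model is constant


def integrated_gradients(x, baseline, steps=50):
    avg_grad = [0.0] * len(x)
    for s in range(1, steps + 1):
        alpha = s / steps
        # Interpolated point on the path from baseline to x.
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad(point)
        for i in range(len(x)):
            avg_grad[i] += g[i] / steps
    return [(xi - b) * a for xi, b, a in zip(x, baseline, avg_grad)]


x, baseline = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
ig = integrated_gradients(x, baseline)
# Completeness: sum of attributions equals F(x) - F(baseline).
print(sum(ig), model(x) - model(baseline))
```

Feature-improvement strategies like those in the paper would then keep or reweight the input features with the largest attributions.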
DH-FBK @ HaSpeeDe2: Italian Hate Speech Detection via Self-Training and Oversampling
We describe in this paper the system submitted by the DH-FBK team to the HaSpeeDe evaluation task, which deals with Italian hate speech detection (Task A). While we adopt a standard approach for fine-tuning AlBERTo, the Italian BERT model trained on tweets, we propose to improve the final classification performance by two additional steps, i.e. self-training and oversampling. Indeed, we extend the initial training data with additional silver data, carefully sampled from domain-specific tweets and obtained after first training our system only with the task training data. Then, we re-train the classifier by merging silver and task training data but oversampling the latter, so that the obtained model is more robust to possible inconsistencies in the silver data. With this configuration, we obtain a macro-averaged F1 of 0.753 on tweets, and 0.702 on news headlines.
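The data-preparation side of this scheme can be sketched as follows: a first model labels unlabeled domain tweets, high-confidence predictions become silver data, and the gold (task) set is oversampled by duplication when the two are merged. The confidence threshold, oversampling factor, and toy predictor below are illustrative assumptions, not the paper's exact values.

```python
# Self-training + oversampling data sketch. `predict` stands in for a model
# already fine-tuned on the gold task data; its outputs on unlabeled tweets
# become silver labels when confident enough.
def self_train_merge(gold, unlabeled, predict, threshold=0.9, gold_factor=3):
    silver = []
    for text in unlabeled:
        label, confidence = predict(text)
        if confidence >= threshold:   # keep only confident silver labels
            silver.append((text, label))
    # Oversample gold by duplication so it outweighs possibly noisy silver.
    return gold * gold_factor + silver


def toy_predict(text):
    # Invented stand-in for a fine-tuned classifier's (label, confidence).
    label = 1 if "odio" in text else 0
    confidence = 0.95 if ("odio" in text or "ciao" in text) else 0.5
    return label, confidence


gold = [("ti odio", 1), ("ciao amico", 0)]
unlabeled = ["odio tutto", "ciao a tutti", "boh"]
merged = self_train_merge(gold, unlabeled, toy_predict)
print(len(merged))  # → 8: 2 gold examples x 3 copies + 2 confident silver
```

The re-trained classifier is then fit on `merged`, with the duplicated gold examples dampening the effect of mislabeled silver data.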
Detecting Abusive Language on Online Platforms: A Critical Analysis
Abusive language on online platforms is a major societal problem, often leading to serious harms such as the marginalisation of underrepresented minorities. There are many different forms of abusive language, such as hate speech, profanity, and cyber-bullying, and online platforms seek to moderate it in order to limit societal harm, to comply with legislation, and to create a more inclusive environment for their users. Within the field of Natural Language Processing, researchers have developed different methods for automatically detecting abusive language, often focusing on specific subproblems or on narrow communities, as what is considered abusive language very much differs by context. We argue that there is currently a dichotomy between what types of abusive language online platforms seek to curb and what research efforts there are to automatically detect abusive language. We thus survey existing methods, as well as content moderation policies by online platforms, in this light, and we suggest directions for future work.