5,341 research outputs found

    Cross-Platform Evaluation for Italian Hate Speech Detection

    Despite the number of approaches recently proposed in NLP for detecting abusive language on social networks, the issue of developing hate speech detection systems that are robust across different platforms is still an unsolved problem. In this paper we perform a comparative evaluation on datasets for hate speech detection in Italian, extracted from four different social media platforms, i.e. Facebook, Twitter, Instagram and WhatsApp. We show that combining such platform-dependent datasets to take advantage of training data developed for other platforms is beneficial, although their impact varies depending on the social network under consideration.
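    The kind of cross-platform experiment described above can be sketched in a few lines: train a text classifier either on the target platform alone or on the target data combined with the other platforms' data, and compare scores on a held-out split of the target platform. The sketch below uses a generic TF-IDF + linear SVM pipeline and hypothetical CSV file and column names; it illustrates the evaluation setup, not the authors' actual system.

```python
# Sketch of a cross-platform evaluation loop. File names and the
# 'text'/'label' columns are hypothetical placeholders.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

platforms = ["facebook", "twitter", "instagram", "whatsapp"]
data = {p: pd.read_csv(f"{p}_hate_it.csv") for p in platforms}

for target in platforms:
    # Hold out part of the target platform as the test set.
    in_train, test = train_test_split(
        data[target], test_size=0.2, random_state=0, stratify=data[target]["label"]
    )
    out_train = pd.concat([data[p] for p in platforms if p != target])
    for setting, train in [
        ("in-platform only", in_train),
        ("+ other platforms", pd.concat([in_train, out_train])),
    ]:
        model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
        model.fit(train["text"], train["label"])
        score = f1_score(test["label"], model.predict(test["text"]), average="macro")
        print(f"test on {target:9s} | train: {setting:18s} | macro-F1 = {score:.3f}")
```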

    Time of your hate: The challenge of time in hate speech detection on social media

    The availability of large annotated corpora from social media and the development of powerful classification approaches have contributed in an unprecedented way to tackling the challenge of monitoring users' opinions and sentiments in online social platforms across time. Such linguistic data are strongly affected by events and topic discourse, and this aspect is crucial when detecting phenomena such as hate speech, especially from a diachronic perspective. We address this challenge by focusing on a real case study: the "Contro l'odio" platform for monitoring hate speech against immigrants in the Italian Twittersphere. We explored the temporal robustness of a BERT model for Italian (AlBERTo), the current benchmark in non-diachronic detection settings. We tested different training strategies to evaluate how classification performance is affected by adding more data temporally distant from the test set and hence potentially different in terms of topic and language use. Our analysis points out the limits that a supervised classification model encounters on data that are heavily influenced by events. Our results show that AlBERTo is highly sensitive to the temporal distance of the fine-tuning set. However, with an adequate time window, performance increases while requiring less annotated data than a traditional classifier.
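    The training-strategy comparison hinges on how far back in time the fine-tuning data reaches relative to the test period. A minimal sketch of that windowing step is shown below, assuming a hypothetical CSV of annotated tweets with a `created_at` column; the fine-tuning of AlBERTo itself is only indicated in a comment, since the paper's exact pipeline is not reproduced here.

```python
# Sketch of the time-window idea: given a test period, build training sets
# that only include tweets within a maximum temporal distance from it.
# File name, columns and dates are hypothetical.
import pandas as pd

tweets = pd.read_csv("controlodio_tweets.csv", parse_dates=["created_at"])
test_start = pd.Timestamp("2019-06-01")

test_set = tweets[tweets["created_at"] >= test_start]
history = tweets[tweets["created_at"] < test_start]

for window_days in (30, 90, 180, 365):
    cutoff = test_start - pd.Timedelta(days=window_days)
    train_set = history[history["created_at"] >= cutoff]
    print(
        f"window = {window_days:3d} days -> {len(train_set):6d} training tweets "
        "(fine-tune the Italian BERT model on this set, evaluate on test_set)"
    )
```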

    Online Hate Speech against Women: Automatic Identification of Misogyny and Sexism on Twitter

    [EN] Patriarchal behavior, like other social habits, has been transferred online, appearing as misogynistic and sexist comments, posts or tweets. This online hate speech against women has serious consequences in real life, and recently various legal cases have arisen against social platforms that scarcely block the spread of hate messages towards individuals. In this difficult context, this paper presents an approach able to detect the two sides of patriarchal behavior, misogyny and sexism, by analyzing three collections of English tweets, and obtains promising results.
    The work of Simona Frenda and Paolo Rosso was partially funded by the Spanish MINECO under the research project SomEMBED (TIN2015-71147-C2-1-P). We also thank the support of CONACYT-Mexico (project FC-2410).
    Frenda, S.; Ghanem, B.; Montes-y-Gómez, M.; Rosso, P. (2019). Online Hate Speech against Women: Automatic Identification of Misogyny and Sexism on Twitter. Journal of Intelligent & Fuzzy Systems, 36(5), 4743-4752. https://doi.org/10.3233/JIFS-179023
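    As a rough illustration of the task setup (not the system described in the paper), misogyny and sexism detection can be framed as two independent binary classifiers over word and character n-grams. The corpus file, column names and 0/1 label encoding below are hypothetical.

```python
# Baseline sketch: one binary classifier per phenomenon (misogyny, sexism)
# over combined word and character n-gram features. Data file and labels
# are hypothetical placeholders.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline, make_union

tweets = pd.read_csv("en_tweets.csv")  # columns: text, misogyny, sexism (0/1)

for task in ("misogyny", "sexism"):
    features = make_union(
        TfidfVectorizer(analyzer="word", ngram_range=(1, 2)),
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    )
    X_tr, X_te, y_tr, y_te = train_test_split(
        tweets["text"], tweets[task], test_size=0.2, random_state=0, stratify=tweets[task]
    )
    clf = make_pipeline(features, LogisticRegression(max_iter=1000))
    clf.fit(X_tr, y_tr)
    print(f"--- {task} ---")
    print(classification_report(y_te, clf.predict(X_te)))
```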

    Detección de discurso de odio online utilizando Machine Learning

    Final-year undergraduate thesis (Trabajo de Fin de Grado) in Computer Engineering, Facultad de Informática UCM, Departamento de Ingeniería del Software e Inteligencia Artificial, academic year 2021/2022. Public project repository: https://github.com/NILGroup/TFG-2122HateSpeechDetection
    Hate speech directed towards marginalized people is a very common problem online, especially on social media such as Twitter or Reddit. Automatically detecting hate speech in such spaces can help mend the Internet and transform it into a safer environment for everybody. Hate speech detection fits into text classification, a family of tasks in which text is organized into categories. This project proposes using Machine Learning algorithms to detect hate speech in online text in four languages: English, Spanish, Italian and Portuguese. The data used to train the models was obtained from publicly available online datasets. Three different algorithms with varying parameters have been used in order to compare their performance. The experiments show that the best results reach an 82.51% accuracy and around an 83% F1-score, for Italian text. Results for each language vary depending on several factors.
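    A per-language comparison of this kind might be organized as below: for each language, run several classifiers under cross-validation and report accuracy and macro F1. The three algorithms shown (logistic regression, linear SVM, multinomial naive Bayes) and the data files are illustrative assumptions, not necessarily the ones used in the project; the project's own code is in the linked repository.

```python
# Sketch of a per-language comparison of three classifiers, reporting
# cross-validated accuracy and macro-F1. Algorithm choices and data
# files are illustrative, not the project's exact setup.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

languages = ["english", "spanish", "italian", "portuguese"]
algorithms = {
    "logreg": LogisticRegression(max_iter=1000),
    "linear_svm": LinearSVC(),
    "naive_bayes": MultinomialNB(),
}

for lang in languages:
    df = pd.read_csv(f"hate_{lang}.csv")  # hypothetical columns: text, label
    for name, clf in algorithms.items():
        pipe = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), clf)
        scores = cross_validate(
            pipe, df["text"], df["label"], cv=5, scoring=["accuracy", "f1_macro"]
        )
        print(
            f"{lang:10s} {name:11s} "
            f"acc={scores['test_accuracy'].mean():.3f} "
            f"f1={scores['test_f1_macro'].mean():.3f}"
        )
```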

    Automatic Detection of Online Jihadist Hate Speech

    We have developed a system that automatically detects online jihadist hate speech with over 80% accuracy, using techniques from Natural Language Processing and Machine Learning. The system is trained on a corpus of 45,000 subversive Twitter messages collected from October 2014 to December 2016. We present a qualitative and quantitative analysis of the jihadist rhetoric in the corpus, examine the network of Twitter users, outline the technical procedure used to train the system, and discuss examples of use.
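    One of the listed analysis steps, examining the network of Twitter users, could look roughly like the sketch below: build a weighted mention graph from the collected tweets and rank accounts by in-degree centrality. The corpus file and its columns are hypothetical, and this is not the authors' actual procedure.

```python
# Hedged sketch of a user-network analysis: build a directed mention graph
# and list the most-mentioned accounts. File format and columns are
# hypothetical placeholders.
import networkx as nx
import pandas as pd

tweets = pd.read_csv("jihadist_corpus.csv")  # columns: author, mentions ("a;b;c")

G = nx.DiGraph()
for author, mentions in zip(tweets["author"], tweets["mentions"].fillna("")):
    for mentioned in filter(None, mentions.split(";")):
        # Accumulate a weighted edge for every author -> mentioned-user pair.
        w = G.get_edge_data(author, mentioned, default={"weight": 0})["weight"]
        G.add_edge(author, mentioned, weight=w + 1)

# Accounts most often mentioned by others (simple in-degree centrality).
central = sorted(nx.in_degree_centrality(G).items(), key=lambda kv: -kv[1])[:10]
for user, score in central:
    print(f"{user:20s} {score:.4f}")
```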