
    Comparative Studies of Detecting Abusive Language on Twitter

    The context-dependent nature of online aggression makes annotating large collections of data extremely difficult. Previously studied datasets in abusive language detection have been insufficient in size to efficiently train deep learning models. Recently, Hate and Abusive Speech on Twitter, a dataset much greater in size and reliability, has been released. However, this dataset has not yet been comprehensively studied. In this paper, we conduct the first comparative study of various learning models on Hate and Abusive Speech on Twitter, and discuss the possibility of using additional features and context data for improvements. Experimental results show that a bidirectional GRU network trained on word-level features, with Latent Topic Clustering modules, is the most accurate model, scoring 0.805 F1.
    Comment: ALW2: 2nd Workshop on Abusive Language Online, to be held at EMNLP 2018 (Brussels, Belgium), October 31st, 2018
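The core building block of the best-performing model above is the GRU. As a minimal illustration (not a reproduction of the paper's bidirectional word-level model with Latent Topic Clustering), the forward step of a single GRU cell can be sketched in NumPy; all dimensions and weights below are arbitrary placeholders:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, params):
    """One GRU forward step: update gate z, reset gate r, candidate state."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = sigmoid(Wz @ x + Uz @ h)                # update gate
    r = sigmoid(Wr @ x + Ur @ h)                # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))    # candidate hidden state
    return (1.0 - z) * h + z * h_tilde          # interpolate old and new state

rng = np.random.default_rng(0)
d_in, d_h = 8, 4                                # toy dimensions
params = [rng.normal(size=(d_h, d_in)) if i % 2 == 0
          else rng.normal(size=(d_h, d_h)) for i in range(6)]
h = np.zeros(d_h)
for _ in range(5):                              # run over a 5-token sequence
    x = rng.normal(size=d_in)                   # stand-in word embedding
    h = gru_step(x, h, params)
print(h.shape)
```

A bidirectional variant runs a second cell over the reversed sequence and concatenates the two final states.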

    A comparison of classification models to detect cyberbullying in the Peruvian Spanish language on twitter

    Cyberbullying is a social problem in which bullies’ actions are more harmful than in traditional forms of bullying, as bullies have the power to repeatedly humiliate the victim in front of an entire community through social media. Nowadays, multiple works aim at detecting acts of cyberbullying via the analysis of texts in social media publications written in one or more languages; however, few investigations target cyberbullying detection in Spanish. In this work, we compare the performance of four traditional supervised machine learning methods in detecting cyberbullying via the identification of four cyberbullying-related categories in Twitter posts written in Peruvian Spanish. Specifically, we trained and tested Naive Bayes, Multinomial Logistic Regression, Support Vector Machine, and Random Forest classifiers on a dataset manually annotated with the help of human participants. The results indicate that the best-performing classifier for the cyberbullying detection task was the Support Vector Machine.
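The four-classifier comparison described above can be sketched with scikit-learn. The toy English sentences below are placeholders for the annotated Peruvian Spanish tweets, which are not reproduced here, and the tiny corpus is for illustration only:

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Placeholder corpus; 1 = cyberbullying-related, 0 = neutral.
texts = ["you are worthless and everyone hates you",
         "nobody likes you just disappear already",
         "had a great time at the concert last night",
         "looking forward to the weekend with friends",
         "you are so stupid it hurts to read this",
         "congrats on the new job, well deserved"] * 5
labels = [1, 1, 0, 0, 1, 0] * 5

models = {"NB": MultinomialNB(),
          "LR": LogisticRegression(max_iter=1000),
          "SVM": LinearSVC(),
          "RF": RandomForestClassifier(n_estimators=100, random_state=0)}

scores = {}
for name, clf in models.items():
    # TF-IDF features feed each classifier; cross-validated F1 compares them.
    pipe = make_pipeline(TfidfVectorizer(), clf)
    scores[name] = cross_val_score(pipe, texts, labels, cv=3,
                                   scoring="f1").mean()
    print(f"{name}: F1 = {scores[name]:.3f}")
```

On a real annotated dataset, the per-category comparison would repeat this with the four cyberbullying-related labels.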

    Improving hate speech detection using machine and deep learning techniques: A preliminary study

    The increasing use of social media and information sharing has given major benefits to humanity. However, it has also given rise to a variety of challenges, including the spread of hate speech messages. To address this emerging issue, recent studies have employed a variety of feature engineering techniques together with machine learning or deep learning algorithms to automatically detect hate speech messages in different datasets. However, most of these studies classify hate-speech-related messages using existing feature engineering approaches and suffer from low classification performance, because those approaches are affected by the word-order and word-context problems. In this research, we identify hateful content in recent tweets from Twitter and classify it into several categories: Ethnicity, Nationality, Religion, Gender, Sexual Orientation, Disability, and Other. These categories are further classified to identify the targets of hate speech; for example, Black, White, and Asian fall under Ethnicity, while Muslims, Jews, and Christians fall under Religion. An evaluation is performed on the hateful content identified, comparing the deep learning model LSTM against traditional machine learning models (Linear SVC, Logistic Regression, Random Forest, and Multinomial Naïve Bayes) in terms of accuracy and precision on live tweets extracted from Twitter, which serve as our test dataset.
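The "word order problem" mentioned above can be demonstrated directly: bag-of-words features assign identical vectors to reorderings of the same words, so any classifier built on them cannot distinguish sentences that differ only in order, whereas a sequence model such as an LSTM consumes the order directly. A minimal sketch with invented example sentences:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Two sentences with different meanings but identical word multisets.
a = "the mods should ban this user"
b = "this user should ban the mods"

vec = CountVectorizer()
X = vec.fit_transform([a, b]).toarray()

# The bag-of-words rows are identical, so a classifier over these
# features necessarily treats the two sentences as the same input.
print(np.array_equal(X[0], X[1]))  # True
```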

    Operation Heron – Latent topic changes in an abusive letter series

    The paper presents a two-part forensic linguistic analysis of a historic collection of abuse letters, sent to individuals in the public eye and to individuals’ private homes between 2007 and 2009. We employ the technique of structural topic modelling (STM) to identify distinctions in the core topics of the letters, gauging the value of this relatively underused methodology in forensic linguistics. Four key topics were identified in the letters: Politics A, Politics B, Healthcare, and Immigration; their coherence, correlation, and shifts over time were evaluated. Following the STM, a qualitative corpus linguistic analysis was undertaken, coding concordance lines according to topic, with inter-coder reliability tested. This coding demonstrated that various connected statements within the same topic tend to gain or lose prevalence over time, and ultimately confirmed the consistency of content within the four topics identified through STM throughout the letter series. The discussion and conclusions reflect on the findings and consider the utility of these methodologies for linguistics, and forensic linguistics in particular. The study demonstrates real value in revisiting a forensic linguistic dataset such as this to test and develop methodologies for the field.

    Linguistic variation across Twitter and Twitter trolling

    Trolling is used to label a variety of behaviours, from the spread of misinformation and hyperbole to targeted abuse and malicious attacks. Despite this, little is known about how trolling varies linguistically and what its major linguistic repertoires and communicative functions are in comparison to general social media posts. Consequently, this dissertation collects two corpora of tweets – a general English Twitter corpus and a Twitter trolling corpus built from other Twitter users’ accusations – and introduces and applies a new short-text version of Multi-Dimensional Analysis to each corpus, designed to identify aggregated dimensions of linguistic variation across them. The analysis finds that trolling tweets and general tweets differ only on the final dimension of linguistic variation, and share the following linguistic repertoires: “Informational versus Interactive”, “Personal versus Other Description”, and “Promotional versus Oppositional”. Moreover, the analysis compares trolling tweets against the general corpus’s dimensions and finds that the two are remarkably more similar than different in their distribution along all dimensions. These findings counter various theories of trolling and problematise the notion that trolling can be detected automatically using grammatical variation. Overall, this dissertation provides empirical evidence on how trolling and general tweets vary linguistically.
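The idea of extracting aggregated dimensions from co-varying grammatical features can be sketched as follows. Multi-Dimensional Analysis proper uses exploratory factor analysis over a large inventory of tagged features; PCA over a tiny invented feature set (hypothetical per-tweet rates of pronouns, nouns, questions, imperatives) is used here only as a simpler stand-in:

```python
import numpy as np
from sklearn.decomposition import PCA

# Simulated per-tweet feature rates for two registers; all values invented.
rng = np.random.default_rng(1)
interactive = rng.normal([8, 2, 3, 2], 1.0, size=(50, 4))      # chatty tweets
informational = rng.normal([2, 9, 0.5, 0.5], 1.0, size=(50, 4))  # newsy tweets
X = np.vstack([interactive, informational])

# Standardise features, then extract two aggregated dimensions.
pca = PCA(n_components=2)
scores = pca.fit_transform((X - X.mean(0)) / X.std(0))

# The two registers separate along the first dimension, analogous to an
# "Informational versus Interactive" dimension.
print(scores[:50, 0].mean(), scores[50:, 0].mean())
```

In the dissertation's setting, comparing trolling and general tweets amounts to comparing their score distributions along each extracted dimension.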