Aggressive, Repetitive, Intentional, Visible, and Imbalanced: Refining Representations for Cyberbullying Classification
Cyberbullying is a pervasive problem in online communities. To identify
cyberbullying cases in large-scale social networks, content moderators depend
on machine learning classifiers for automatic cyberbullying detection. However,
existing models remain unfit for real-world applications, largely due to a
shortage of publicly available training data and a lack of standard criteria
for assigning ground truth labels. In this study, we address the need for
reliable data using an original annotation framework. Inspired by social
sciences research into bullying behavior, we characterize the nuanced problem
of cyberbullying using five explicit factors to represent its social and
linguistic aspects. We model this behavior using social network and
language-based features, which improve classifier performance. These results
demonstrate the importance of representing and modeling cyberbullying as a
social phenomenon. Comment: 12 pages, 5 figures, 22 tables, Accepted to the 14th International
AAAI Conference on Web and Social Media, ICWSM'2
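The abstract does not list the paper's actual features. The sketch below is only meant to show, under hypothetical feature choices, how language-based signals and social-network signals could be joined into one input vector for a classifier. The tiny lexicon, the `Interaction` fields, and the two proxies are all assumptions. The follower ratio stands in for an imbalance of power, and the prior-message count stands in for repetition.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon; a real system would use a curated or learned resource.
AGGRESSIVE_TERMS = {"idiot", "loser", "stupid", "ugly"}


@dataclass
class Interaction:
    """One message from an author to a target user (all fields are illustrative)."""
    text: str
    author_followers: int
    target_followers: int
    prior_messages_to_target: int  # earlier messages from author to target


def language_features(text: str) -> list[float]:
    """Share of aggressive tokens and of exclamation marks per token."""
    tokens = text.lower().split()
    n = len(tokens) or 1  # avoid division by zero on empty text
    aggressive = sum(t.strip(".,!?") in AGGRESSIVE_TERMS for t in tokens)
    return [aggressive / n, text.count("!") / n]


def social_features(it: Interaction) -> list[float]:
    """Crude social-network proxies (assumptions, not the paper's features)."""
    # Follower ratio as a stand-in for an imbalance of power (+1 avoids division by zero).
    power_ratio = (it.author_followers + 1) / (it.target_followers + 1)
    # Count of earlier messages as a stand-in for repetition.
    return [power_ratio, float(it.prior_messages_to_target)]


def feature_vector(it: Interaction) -> list[float]:
    """Concatenate both feature families into one classifier input."""
    return language_features(it.text) + social_features(it)
```

For example, `feature_vector(Interaction("You are such a loser!", 999, 9, 3))` returns `[0.2, 0.2, 100.0, 3.0]`. Any standard classifier could then be trained on such vectors.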
Detection of Hate Tweets using Machine Learning and Deep Learning
Cyberbullying has become a highly problematic occurrence because of the anonymity it can offer and the ease with which others can join in harassing victims. The distancing effect of technological devices has led cyberbullies to say and do harsher things than is typical in a traditional face-to-face bullying situation. Given the importance of the problem, detection is becoming a key area of cyberbullying research, and a framework that accurately and automatically detects new cyberbullying instances is therefore needed. To review machine learning and deep learning approaches, two datasets were used. The first, provided by the University of Maryland, consists of over 30,000 tweets. The second, based on the article "Automated Hate Speech Detection and the Problem of Offensive Language" by Davidson et al., contains roughly 25,000 tweets. The paper explores machine learning approaches using embeddings such as DBOW (Distributed Bag of Words) and DMM (Distributed Memory Mean), as well as the performance of Word2vec Convolutional Neural Networks (CNNs), to classify online hate
A Systematic Literature Review on Cyberbullying in Social Media: Taxonomy, Detection Approaches, Datasets, and Future Research Directions
In Natural Language Processing, sentiment analysis, also called opinion mining, aims to extract human thoughts, beliefs, and perceptions from unstructured text. With social media's rapid growth and the influx of individual comments, reviews, and feedback, it has become an attractive and challenging research area. Detecting toxic textual content is one of the most common problems on social media. Anonymity and concealment of identity are common on the Internet among people from a wide diversity of cultures and beliefs. Freedom of speech, anonymity, and inadequate social media regulation make cyber toxicity and cyberbullying significant issues that require systems for automatic detection and prevention. Diverse research on this problem is under way, based on different approaches and languages, but a comprehensive analysis examining it from all angles is lacking. This systematic literature review therefore surveys the research to date on text-based cyberbullying classification. It covers the definition, taxonomy, properties, and outcomes of cyberbullying, the roles involved in cyberbullying, other forms of bullying, and different offensive behaviors on social media. The article also presents the latest popular benchmark datasets on cyberbullying along with their number of classes (binary/multiple). It reviews state-of-the-art methods for detecting cyberbullying and abusive content on social media. It also discusses the factors that drive offenders to engage in offensive activity, preventive actions against online toxicity, and cyber laws in different countries. Finally, we identify and discuss the challenges, solutions, and future research directions that serve as a reference for overcoming cyberbullying on social media
A Systematic Review of Machine Learning Algorithms in Cyberbullying Detection: Future Directions and Challenges
Social media networks are becoming an essential part of life for most of the world’s population. Detecting cyberbullying with machine learning and natural language processing algorithms is attracting researchers' attention, and there is a growing need for automatic detection and mitigation of cyberbullying events on social media. This study investigates the research directions and theoretical foundations of the area through a systematic review of current state-of-the-art research. A framework must be designed that considers all possible actors in a cyberbullying event, including the various aspects of cyberbullying and its effects on the participating actors. Future directions and challenges are also discussed
Cyberbullying Detection Using Machine Learning Algorithms (Ανίχνευση διαδικτυακού εκφοβισμού με χρήση αλγορίθμων μηχανικής μάθησης)
This thesis concerns the application and comparison of machine learning algorithms for sentiment analysis in order to detect cyberbullying. The algorithms are applied to two different datasets: SOSNet Twitter Dataset and Suspicious Tweets Dataset.
Beyond simply detecting online bullying, the work aimed to identify the type of bullying according to specific criteria such as age, gender, and nationality. The linguistic elements of the texts in each category are also presented, together with the results of other studies on the incidence of cyberbullying according to the personal characteristics under study. As an extension of the present study, a system is proposed that takes into account personal axes/criteria such as gender, nationality, and sexual orientation when detecting online bullying. In addition, the linguistic features are collected so that the research can also be adapted to Greek data. A first step toward this extension is taken in the present study.
Finally, the methodologies implemented in related research are described in detail, along with the one adopted in the current study. The results of every algorithm on each dataset are listed and discussed extensively. Cyberbullying is detected and correctly categorized with high accuracy. However, some minor errors are noted, and building a neural network that may reduce them is set as a future goal
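The abstract does not name the algorithms that were compared. The following is only an illustration of multi-class bullying-type classification. It is a from-scratch multinomial Naive Bayes classifier, a common text-classification baseline, and is not presented as the thesis's method. The training sentences and class names in `TOY_DATA` are invented for the example and are not taken from the SOSNet or Suspicious Tweets datasets.

```python
import math
from collections import Counter, defaultdict

# Invented toy examples; labels loosely mimic bullying-type categories.
TOY_DATA = [
    ("you are too old to be online grandpa", "age"),
    ("go back to your country", "ethnicity"),
    ("girls cannot play games", "gender"),
    ("great match today everyone", "not_cyberbullying"),
]


def tokenize(text):
    return text.lower().split()


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # skip words never seen in training
                # Smoothed log likelihood of this token under the class
                score += math.log(
                    (self.word_counts[label][tok] + 1) / (self.total_words[label] + v)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Training on the toy data with `MultinomialNB().fit([t for t, _ in TOY_DATA], [l for _, l in TOY_DATA])` makes `predict("too old grandpa")` return `"age"`. A real comparison would swap in several algorithms and evaluate them on held-out data.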
Mapping (Dis-)Information Flow about the MH17 Plane Crash
Digital media enables not only fast sharing of information, but also
disinformation. One prominent case of an event leading to circulation of
disinformation on social media is the MH17 plane crash. Studies analysing the
spread of information about this event on Twitter have focused on small,
manually annotated datasets, or used proxies for data annotation. In this work,
we examine to what extent text classifiers can be used to label data for
subsequent content analysis; in particular, we focus on predicting pro-Russian
and pro-Ukrainian Twitter content related to the MH17 plane crash. Even though
we find that a neural classifier improves over a hashtag-based baseline,
labeling pro-Russian and pro-Ukrainian content with high precision remains a
challenging problem. We provide an error analysis underlining the difficulty of
the task and identify factors that might help improve classification in future
work. Finally, we show how the classifier can facilitate the annotation task
for human annotators
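A hashtag-based baseline is the kind of proxy labeling the abstract compares against. The sketch below shows one simple version of such a baseline together with the per-class precision metric the abstract emphasizes. The seed hashtags are placeholders, and the rule of abstaining when zero or several classes match is an assumption, not the study's actual configuration.

```python
from typing import Optional

# Hypothetical placeholder seed hashtags; the study's actual hashtag lists are not given.
SEEDS = {
    "pro-russian": {"#seed_ru"},
    "pro-ukrainian": {"#seed_ua"},
}


def hashtag_label(tweet: str, seeds: dict[str, set[str]]) -> Optional[str]:
    """Label a tweet by its seed hashtags; abstain (None) if zero or several classes match."""
    tags = {tok.lower().rstrip(".,!?:;") for tok in tweet.split() if tok.startswith("#")}
    matches = [label for label, seed_tags in seeds.items() if tags & seed_tags]
    return matches[0] if len(matches) == 1 else None


def precision(pred: list, gold: list, label: str) -> float:
    """Fraction of tweets predicted as `label` whose gold label is also `label`."""
    true_pos = sum(p == label and g == label for p, g in zip(pred, gold))
    predicted = sum(p == label for p in pred)
    return true_pos / predicted if predicted else 0.0
```

This design trades coverage for simplicity. Tweets without seed hashtags get no label at all, and a seed hashtag used ironically or in reporting still yields a confident but wrong label. The second failure is one reason precision can stay low.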
Cyberbullying in educational context
Kustenmacher and Seiwert (2004) explain man's inclination to resort to technology in his interactions with the environment and society. Thus, in a technologically dominated society, the solution to the negative consequences of cyberbullying (CBB) is itself technology, as part of the technological paradox (Tugui, 2009), in which man plays a dual role, both slave and master, in his interaction with it. Notably since 2010, there have been many attempts to use artificial intelligence (AI) to recognize, identify, limit, or avoid aggressive behaviours of the CBB type. To obtain an overview of the use of AI in solving various CBB-related problems, we extracted works from the Scopus database containing the words “cyberbullying” and “artificial intelligence” in the title, keywords, and abstract. We then analysed the content of their titles and kept only those identified as a solution for recognizing, identifying, limiting, or avoiding CBB. These are synthesized and organized by year in the following Table
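In Scopus advanced search, a criterion like this is usually written with the `TITLE-ABS-KEY` field code, for example `TITLE-ABS-KEY(cyberbullying AND "artificial intelligence")`. The exact query the authors ran is not stated. The sketch below mimics the keyword filter and the grouping by year on already-exported records. The `Record` fields are assumptions, and the title-based content analysis remains a manual step that the code does not attempt.

```python
from collections import defaultdict
from dataclasses import dataclass, field

TERMS = ("cyberbullying", "artificial intelligence")


@dataclass
class Record:
    """Hypothetical shape of one exported bibliographic record."""
    title: str
    abstract: str
    year: int
    keywords: list[str] = field(default_factory=list)


def matches(rec: Record, terms=TERMS) -> bool:
    """True if every term occurs somewhere in the title, abstract, or keywords."""
    haystack = " ".join([rec.title, rec.abstract, *rec.keywords]).lower()
    return all(term in haystack for term in terms)


def by_year(records) -> dict[int, list[str]]:
    """Group the titles of matching records by publication year, oldest first."""
    groups = defaultdict(list)
    for rec in records:
        if matches(rec):
            groups[rec.year].append(rec.title)
    return dict(sorted(groups.items()))
```

Plain substring matching is a simplification. It also matches words such as "anti-cyberbullying", which is usually acceptable for a first-pass filter that is followed by manual screening.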