
    Aggressive, Repetitive, Intentional, Visible, and Imbalanced: Refining Representations for Cyberbullying Classification

    Cyberbullying is a pervasive problem in online communities. To identify cyberbullying cases in large-scale social networks, content moderators depend on machine learning classifiers for automatic cyberbullying detection. However, existing models remain unfit for real-world applications, largely due to a shortage of publicly available training data and a lack of standard criteria for assigning ground truth labels. In this study, we address the need for reliable data using an original annotation framework. Inspired by social sciences research into bullying behavior, we characterize the nuanced problem of cyberbullying using five explicit factors to represent its social and linguistic aspects. We model this behavior using social network and language-based features, which improve classifier performance. These results demonstrate the importance of representing and modeling cyberbullying as a social phenomenon. Comment: 12 pages, 5 figures, 22 tables, Accepted to the 14th International AAAI Conference on Web and Social Media, ICWSM'2

    Detection of Hate Tweets using Machine Learning and Deep Learning

    Cyberbullying has become a highly problematic occurrence because of the anonymity it affords and the ease with which others can join in harassing victims. The distancing effect of technological devices has led cyberbullies to say and do harsher things than is typical in a traditional face-to-face bullying situation. Given the importance of the problem, detection is becoming a key area of cyberbullying research, and a framework that accurately and automatically detects new cyberbullying instances is therefore needed. To review machine learning and deep learning approaches, two datasets were used. The first, provided by the University of Maryland, consists of over 30,000 tweets; the second, based on the article 'Automated Hate Speech Detection and the Problem of Offensive Language' by Davidson et al., contains roughly 25,000 tweets. The paper explores machine learning approaches using word embeddings such as DBOW (Distributed Bag of Words) and DMM (Distributed Memory Mean), as well as the performance of Word2vec Convolutional Neural Networks (CNNs), to classify online hate

    A Systematic Literature Review on Cyberbullying in Social Media: Taxonomy, Detection Approaches, Datasets, And Future Research Directions

    In the area of Natural Language Processing, sentiment analysis, also called opinion mining, aims to extract human thoughts, beliefs, and perceptions from unstructured texts. With the rapid growth of social media and the influx of individual comments, reviews, and feedback, it has evolved into an attractive and challenging research area. Finding toxic textual content is one of the most common problems on social media. Anonymity and concealment of identity are common on the Internet among people from a wide diversity of cultures and beliefs. Freedom of speech, anonymity, and inadequate social media regulation make cyber-toxic environments and cyberbullying significant issues that require systems for automatic detection and prevention. Accordingly, diverse research is being conducted across different approaches and languages, but a comprehensive analysis examining this work from all angles is lacking. This systematic literature review therefore surveys the research conducted to date on classifying cyberbullying in the textual modality. It presents the definition, taxonomy, properties, and outcomes of cyberbullying and the roles involved in it, along with other forms of bullying and offensive behavior on social media. The article also presents the latest popular benchmark cyberbullying datasets and their number of classes (binary or multiple), reviews state-of-the-art methods for detecting cyberbullying and abusive content on social media, and discusses the factors that drive offenders to engage in offensive activity, preventive actions against online toxicity, and cyber laws in different countries. Finally, we identify and discuss challenges, solutions, and future research directions that can serve as a reference for overcoming cyberbullying on social media.

    A Systematic Review of Machine Learning Algorithms in Cyberbullying Detection: Future Directions and Challenges

    Social media networks are becoming an essential part of life for most of the world's population. Detecting cyberbullying using machine learning and natural language processing algorithms is attracting researchers' attention, and there is a growing need for automatic detection and mitigation of cyberbullying events on social media. In this study, the research directions and theoretical foundations of this area are investigated through a systematic review of current state-of-the-art research. A framework must be designed that considers all possible actors in a cyberbullying event, along with the various aspects of cyberbullying and its effects on those actors. Future directions and challenges are also discussed.

    Cyberbullying Detection Using Machine Learning Algorithms (Ανίχνευση διαδικτυακού εκφοβισμού με χρήση αλγορίθμων μηχανικής μάθησης)

    This thesis concerns the application and comparison of machine learning algorithms for sentiment analysis in order to detect cyberbullying. The algorithms are applied to two datasets: the SOSNet Twitter Dataset and the Suspicious Tweets Dataset. Beyond simply detecting online bullying, the work aimed to identify the type of bullying according to specific criteria such as age, gender, and nationality. The linguistic characteristics of the texts in each category are also presented, along with the results of other studies on the incidence of cyberbullying according to the personal characteristics under study. As an extension of this work, the thesis proposes a system that takes into account personal axes or criteria such as gender, nationality, and sexual orientation when detecting online bullying. In addition, the linguistic features are collected so that the research can also be adapted to Greek data; a first attempt at this extension is made in the present study. Finally, the methodologies implemented in related research are described in detail, together with the one adopted here. The results of all algorithms on each dataset are reported and discussed extensively. Cyberbullying is detected and correctly categorized with high accuracy. However, some minor errors are noted, and a future goal is to build a neural network that could reduce them.

    Mapping (Dis-)Information Flow about the MH17 Plane Crash

    Digital media enables the fast sharing not only of information but also of disinformation. One prominent case of an event leading to the circulation of disinformation on social media is the MH17 plane crash. Studies analysing the spread of information about this event on Twitter have focused on small, manually annotated datasets or used proxies for data annotation. In this work, we examine to what extent text classifiers can be used to label data for subsequent content analysis; in particular, we focus on predicting pro-Russian and pro-Ukrainian Twitter content related to the MH17 plane crash. Although we find that a neural classifier improves over a hashtag-based baseline, labeling pro-Russian and pro-Ukrainian content with high precision remains a challenging problem. We provide an error analysis underlining the difficulty of the task and identify factors that might help improve classification in future work. Finally, we show how the classifier can facilitate the annotation task for human annotators.

    Cyberbullying in educational context

    Kustenmacher and Seiwert (2004) explain man's inclination to turn to technology when interacting with the environment and society. Thus, in a technologically dominated society, the solution to the negative consequences of cyberbullying is itself technology, as part of the technological paradox (Tugui, 2009), in which man plays a dual role, both slave and master, in interacting with it. Notably, especially since 2010, there have been many attempts to use artificial intelligence (AI) to recognize, identify, limit, or avoid aggressive behaviours of the cyberbullying (CBB) type. To obtain an overview of the use of AI in solving various CBB-related problems, we extracted works from the Scopus database containing the words "cyberbullying" and "artificial intelligence" in the title, keywords, and abstract. We then performed a content analysis of these articles' titles and kept only those identified as solutions for recognizing, identifying, limiting, or avoiding the manifestation of CBB; these data are synthesized and organized by year in the following table.