185 research outputs found

    Current Limitations in Cyberbullying Detection: on Evaluation Criteria, Reproducibility, and Data Scarcity

    The detection of online cyberbullying has seen an increase in societal importance, popularity in research, and available open data. Nevertheless, while computational power and affordability of resources continue to increase, the access restrictions on high-quality data limit the applicability of state-of-the-art techniques. Consequently, much of the recent research uses small, heterogeneous datasets, without a thorough evaluation of applicability. In this paper, we further illustrate these issues, as we (i) evaluate many publicly available resources for this task and demonstrate difficulties with data collection. These predominantly yield small datasets that fail to capture the required complex social dynamics and impede direct comparison of progress. We (ii) conduct an extensive set of experiments that indicate a general lack of cross-domain generalization of classifiers trained on these sources, and openly provide this framework to replicate and extend our evaluation criteria. Finally, we (iii) present an effective crowdsourcing method: simulating real-life bullying scenarios in a lab setting generates plausible data that can be effectively used to enrich real data. This largely circumvents the restrictions on data that can be collected, and increases classifier performance. We believe these contributions can aid in improving the empirical practices of future research in the field

    A Systematic Literature Review on Cyberbullying in Social Media: Taxonomy, Detection Approaches, Datasets, And Future Research Directions

    In the area of Natural Language Processing, sentiment analysis, also called opinion mining, aims to extract human thoughts, beliefs, and perceptions from unstructured texts. In light of social media's rapid growth and the influx of individual comments, reviews, and feedback, it has evolved into an attractive and challenging research area. Finding toxic textual content is one of the most common problems in social media. Anonymity and concealment of identity are common on the Internet, whose users come from a wide range of cultures and beliefs. Freedom of speech, anonymity, and inadequate social media regulation make toxic online environments and cyberbullying significant issues that require a system of automatic detection and prevention. Diverse research on this problem is taking place, based on different approaches and languages, but a comprehensive analysis that examines it from all angles is lacking. This systematic literature review therefore surveys the research done to date on the classification of cyberbullying in the textual modality. It states the definition, taxonomy, properties, and outcomes of cyberbullying and the roles involved in it, along with other forms of bullying and different offensive behaviors in social media. This article also presents the latest popular benchmark datasets on cyberbullying, along with their number of classes (binary/multiple), reviews the state-of-the-art methods for detecting cyberbullying and abusive content on social media, and discusses the factors that drive offenders to engage in offensive activity, preventive actions to avoid online toxicity, and various cyber laws in different countries. Finally, we identify and discuss the challenges, solutions, and future research directions that serve as a reference for overcoming cyberbullying in social media

    Comparative Performance of Data Mining Techniques for Cyberbullying Detection of Arabic Social Media Text

    Cyberbullying has spread like a virus on social media platforms and is getting out of control. According to psychological studies on the subject, victims are increasingly suffering, sometimes to the point of suicide. The issue of cyberbullying on social media is spreading around the world. Social media use is growing, and it can have both beneficial and harmful implications, given how social media platforms are abused through different forms of cyberbullying. Although there is a lot of work on cyberbullying detection in English, there are few studies in the Arabic language. Data mining techniques are often used to detect this problem. In this study, different data mining algorithms were used to detect cyberbullying in Arabic texts. Our study was conducted on three datasets: the Bullying dataset consisted of 26,000 comments written in Arabic and was collected from kaggle.com, the Cyber_2021 dataset consisted of 13,247 comments collected via github.com, and the Data 2022 dataset consisted of 47,224 comments collected via Instagram. Two feature extraction methods, CountVectorizer and Tf-Idf, were used. Accuracy, precision, recall, and the F1 score were used to evaluate classifier performance. In the study, the Bagging Classifier achieved the highest results on the Bullying dataset from Kaggle (Accuracy 96.04, F1-Score 95.98, Recall 96.04, Precision 95.95), and the SVC model gave the highest results on the Cyber_2021 dataset from GitHub (Accuracy 98.49, F1-Score 98.49, Recall 98.49, Precision 98.50), while the results on the Data 2022 dataset from Instagram were Accuracy 77.51, F1-Score 76.60, Recall 77.51, and Precision 77.24. These results were achieved with the Tf-Idf vectorizer, which gave better results than CountVectorizer in all cases
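
The four evaluation measures named in the abstract above can be illustrated with a minimal, standard-library sketch. The abstract does not say how per-class scores were averaged; the sketch assumes support-weighted averaging, which is only an assumption, although it is consistent with the reported recall equalling the reported accuracy on all three datasets (weighted recall always equals accuracy). The function name `weighted_metrics` and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy plus support-weighted precision, recall, and F1.

    Assumption: per-class scores are averaged with weights proportional
    to each class's share of the true labels (support weighting).
    """
    n = len(y_true)
    support = Counter(y_true)  # number of true examples per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    precision = recall = f1 = 0.0
    for label, count in support.items():
        # True positives and total predictions for this class.
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)

        p_c = tp / predicted if predicted else 0.0
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0

        weight = count / n
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy labels, for illustration only.
y_true = ["bully", "bully", "bully",
          "neutral", "neutral", "neutral", "neutral", "neutral"]
y_pred = ["bully", "bully", "neutral",
          "neutral", "neutral", "neutral", "neutral", "neutral"]

scores = weighted_metrics(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Under this averaging scheme, recall and accuracy coincide by construction, so only precision and F1 carry additional information about how errors are distributed across classes.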

    A Survey on Cybercrime Using Social Media

    There is growing interest in automating crime detection and prevention for large populations as a result of the increased usage of social media for victimization and criminal activities. This area is frequently researched because social media enables criminals to reach a large audience. While several studies have investigated specific crimes on social media, a comprehensive review paper that examines all types of social media crimes, their similarities, and detection methods is still lacking. The identification of similarities among crimes and detection methods can facilitate knowledge and data transfer across domains. The goal of this study is to collect a library of social media crimes and establish their connections using a crime taxonomy. The survey also identifies publicly accessible datasets and suggests directions for further research

    Sentiment Analysis for Social Media

    Sentiment analysis is a branch of natural language processing concerned with the study of the intensity of the emotions expressed in a piece of text. The automated analysis of the multitude of messages delivered through social media is one of the hottest research fields, in both academia and industry, due to its extremely high potential applicability in many different domains. This Special Issue describes both technological contributions to the field, mostly based on deep learning techniques, and specific applications in areas like health insurance, gender classification, recommender systems, and cyber aggression detection

    Building Towards Automated Cyberbullying Detection: A Comparative Analysis

    The increased use of social media between digitally anonymous users, sharing their thoughts and opinions, can facilitate participation and collaboration. However, this anonymity, which gives users freedom of speech and allows them to conduct activities without being judged by others, can also encourage cyberbullying and hate speech. Predators can hide their identity and reach a wide audience anytime and anywhere. Given the detrimental effects of cyberbullying, there is a growing need for cyberbullying detection approaches. In this survey paper, a comparative analysis of automated cyberbullying detection techniques is presented from different perspectives, including data annotation, data pre-processing, and feature engineering. In addition, the importance of emojis in expressing emotions, as well as their influence on sentiment classification and text comprehension, leads us to discuss the role of incorporating emojis in the process of cyberbullying detection and their influence on detection performance. Furthermore, the different domains for using Self-Supervised Learning (SSL) as an annotation technique for cyberbullying detection are explored

    Natural Language Processing for Cyberbullying Detection

    With the development of digital technologies and the popularity of social media, cyberbullying has become a serious public health concern that can lead to increased risk of mental and behavioral health issues or even suicide. Artificial intelligence techniques such as machine learning open up many possibilities for combating cyberbullying, e.g. automatic cyberbullying detection. Most recent research focuses on improving performance by developing complex models that demand more resources and time to run. Such research uses publicly available datasets without carefully evaluating their feasibility and limitations. This study uses natural language processing (NLP) to evaluate model performance, examine the difference between fine-grained classification and binary classification, and assess the feasibility and quality of a publicly available dataset. The results show that a simple classifier can achieve performance similar to that of more complex models if appropriate preprocessing is used, and that the publicly available dataset may have limitations and quality issues that researchers should consider when using the data