
    Brute-Force Sentence Pattern Extortion from Harmful Messages for Cyberbullying Detection

    Cyberbullying, or humiliating people using the Internet, has existed almost since the beginning of Internet communication. The relatively recent introduction of smartphones and tablet computers has caused cyberbullying to evolve into a serious social problem. In Japan, members of a parent-teacher association (PTA) attempted to address the problem by scanning the Internet for cyberbullying entries. To help these PTA members and other interested parties confront this difficult task, we propose a novel method for automatic detection of malicious Internet content. This method is based on a combinatorial approach resembling brute-force search algorithms, but applied to language classification. The method extracts sophisticated patterns from sentences and uses them in classification. Experiments performed on actual cyberbullying data reveal an advantage of our method vis-à-vis previous methods. Next, we implemented the method in an application for Android smartphones to automatically detect possible harmful content in messages. The method performed well in the Android environment, but still needs to be optimized for time efficiency in order to be used in practice.
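
    The abstract does not spell out the pattern definition or the weighting scheme, so the following is only a minimal sketch of the combinatorial idea, assuming patterns are order-preserving token combinations (with gaps) weighted by their relative frequency in harmful versus non-harmful training sentences:

    ```python
    from itertools import combinations
    from collections import Counter

    def extract_patterns(tokens, max_len=3):
        """All order-preserving token combinations (with gaps) up to max_len."""
        return [c for n in range(1, max_len + 1) for c in combinations(tokens, n)]

    def score_sentence(tokens, harmful_counts, harmless_counts, max_len=3):
        """Sum per-pattern weights; a positive total suggests harmful content."""
        score = 0.0
        for p in extract_patterns(tokens, max_len):
            h, n = harmful_counts[p], harmless_counts[p]
            if h + n:
                score += (h - n) / (h + n)   # pattern weight in [-1, 1]
        return score

    # Toy training corpora (hypothetical examples).
    harmful = [["you", "are", "so", "stupid"], ["nobody", "likes", "you"]]
    harmless = [["you", "are", "so", "kind"], ["everybody", "likes", "you"]]

    harmful_counts, harmless_counts = Counter(), Counter()
    for s in harmful:
        harmful_counts.update(extract_patterns(s))
    for s in harmless:
        harmless_counts.update(extract_patterns(s))

    print(score_sentence(["you", "are", "stupid"], harmful_counts, harmless_counts))
    ```

    The number of extracted combinations grows combinatorially with sentence length, which is consistent with the abstract's remark that the method still needs optimization for time efficiency.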

    Results of the PolEval 2019 Shared Task 6: First Dataset and Open Shared Task for Automatic Cyberbullying Detection in Polish Twitter

    In this paper we describe the first dataset for the Polish language containing annotations of harmful and toxic language. The dataset was created to study harmful Internet phenomena such as cyberbullying and hate speech, which have recently grown dramatically in number on the Polish Internet as well as worldwide. The dataset was automatically collected from Polish Twitter accounts and annotated by layperson volunteers under the supervision of a cyberbullying and hate-speech expert. Together with the dataset we propose the first open shared task for Polish to utilize the dataset in the classification of such harmful phenomena. In particular, we propose two subtasks: 1) binary classification of harmful and non-harmful tweets, and 2) multiclass classification distinguishing two types of harmful information (cyberbullying and hate speech) from other tweets. The first installment of the shared task was a success, attracting fourteen submissions overall and thus demonstrating a high demand for research applying such data.
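
    The shared task does not prescribe any model, so purely as a hypothetical baseline for Subtask 1 (binary classification), here is a TF-IDF plus linear SVM sketch; the example tweets and labels are placeholders, not items from the dataset:

    ```python
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    # Placeholder tweets and labels; the real dataset contains annotated
    # Polish Twitter posts (0 = non-harmful, 1 = harmful).
    tweets = ["przykladowy nieszkodliwy wpis", "obrazliwy wpis pelen nienawisci"]
    labels = [0, 1]

    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),  # word unigrams and bigrams
        LinearSVC(),
    )
    model.fit(tweets, labels)
    print(model.predict(["kolejny wpis do oceny"]))
    ```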

    Relationship Between Personality Patterns and Harmfulness: Analysis and Prediction Based on Sentence Embedding

    This paper hypothesizes that harmful utterances need to be judged in the context of whole sentences, and the authors extract features of harmful expressions using a general-purpose language model. Based on the extracted features, the authors propose a method to predict the presence or absence of harmful categories. In addition, the authors believe that it is possible to analyze users who incite others by combining this method with research on inferring the personality of a speaker from statements on social networking sites. The results confirmed that the proposed method can judge the possibility of harmful comments with higher accuracy than simple dictionary-based models or models using distributed representations of words. The relationship between personality patterns and harmful expressions was also confirmed by an analysis based on the harmfulness-judgment model.
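
    The paper does not name the general-purpose language model, so the sketch below assumes a sentence-transformers encoder as a stand-in to illustrate the embed-then-classify idea on whole sentences:

    ```python
    from sentence_transformers import SentenceTransformer
    from sklearn.linear_model import LogisticRegression

    # Assumed general-purpose sentence encoder (not named in the paper).
    encoder = SentenceTransformer("all-MiniLM-L6-v2")

    texts = ["you are wonderful", "you should just disappear"]  # toy examples
    labels = [0, 1]  # 0 = not harmful, 1 = harmful

    X = encoder.encode(texts)                 # whole-sentence feature vectors
    clf = LogisticRegression().fit(X, labels)

    print(clf.predict(encoder.encode(["go away forever"])))
    ```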

    Cyberbullying Detection on Social Network Services

    Social networks such as Facebook or Twitter promote communication between people, but they also enable abuses of the Internet such as cyberbullying by malicious users. In addition, the accessibility of social networks allows cyberbullying to occur at any time and to cause further harm through dissemination by other users. This study collects cyberbullying cases from Twitter and attempts to establish an automatic detection model of cyberbullying tweets based on the text, readability, sentiment score, and other user information, in order to identify tweets containing harassment and ridicule. The novelty of this study is the use of readability analysis, which has not been considered in past studies, to reflect the author's education level, age, and social status. Three data mining techniques, k-nearest neighbors, support vector machine, and decision tree, are used in this study to detect cyberbullying tweets and select the best-performing model for cyberbullying prediction.
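
    A sketch of the described feature design (a simple text feature, readability, and a sentiment score) fed to the three named classifiers follows; the specific libraries (textstat, VADER, scikit-learn) and the toy tweets are assumptions, not the study's actual setup:

    ```python
    import numpy as np
    import textstat
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    analyzer = SentimentIntensityAnalyzer()

    def featurize(tweet):
        return [
            len(tweet.split()),                           # simple text feature
            textstat.flesch_reading_ease(tweet),          # readability
            analyzer.polarity_scores(tweet)["compound"],  # sentiment score
        ]

    tweets = ["you are a pathetic loser", "great game last night, friends"]
    labels = [1, 0]  # 1 = cyberbullying, 0 = benign (toy labels)
    X = np.array([featurize(t) for t in tweets])

    # The three classifiers named in the abstract, in their scikit-learn forms.
    for clf in (KNeighborsClassifier(n_neighbors=1), SVC(), DecisionTreeClassifier()):
        clf.fit(X, labels)
        print(type(clf).__name__, clf.predict(X))
    ```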

    A review on deep-learning-based cyberbullying detection

    Bullying is described as an undesirable behavior by others that harms an individual physically, mentally, or socially. Cyberbullying is a virtual form (e.g., textual or image-based) of bullying or harassment, also known as online bullying. Cyberbullying detection is a pressing need in today's world, as the prevalence of cyberbullying is continually growing, resulting in mental health issues. Conventional machine learning models were previously used to identify cyberbullying. However, current research demonstrates that deep learning (DL) surpasses traditional machine learning algorithms in identifying cyberbullying for several reasons, including its ability to handle extensive data, classify text and images efficiently, and extract features automatically through hidden layers, among others. This paper reviews the existing surveys and identifies the gaps in those studies. We also present a deep-learning-based defense ecosystem for cyberbullying detection, including data representation techniques and different deep-learning-based models and frameworks. We have critically analyzed the existing DL-based cyberbullying detection techniques and identified their significant contributions and the future research directions they have presented. We have also summarized the datasets being used, including the DL architectures applied and the tasks accomplished for each dataset. Finally, several challenges faced by the existing researchers and the open issues to be addressed in the future have been presented.

    Cyber bullying identification and tackling using natural language processing techniques

    Abstract. As offensive content has a detrimental influence on the internet, and especially on social media, there has been much research on identifying cyberbullying posts in social media datasets. Previous works on this topic have overlooked the problems of cyberbullying category detection, the impact of feature choice, negation handling, and dataset construction. Indeed, many natural language processing (NLP) tasks, including cyberbullying detection in texts, lack comprehensive manually labeled datasets, limiting the application of powerful supervised machine learning algorithms, including neural networks. Equally, it is challenging to collect large-scale data for a particular NLP project due to the inherent subjectivity of the labeling task and the manual effort involved. This thesis attempts to address these challenges as follows. We first collected and annotated a multi-category cyberbullying dataset (10K instances) from the social network platform ask.fm. Besides, we used another publicly available labeled cyberbullying dataset, 'Formspring', for comparison purposes and ground-truth establishment. We devised a machine learning based methodology that uses five distinct feature engineering techniques and six different classifiers. The results showed that the CNN classifier with word-embedding features yielded the best performance among all state-of-the-art classifiers, with a detection accuracy of 93% for the AskFm dataset and 92% for the FormSpring dataset. We also performed cyberbullying category detection, where the CNN architecture again provided the best performance, with 81% accuracy and 78% F1-score on average. Our second purpose was to handle the lack of relevant cyberbullying instances in the training dataset through data augmentation. To this end, we developed an approach that makes use of word-sense disambiguation with WordNet-aided semantic expansion. The disambiguation and semantic expansion were intended to overcome several limitations of social media (SM) posts/comments, such as unstructured and limited semantic content, while capturing equivalent instances induced by the word-sense disambiguation-based approach. We ran several experiments with disambiguation/semantic expansion to estimate the impact on classification performance using both the original and the augmented datasets. Finally, we compared the accuracy scores for cyberbullying detection with some widely used classifiers before and after augmentation. The outcome supports the advantage of the data-augmentation strategy, which yielded 99% classifier accuracy, up from the base score of 93%. Our third goal, related to negation handling, was motivated by the intuitive impact of negation on cyberbullying statements and their detection. Our proposed approach advocates a classification-like technique using NegEx and POS tagging that relies on a particular data-design procedure for negation detection. Performances with and without the negation-handling approach are compared and discussed. The result showed 95% accuracy on the negation-handled dataset, which corresponds to an overall accuracy improvement of 2% from the base score of 93%. Our final goal was to develop a software tool using our machine learning models to support our experiments and provide a real-life use case for both end users and research communities. To achieve this objective, a Python-based web application was developed and successfully tested.
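
    Of the thesis's contributions, the WordNet-aided augmentation is the most algorithmically specific. Below is a minimal sketch of that idea, assuming NLTK's lesk implementation of word-sense disambiguation as a stand-in for the thesis's actual procedure:

    ```python
    import nltk
    from nltk.wsd import lesk

    nltk.download("wordnet", quiet=True)  # lesk needs the WordNet corpus

    def augment(sentence):
        """Yield variants with one word replaced by a synonym of its
        context-disambiguated WordNet sense."""
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            sense = lesk(tokens, word)  # pick a WordNet sense from context
            if sense is None:
                continue
            for lemma in sense.lemmas():
                synonym = lemma.name().replace("_", " ")
                if synonym.lower() != word.lower():
                    yield " ".join(tokens[:i] + [synonym] + tokens[i + 1:])

    for variant in augment("nobody wants you here"):
        print(variant)
    ```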

    PCROD: Context Aware Role based Offensive Detection using NLP/DL Approaches

    With the increased use of social media, many people misuse online platforms by uploading offensive content and sharing it with a vast audience. This raises the problem of controlling such offensive content. In this work we concentrate on the issue of finding offensive text in social media. Existing offensive-text detection systems treat weak pejoratives like 'idiot' and extremely indecent pejoratives like 'f***' as equally offensive, irrespective of formal and informal contexts. In fact, weak pejoratives in informal discussions among friends are casual and common and not offensive, but the same words can be offensive when expressed in formal discussions. Crucial challenges in the task of role-based offensive detection in text are i) considering the roles while classifying the text as offensive or not, and ii) creating a contextual dataset including both formal and informal roles. To tackle the above-mentioned challenges we develop a deep neural network based model known as context-aware role-based offensive detection (CROD). We examine CROD on a manually created dataset collected from social networking sites. Results show that CROD gives the best performance with RoBERTa, with an accuracy of 94%, when considering the context and role in the data.
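
    The CROD weights and dataset are not public, so the following sketch only illustrates one plausible reading of the role-conditioning idea: prepend the role (formal/informal) to the text before feeding a RoBERTa classifier. The classification head here is randomly initialized, so the outputs are meaningless until the model is fine-tuned on a labeled dataset:

    ```python
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("roberta-base")
    model = AutoModelForSequenceClassification.from_pretrained(
        "roberta-base", num_labels=2  # untrained head: fine-tune before use
    )

    def classify(role, text):
        # Encode the role (e.g. "formal" / "informal") as a textual prefix.
        inputs = tokenizer(f"{role}: {text}", return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = model(**inputs).logits
        return logits.softmax(dim=-1)  # offensive / not-offensive probabilities

    print(classify("informal", "you idiot :)"))
    print(classify("formal", "you idiot"))
    ```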

    Detection of Hate-Speech Tweets Based on Deep Learning: A Review

    Cybercrime, cyberbullying, and hate speech have all increased in conjunction with the use of the internet and social media. Hate speech knows no organizational or individual boundaries. It affects many people in diverse ways, and it can be harsh, offensive, or discriminating depending on the target's gender, race, political opinions, religion, nationality, skin color, disability, ethnicity, sexual orientation, or status as an immigrant. Authorities and academics are investigating new methods for identifying hate speech on social media platforms like Facebook and Twitter. This study adds to the ongoing discussion about creating safer digital spaces while balancing the limitation of hate speech against the protection of freedom of speech. Partnerships between researchers, platform developers, and communities are crucial in creating efficient and ethical content moderation systems on Twitter and other social media sites. For this reason, multiple methodologies, models, and algorithms are employed. This study presents a thorough analysis of hate speech in numerous research publications. Each article has been thoroughly examined, including an evaluation of the algorithms or methodologies used, the databases, the classification techniques, and the findings achieved. In addition, comprehensive discussions were held on all the examined papers, focusing explicitly on the use of deep learning techniques to detect hate speech.

    Approaches to automated detection of cyberbullying: A Survey

    Research into cyberbullying detection has increased in recent years, due in part to the proliferation of cyberbullying across social media and its detrimental effect on young people. A growing body of work is emerging on automated approaches to cyberbullying detection. These approaches utilise machine learning and natural language processing techniques to identify the characteristics of a cyberbullying exchange and automatically detect cyberbullying by matching textual data to the identified traits. In this paper, we present a systematic review of published research (as identified via the Scopus, ACM and IEEE Xplore bibliographic databases) on cyberbullying detection approaches. On the basis of our extensive literature review, we categorise existing approaches into four main classes, namely: supervised learning, lexicon-based, rule-based, and mixed-initiative approaches. Supervised learning-based approaches typically use classifiers such as SVM and Naïve Bayes to develop predictive models for cyberbullying detection. Lexicon-based systems utilise word lists and use the presence of words within the lists to detect cyberbullying. Rule-based approaches match text to predefined rules to identify bullying, and mixed-initiative approaches combine human-based reasoning with one or more of the aforementioned approaches. We found that a lack of quality, representative labelled datasets and the non-holistic consideration of cyberbullying by researchers when developing detection systems are two key challenges facing cyberbullying detection research. This paper essentially maps out the state of the art in cyberbullying detection research and serves as a resource for researchers to determine where to best direct their future research efforts in this field.
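
    To make the survey's taxonomy concrete, here is a toy illustration of the lexicon-based class it describes: flag a message if it contains any word from a curated list. The word list below is a placeholder, not a published lexicon:

    ```python
    # Placeholder word list; a real system would use a curated lexicon.
    PROFANITY_LEXICON = {"idiot", "loser", "stupid"}

    def lexicon_detect(message):
        tokens = {t.strip(".,!?").lower() for t in message.split()}
        return bool(tokens & PROFANITY_LEXICON)

    print(lexicon_detect("You are such an idiot!"))  # True
    print(lexicon_detect("Nice goal yesterday"))     # False
    ```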

    A Systematic Review of Machine Learning Algorithms in Cyberbullying Detection: Future Directions and Challenges

    Social media networks are becoming an essential part of life for most of the world's population. Detecting cyberbullying using machine learning and natural language processing algorithms is attracting the attention of researchers, and there is a growing need for automatic detection and mitigation of cyberbullying events on social media. In this study, research directions and the theoretical foundation in this area are investigated, and a systematic review of the current state-of-the-art research is conducted. A framework considering all possible actors in the cyberbullying event must be designed, covering various aspects of cyberbullying and its effects on the participating actors. Furthermore, future directions and challenges are also discussed.