4 research outputs found

    A Socio-contextual Approach in Automated Detection of Cyberbullying

    Cyberbullying is a major cyber issue that is common among adolescents. Recent reports show that more than one in five students in the United States is a victim of cyberbullying. The majority of cyberbullying incidents occur on public social media platforms such as Twitter. Automated cyberbullying detection methods can help prevent cyberbullying before harm is done to the victim. In this study, we analyze a corpus of cyberbullying Tweets to construct an automated detection model. Our method emphasizes two claims that are supported by our results. First, unlike other approaches that assume that cyberbullying instances use vulgar or profane words, we show that they do not necessarily contain negative words. Second, we highlight the importance of context, the characteristics of the actors involved, and their position in the network structure in detecting cyberbullying, rather than considering only the textual content in our analysis.

    Current Limitations in Cyberbullying Detection: on Evaluation Criteria, Reproducibility, and Data Scarcity

    The detection of online cyberbullying has seen an increase in societal importance, popularity in research, and available open data. Nevertheless, while computational power and affordability of resources continue to increase, the access restrictions on high-quality data limit the applicability of state-of-the-art techniques. Consequently, much of the recent research uses small, heterogeneous datasets, without a thorough evaluation of applicability. In this paper, we further illustrate these issues, as we (i) evaluate many publicly available resources for this task and demonstrate difficulties with data collection. These resources predominantly yield small datasets that fail to capture the required complex social dynamics and impede direct comparison of progress. We (ii) conduct an extensive set of experiments that indicate a general lack of cross-domain generalization of classifiers trained on these sources, and openly provide this framework to replicate and extend our evaluation criteria. Finally, we (iii) present an effective crowdsourcing method: simulating real-life bullying scenarios in a lab setting generates plausible data that can be effectively used to enrich real data. This largely circumvents the restrictions on data that can be collected, and increases classifier performance. We believe these contributions can aid in improving the empirical practices of future research in the field.

    Approaches to automated detection of cyberbullying: A Survey

    Research into cyberbullying detection has increased in recent years, due in part to the proliferation of cyberbullying across social media and its detrimental effect on young people. A growing body of work is emerging on automated approaches to cyberbullying detection. These approaches utilise machine learning and natural language processing techniques to identify the characteristics of a cyberbullying exchange and automatically detect cyberbullying by matching textual data to the identified traits. In this paper, we present a systematic review of published research (as identified via the Scopus, ACM and IEEE Xplore bibliographic databases) on cyberbullying detection approaches. On the basis of our extensive literature review, we categorise existing approaches into 4 main classes, namely supervised learning, lexicon-based, rule-based and mixed-initiative approaches. Supervised learning-based approaches typically use classifiers such as SVM and Naïve Bayes to develop predictive models for cyberbullying detection. Lexicon-based systems utilise word lists and use the presence of words within the lists to detect cyberbullying. Rule-based approaches match text to predefined rules to identify bullying, and mixed-initiative approaches combine human-based reasoning with one or more of the aforementioned approaches. We found that a lack of quality representative labelled datasets and non-holistic consideration of cyberbullying by researchers when developing detection systems are two key challenges facing cyberbullying detection research. This paper essentially maps out the state-of-the-art in cyberbullying detection research and serves as a resource for researchers to determine where to best direct their future research efforts in this field.
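
The lexicon-based class described in the survey abstract can be illustrated with a minimal sketch. It is not taken from any of the surveyed systems: the word list, the function name `lexicon_flag`, and the `threshold` parameter are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical word list for illustration only; a real system would use a curated lexicon.
LEXICON = {"loser", "stupid", "ugly"}


def lexicon_flag(text, lexicon=LEXICON, threshold=1):
    """Return True if at least `threshold` tokens of `text` appear in `lexicon`."""
    # Lowercase the text and split it into simple alphabetic tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Count how many tokens are present in the word list.
    hits = sum(1 for tok in tokens if tok in lexicon)
    return hits >= threshold
```

A purely lexical check of this kind flags a message only when it contains a listed word, so it cannot catch abusive messages phrased without any of them.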

    Fuzzy-based machine learning for predicting narcissistic traits among Twitter users.

    Doctoral Degree. University of KwaZulu-Natal, Pietermaritzburg. Social media has provided a platform for people to share views and opinions they identify with or which are significant to them. Similarly, social media enables individuals to express themselves authentically and divulge their personal experiences in a variety of ways. This behaviour, in turn, reflects the user’s personality. Social media has in recent times been used to perpetrate various forms of crime, and a narcissistic personality trait has been linked to violent criminal activities. This negative side effect of social media calls for multiple ways to respond to and prevent the damage instigated. Eysenck's theory on personality and crime postulated that various forms of crime are caused by a mixture of environmental and neurological causes. This theory suggests certain people are more likely to commit a crime, and personality is the principal factor in criminal behaviour. Twitter is a widely used social media platform for sharing news, opinions, feelings, and emotions by users. Given that narcissists have an inflated self-view and engage in a variety of strategies aimed at bringing attention to themselves, features unique to Twitter are more appealing to narcissists than those on sites such as Facebook. This study adopted design science research methodology to develop a fuzzy-based machine learning predictive model to identify traces of narcissism from Twitter using data obtained from the activities of a user. Performance evaluation of various classifiers was conducted and an optimal classifier with 95% accuracy was obtained. The research found that the size of the dataset and input variables have an influence on classifier accuracy. In addition, the research developed an updated process model and recommended a research model for narcissism classification.