    Learning like human annotators: Cyberbullying detection in lengthy social media sessions

    Because cyberbullying is inherently recurrent, the problem calls for examining social media sessions as a whole rather than isolated posts. However, the length of social media sessions challenges the applicability and performance of session-based cyberbullying detection models. This is especially true when one aims to use state-of-the-art Transformer-based pre-trained language models, which accept only inputs of limited length. In this paper, we address this limitation of Transformer models by proposing a conceptually intuitive framework called LS-CB, which enables cyberbullying detection in lengthy social media sessions. LS-CB relies on the intuition that we can effectively aggregate the predictions that Transformer models make on smaller sliding windows extracted from lengthy social media sessions, leading to improved overall performance. Our extensive experiments with six Transformer models on two session-based datasets show that LS-CB consistently outperforms three types of competitive baselines, including state-of-the-art cyberbullying detection models. In addition, we conduct a set of qualitative analyses to validate the hypotheses that cyberbullying incidents can be detected through aggregated analysis of smaller chunks derived from lengthy social media sessions (H1), and that cyberbullying incidents can occur at different points of the session (H2). This suggests that frequently used text truncation strategies are suboptimal compared with a holistic view of sessions. Our research in turn opens an avenue for fine-grained cyberbullying detection within sessions in future work.
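
    The sliding-window idea can be illustrated with a minimal sketch. It assumes a session is a list of tokens, that windows of a fixed size overlap by a chosen stride, and that window scores are combined by max or mean; the window size, stride, aggregation rule, and the keyword-based `toy_scorer` (a hypothetical stand-in for a fine-tuned Transformer classifier) are all illustrative assumptions, not the LS-CB implementation.

```python
from typing import Callable, List, Sequence

# Hypothetical lexicon used only by the toy stand-in scorer below.
BULLY_WORDS = {"loser", "idiot"}


def sliding_windows(tokens: Sequence[str], window: int, stride: int) -> List[List[str]]:
    """Split a session's tokens into overlapping windows of at most `window` tokens."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(tokens) <= window:
        return [list(tokens)]
    windows: List[List[str]] = []
    start = 0
    while True:
        windows.append(list(tokens[start:start + window]))
        if start + window >= len(tokens):  # this window already reaches the end
            break
        start += stride
    return windows


def toy_scorer(window_tokens: Sequence[str]) -> float:
    """Hypothetical stand-in for a Transformer classifier: P(cyberbullying) for one window."""
    return 1.0 if any(t.lower() in BULLY_WORDS for t in window_tokens) else 0.0


def session_score(
    tokens: Sequence[str],
    window: int,
    stride: int,
    score_window: Callable[[Sequence[str]], float],
    aggregate: str = "max",
) -> float:
    """Score every window, then combine the window scores into one session score."""
    scores = [score_window(w) for w in sliding_windows(tokens, window, stride)]
    if aggregate == "max":
        return max(scores)
    if aggregate == "mean":
        return sum(scores) / len(scores)
    raise ValueError(f"unknown aggregate: {aggregate}")
```

    For example, with `session = "hi how are you doing today ok you loser bye".split()`, scoring only the first four tokens (truncation) gives 0.0, while `session_score(session, 4, 2, toy_scorer)` gives 1.0 because the insult sits near the end of the session. Keeping the stride no larger than the window ensures no token is skipped.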

    Analyzing and Learning the Language for Different Types of Harassment

    THIS ARTICLE USES WORDS OR LANGUAGE THAT IS CONSIDERED PROFANE, VULGAR, OR OFFENSIVE BY SOME READERS. The presence of a significant amount of harassment in user-generated content and its negative impact call for robust automatic detection approaches. This requires the identification of different types of harassment. Earlier work has classified harassing language in terms of hurtfulness, abusiveness, sentiment, and profanity. However, to identify and understand harassment more accurately, it is essential to determine the contextual type that captures the interrelated conditions in which harassing language occurs. In this paper, we introduce the notion of contextual type in harassment by distinguishing between five contextual types: (i) sexual, (ii) racial, (iii) appearance-related, (iv) intellectual, and (v) political. We utilize an annotated corpus from Twitter that distinguishes these types of harassment. We study the context of each type to shed light on its linguistic meaning, interpretation, and distribution, with results from two lines of investigation: an extensive linguistic analysis and the statistical distribution of unigrams. We then build type-aware classifiers to automate the identification of type-specific harassment. Our experiments demonstrate that these classifiers provide competitive accuracy for identifying and analyzing harassment on social media. We present an extensive discussion and significant observations about the effectiveness of type-aware classifiers using a detailed comparison setup, providing insight into the role of type-dependent features.
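
    The per-type unigram statistics can be sketched as relative word frequencies computed separately for each contextual type. This is a minimal illustration assuming the corpus is a list of (text, type) pairs and a simple lower-case regex tokenizer; the function names and the tiny example corpus are hypothetical placeholders, not data or code from the paper.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lower-case the text and keep runs of letters and apostrophes as unigrams."""
    return re.findall(r"[a-z']+", text.lower())


def unigram_distributions(corpus: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Relative unigram frequencies for each harassment type in (text, type) pairs."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for text, label in corpus:
        counts[label].update(tokenize(text))
    dists: Dict[str, Dict[str, float]] = {}
    for label, c in counts.items():
        total = sum(c.values())
        dists[label] = {word: n / total for word, n in c.items()}
    return dists


def top_unigrams(dists: Dict[str, Dict[str, float]], label: str, k: int) -> List[str]:
    """The k most frequent unigrams for one type (ties broken alphabetically)."""
    ranked = sorted(dists[label].items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


# Hypothetical placeholder examples, not taken from the annotated Twitter corpus.
corpus = [
    ("your policy views are stupid", "political"),
    ("vote them out, stupid party", "political"),
    ("nice haircut, not", "appearance"),
]
dists = unigram_distributions(corpus)
```

    Here `top_unigrams(dists, "political", 1)` returns `["stupid"]`, which accounts for 2 of the 10 political tokens. Comparing such per-type distributions is one simple way to surface type-dependent vocabulary that a type-aware classifier could exploit.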

    Sociolinguistics Approach: Impoliteness Strategy in Instagram Cyberbullying in @Lambe_Turah’s post of KPAI’s Case

    Social media is used by most Indonesians, and it has both good and bad influences. One of the bad influences is cyberbullying. Cyberbullying on Instagram differs from cyberbullying on other social media. Instagram has become a source of hate campaigns through the emergence of gossip accounts. @Lambe_turah is one of the biggest gossip accounts in Indonesia. Posts on @Lambe_turah have prompted its followers to cyberbully the people featured in them; in this research, the target is KPAI. The aim of this study is to investigate comments containing cyberbullying using the impoliteness strategies proposed by Culpeper (2005). This study takes a qualitative approach: the data were collected by examining documents and analyzed by interpreting the collected data. The results show that negative impoliteness is the most common impoliteness strategy used in cyberbullying, followed by bald on record. From these two strategies, it can be concluded that cyberbullies tend to attack the addressee directly using various types of statements.

    Keywords: cyberbullying, sociolinguistics, impoliteness strategy

    Towards Cyberbullying-free social media in smart cities: a unified multi-modal approach

    Smart cities are shifting people's presence from the physical world to the cyber world (cyberspace). Along with its benefits for society, physical-world troubles such as bullying, aggression, and hate speech are also making a strong appearance in cyberspace. This paper aims to mine social media posts to identify bullying comments that contain both text and images. We propose a unified representation of text and image that eliminates the need for separate learning modules for each. A single-layer Convolutional Neural Network model is used with this unified representation. The major finding of this research is that representing text as an image is a better way to encode the information. We also found that a single-layer Convolutional Neural Network gives better results with a two-dimensional representation. In the current setup, we used three layers of text and three layers of a colour image to represent the input, which gives a recall of 74% for the bullying class with one Convolutional Neural Network layer.

    Supported by the Ministry of Electronics and Information Technology (MeitY), Government of India.
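
    The unified representation and single convolutional layer can be sketched in plain Python. This is a minimal illustration under stated assumptions: the three text layers are taken to be already-encoded H x W grids (the abstract does not say how text is turned into layers), the kernels are stride 1 with no padding, and global max pooling followed by a logistic unit is a hypothetical output head. The function names are illustrative, not the authors' code.

```python
import math
from typing import List

Grid = List[List[float]]  # one H x W layer


def unified_input(text_layers: List[Grid], image_channels: List[Grid]) -> List[Grid]:
    """Stack 3 text layers and 3 colour channels into one 6-channel H x W input."""
    if len(text_layers) != 3 or len(image_channels) != 3:
        raise ValueError("expected 3 text layers and 3 colour channels")
    layers = list(text_layers) + list(image_channels)
    h, w = len(layers[0]), len(layers[0][0])
    if any(len(g) != h or any(len(row) != w for row in g) for g in layers):
        raise ValueError("all layers must share the same H x W shape")
    return layers


def conv_layer(x: List[Grid], kernels: List[List[Grid]], biases: List[float]) -> List[Grid]:
    """Single convolutional layer (stride 1, no padding) followed by ReLU.

    Like most CNN libraries, this computes cross-correlation: kernels are not flipped.
    """
    h, w = len(x[0]), len(x[0][0])
    feature_maps: List[Grid] = []
    for kernel, bias in zip(kernels, biases):
        kh, kw = len(kernel[0]), len(kernel[0][0])
        fmap: Grid = []
        for i in range(h - kh + 1):
            row = []
            for j in range(w - kw + 1):
                s = bias
                for c, channel in enumerate(x):  # sum over all input channels
                    for di in range(kh):
                        for dj in range(kw):
                            s += kernel[c][di][dj] * channel[i + di][j + dj]
                row.append(max(0.0, s))  # ReLU
            fmap.append(row)
        feature_maps.append(fmap)
    return feature_maps


def bullying_probability(feature_maps: List[Grid], weights: List[float], bias: float) -> float:
    """Hypothetical head: global max-pool each feature map, then a logistic unit."""
    pooled = [max(max(row) for row in fmap) for fmap in feature_maps]
    z = bias + sum(wt * p for wt, p in zip(weights, pooled))
    return 1.0 / (1.0 + math.exp(-z))
```

    The design point the sketch captures is that text and image share one input tensor, so a single set of kernels sees both modalities at once instead of feeding separate text and image modules.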

    Early Detection of Cyberbullying on Social Media Networks

    Cyberbullying is an important issue for our society and has a major negative effect on victims, which can be highly damaging due to the frequency and wide propagation enabled by information technologies. Therefore, early detection of cyberbullying in social networks is crucial to mitigate the impact on victims. In this article, we explore different approaches that take time into account in the detection of cyberbullying in social networks. We follow a supervised learning method with two specific early detection models, named threshold and dual. The former follows a simpler approach, while the latter requires two machine learning models. To the best of our knowledge, this is the first attempt to investigate the early detection of cyberbullying. We propose two groups of features and two early detection methods specifically designed for this problem. We conduct an extensive evaluation on a real-world dataset, following a time-aware evaluation that penalizes late detections. Our results show that we can improve baseline detection models by up to 42%.

    This research was supported by the Ministry of Economy and Competitiveness of Spain and FEDER funds of the European Union (Project PID2019-111388GB-I00) and by the Centro de Investigación de Galicia “CITIC”, funded by Xunta de Galicia (Galicia, Spain) and the European Union (European Regional Development Fund, Galicia 2014–2020 Program), by grant ED431G 2019/01.
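
    The contrast between the threshold and dual models can be sketched as two decision rules applied after each new post in a session. This is a minimal illustration assuming each model maps the posts read so far to a probability and that decisions are made by fixed thresholds; the exact formulation in the article may differ, and the keyword-based scorers `insult_fraction` and `clean_after_two` are hypothetical stand-ins for trained classifiers.

```python
from typing import Callable, Sequence, Tuple

Scorer = Callable[[Sequence[str]], float]
Decision = Tuple[bool, int]  # (flagged as cyberbullying?, number of posts read)


def threshold_detector(posts: Sequence[str], prob_fn: Scorer, tau: float = 0.5) -> Decision:
    """One model: flag the session as soon as P(cyberbullying) reaches tau."""
    for k in range(1, len(posts) + 1):
        if prob_fn(posts[:k]) >= tau:
            return True, k
    return False, len(posts)  # never confident: negative after the last post


def dual_detector(
    posts: Sequence[str],
    pos_fn: Scorer,
    neg_fn: Scorer,
    tau_pos: float = 0.5,
    tau_neg: float = 0.5,
) -> Decision:
    """Two models: stop early on whichever decision becomes confident first."""
    for k in range(1, len(posts) + 1):
        prefix = posts[:k]
        if pos_fn(prefix) >= tau_pos:
            return True, k
        if neg_fn(prefix) >= tau_neg:
            return False, k
    return False, len(posts)


# Hypothetical toy scorers standing in for trained classifiers.
INSULTS = {"loser", "idiot"}


def has_insult(post: str) -> bool:
    return any(word in INSULTS for word in post.lower().split())


def insult_fraction(prefix: Sequence[str]) -> float:
    return sum(has_insult(p) for p in prefix) / len(prefix)


def clean_after_two(prefix: Sequence[str]) -> float:
    return 1.0 if len(prefix) >= 2 and not any(has_insult(p) for p in prefix) else 0.0
```

    For example, `threshold_detector(["hello", "nice pic", "you loser", "loser again"], insult_fraction, 0.3)` returns `(True, 3)`. The threshold rule can only stop early on a positive decision, whereas the dual rule can also commit early to a negative one, which is where its second model earns its keep under an evaluation that penalizes late decisions.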