
    A Systematic Literature Review on Cyberbullying in Social Media: Taxonomy, Detection Approaches, Datasets, and Future Research Directions

    In Natural Language Processing, sentiment analysis, also called opinion mining, aims to extract human thoughts, beliefs, and perceptions from unstructured text. In light of social media's rapid growth and the influx of individual comments, reviews, and feedback, it has evolved into an attractive, challenging research area. Detecting toxic textual content is one of the most common problems on social media. Anonymity and concealment of identity are common on the Internet among people from a wide diversity of cultures and beliefs. Freedom of speech, anonymity, and inadequate social media regulation make toxic online environments and cyberbullying significant issues, which require systems for automatic detection and prevention. Diverse research based on different approaches and languages is under way, but a comprehensive analysis examining it from all angles is lacking. This systematic literature review therefore surveys the research done to date by the community on the classification of cyberbullying in the textual modality. It states the definition, taxonomy, properties, and outcomes of cyberbullying, the roles involved in cyberbullying, and other forms of bullying and offensive behavior on social media. The article also presents the latest popular benchmark cyberbullying datasets along with their number of classes (binary/multiple), reviews state-of-the-art methods for detecting cyberbullying and abusive content on social media, and discusses the factors that drive offenders to engage in offensive activity, preventive actions to avoid online toxicity, and cyber laws in different countries. Finally, we identify and discuss challenges, solutions, and future research directions that serve as a reference for overcoming cyberbullying on social media.
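The detection approaches surveyed above typically start from a supervised binary text classifier. As a minimal, self-contained sketch of that idea (not any specific method from the review), the following trains a bag-of-words Naive Bayes model with Laplace smoothing on a tiny illustrative dataset; the example texts and the "toxic"/"clean" labels are invented for demonstration only.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (text, label) pairs. Returns the fitted model."""
    counts = defaultdict(Counter)   # label -> word -> count
    labels = Counter()              # label -> number of documents
    for text, label in docs:
        labels[label] += 1
        counts[label].update(text.lower().split())
    vocab = {w for c in counts.values() for w in c}
    return labels, counts, vocab

def predict(model, text):
    """Pick the label with the highest smoothed log-probability."""
    labels, counts, vocab = model
    total_docs = sum(labels.values())
    best, best_lp = None, float("-inf")
    for label in labels:
        lp = math.log(labels[label] / total_docs)       # class prior
        denom = sum(counts[label].values()) + len(vocab)  # Laplace denominator
        for w in text.lower().split():
            lp += math.log((counts[label][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

model = train_nb([
    ("you are stupid and ugly", "toxic"),
    ("i hate you loser", "toxic"),
    ("have a great day friend", "clean"),
    ("thanks for the helpful answer", "clean"),
])
```

Real systems replace the toy lexicon statistics with large annotated corpora and stronger models, but the pipeline shape (tokenize, estimate per-class statistics, score) is the same.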

    Kek, Cucks, and God Emperor Trump: A Measurement Study of 4chan's Politically Incorrect Forum and its Effects on the Web

    The discussion-board site 4chan has been part of the Internet's dark underbelly since its inception, and recent political events have put it increasingly in the spotlight. In particular, /pol/, the “Politically Incorrect” board, has been a central figure in the outlandish 2016 US election season, as it has often been linked to the alt-right movement and its rhetoric of hate and racism. However, 4chan remains relatively unstudied by the scientific community: little is known about its user base, the content it generates, and how it affects other parts of the Web. In this paper, we begin to address this gap by analyzing /pol/ along several axes, using a dataset of over 8M posts collected over two and a half months. First, we perform a general characterization, showing that /pol/ users are well distributed around the world and that 4chan's unique features encourage fresh discussions. We also analyze content, finding, for instance, that YouTube links and hate speech are predominant on /pol/. Overall, our analysis not only provides the first measurement study of /pol/, but also insight into online harassment and hate speech trends in social media.

    Online discussions through the lens of interaction patterns

    Computer-mediated communication is arguably prevailing over face-to-face communication. However, many of the subtleties that make in-person communication personal, cues such as an ironic tone of voice or an effortless posture, are inherently impossible to render through a screen. The context vanishes from the conversation; what is left is therefore mostly text, enlivened by occasional multimedia. At least, this seems to be the dominant opinion of both industry and academia, which have recently focused considerable resources on a deeper understanding of natural and visual language. I argue instead that richer cues are missing from online interaction only because current applications do not acknowledge them; indeed, communication online is already infused with nonverbal codes, and the effort needed to leverage them is well worth the amount of information they carry. This dissertation therefore focuses on what is left out of the traditional definition of content: I refer to these aspects of communication as content-agnostic. Specifically, this dissertation makes three contributions. First, I formalize what constitutes content-agnostic information in computer-mediated communication, and prove that content-agnostic information is as personal to each user as its offline counterpart. For this reason, I choose as a venue of research the web forum, a supposedly text-based, impersonal communication environment, and show that it is possible to attribute a message to the corresponding author solely on the basis of its content-agnostic features, in other words, without looking at the content of the message at all. Next, I show how abundant and varied the content-agnostic information lying untapped in current applications is. To this end, I analyze the content-agnostic aspects of one type of interaction, the quote, and draw conclusions on how these may support discussion, signal user status, mark relationships between users, and characterize the discussion forum as a community.
    One interesting implication is that discussion platforms may not need to introduce new features for supporting social signals, and conversely social networks may better integrate discussion by enhancing its content-agnostic qualities. Finally, I demonstrate how content-agnostic information reveals user behavior. I focus specifically on trolls, malicious users who disrupt communities through deceptive or manipulative actions. In fact, the language of trolls blends in with that of civil users in heated discussions, which makes collecting irrefutable evidence of trolling difficult even for human moderators. Nonetheless, I show that a combination of content-agnostic and linguistic features sets apart discussions that will eventually be trolled, as well as reactions to trolling posts. This provides evidence of how content-agnostic information can offer a point of view on user behavior that is at once different from, and complementary to, that offered by the actual content of the contribution. Popular up-and-coming platforms, such as Snapchat, Tumblr, or Yik Yak, are increasingly abandoning persistent, threaded, text-based discussion in favor of ephemeral, loosely structured, mixed-media content. Although the results of this dissertation are mostly drawn from discussion forums, its research frame and methods should apply directly to these other venues, and to a broad range of communication paradigms. This is but a preliminary step towards a fuller understanding of what additional cues can or should complement content to overcome the limitations of computer-mediated communication.
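To make "content-agnostic" concrete, the sketch below extracts a few features of a message that deliberately ignore what it says: length, punctuation density, capitalization habits, and similar surface traits. The specific feature names are my own illustrative choices, not the dissertation's feature set.

```python
import string

def content_agnostic_features(message: str) -> dict:
    """Describe a message without reading its content: only form, not meaning."""
    chars = len(message)
    words = message.split()
    punct = sum(ch in string.punctuation for ch in message)
    upper = sum(ch.isupper() for ch in message)
    return {
        "char_count": chars,
        "word_count": len(words),
        "avg_word_len": sum(map(len, words)) / len(words) if words else 0.0,
        "punct_ratio": punct / chars if chars else 0.0,
        "upper_ratio": upper / chars if chars else 0.0,
        "ends_with_ellipsis": message.rstrip().endswith("..."),
    }
```

Vectors like this, accumulated over a user's messages, are the kind of signature that makes authorship attribution possible without inspecting content.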

    Mapping (Dis-)Information Flow about the MH17 Plane Crash

    Digital media enables not only fast sharing of information, but also of disinformation. One prominent case of an event leading to the circulation of disinformation on social media is the MH17 plane crash. Studies analysing the spread of information about this event on Twitter have focused on small, manually annotated datasets, or have used proxies for data annotation. In this work, we examine to what extent text classifiers can be used to label data for subsequent content analysis; in particular, we focus on predicting pro-Russian and pro-Ukrainian Twitter content related to the MH17 plane crash. Even though we find that a neural classifier improves over a hashtag-based baseline, labeling pro-Russian and pro-Ukrainian content with high precision remains a challenging problem. We provide an error analysis underlining the difficulty of the task and identify factors that might help improve classification in future work. Finally, we show how the classifier can facilitate the annotation task for human annotators.
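The hashtag-based baseline the abstract compares against can be sketched very simply: label a tweet by which side's hashtags it contains. The hashtag lists below are hypothetical placeholders, not the ones used in the study.

```python
# Placeholder hashtag lists; the study's actual lists are not reproduced here.
PRO_RUSSIAN_TAGS = {"#examplehashtag_r1", "#examplehashtag_r2"}
PRO_UKRAINIAN_TAGS = {"#examplehashtag_u1", "#examplehashtag_u2"}

def hashtag_label(tweet: str) -> str:
    """Assign a stance by counting known hashtags on each side."""
    tags = {tok.lower() for tok in tweet.split() if tok.startswith("#")}
    r = len(tags & PRO_RUSSIAN_TAGS)
    u = len(tags & PRO_UKRAINIAN_TAGS)
    if r > u:
        return "pro-russian"
    if u > r:
        return "pro-ukrainian"
    return "unlabeled"
```

Baselines like this are cheap and transparent, which is exactly why they make a useful reference point for a neural classifier.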

    The Anatomy of Conspirators: Unveiling Traits using a Comprehensive Twitter Dataset

    The discourse around conspiracy theories is currently thriving amidst the rampant misinformation prevalent in online environments. Research in this field has focused on detecting conspiracy theories on social media, often relying on limited datasets. In this study, we present a novel methodology for constructing a Twitter dataset that encompasses accounts engaged in conspiracy-related activities throughout the year 2022. Our approach centers on data collection that is independent of specific conspiracy theories and information operations. Additionally, our dataset includes a control group comprising randomly selected users who can be fairly compared to the individuals involved in conspiracy activities. This comprehensive collection effort yielded a total of 15K accounts and 37M tweets extracted from their timelines. We conduct a comparative analysis of the two groups across three dimensions: topics, profiles, and behavioral characteristics. The results indicate that conspiracy and control users are similar in terms of their profile metadata characteristics. However, they diverge significantly in behavior and activity, particularly regarding the discussed topics, the terminology used, and their stance on trending subjects. Interestingly, there is no significant disparity in the presence of bot users between the two groups, suggesting that conspiracy and automation are orthogonal concepts. Finally, we develop a classifier to identify conspiracy users using 93 features, some of which are commonly employed in the literature for troll identification. The results demonstrate a high accuracy level (with an average F1 score of 0.98), enabling us to uncover the most discriminative features associated with conspiracy-related accounts.
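Behavioral features of the kind used to separate the two groups are computed over an account's whole timeline rather than single tweets. The three features below (activity rate, hashtag use, mention use) are illustrative assumptions, not the paper's 93-feature set.

```python
from datetime import date

def timeline_features(tweets):
    """tweets: list of (date, text) pairs from one account's timeline."""
    if not tweets:
        return {"tweets_per_day": 0.0, "hashtag_rate": 0.0, "mention_rate": 0.0}
    # Active span in days, inclusive of both endpoints.
    days = (max(d for d, _ in tweets) - min(d for d, _ in tweets)).days + 1
    n = len(tweets)
    hashtags = sum(tok.startswith("#") for _, t in tweets for tok in t.split())
    mentions = sum(tok.startswith("@") for _, t in tweets for tok in t.split())
    return {
        "tweets_per_day": n / days,
        "hashtag_rate": hashtags / n,   # hashtags per tweet
        "mention_rate": mentions / n,   # mentions per tweet
    }
```

Feature vectors like this, one per account, are what a downstream classifier consumes to separate conspiracy from control users.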

    Trollthrottle -- Raising the Cost of Astroturfing

    Astroturfing, i.e., the fabrication of public discourse by private or state-controlled sponsors via the creation of fake online accounts, has become incredibly widespread in recent years. It gives a disproportionately strong voice to wealthy and technology-savvy actors, permits targeted attacks on public forums, and could in the long run harm the trust users place in the internet as a communication platform. Countering these efforts without deanonymising the participants has not yet proven effective; however, we can raise the cost of astroturfing. Following the principle "one person, one voice", we introduce Trollthrottle, a protocol that limits the number of comments a single person can post on participating websites. Using direct anonymous attestation and a public ledger, the user is free to choose any nickname, but the number of comments is aggregated over all posts on all websites, no matter which nickname was used. We demonstrate the deployability of Trollthrottle by retrofitting it to the popular news aggregator website Reddit and by evaluating the cost of deployment for the scenario of a national newspaper (168k comments per day), an international newspaper (268k c/d), and Reddit itself (4.9M c/d).
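The core accounting idea can be sketched without any cryptography: the comment quota is charged to a stable per-person credential, not to the nickname or the site, so switching identities does not reset it. The in-memory counter below stands in for the public ledger and the direct-anonymous-attestation machinery of the actual protocol.

```python
from collections import Counter

class CommentThrottle:
    """Toy stand-in for Trollthrottle's ledger: quota per person per day."""

    def __init__(self, daily_limit: int):
        self.daily_limit = daily_limit
        self.ledger = Counter()  # (credential, day) -> comments posted

    def try_post(self, credential: str, day: str, nickname: str, site: str) -> bool:
        # nickname and site are accepted but deliberately ignored for the quota:
        # the count follows the person, not the persona or the website.
        key = (credential, day)
        if self.ledger[key] >= self.daily_limit:
            return False
        self.ledger[key] += 1
        return True
```

In the real protocol the ledger is public and the credential is an anonymous attestation, so no party learns which nicknames belong to the same person; only the aggregate count is enforced.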

    Protecting Children Online?

    This book investigates the regulatory and social pressures that social media companies face in the aftermath of high-profile cyberbullying incidents. The author's research evaluates the policies companies develop to protect themselves and their users, drawing on interviews with NGO and social media company representatives in the US and the EU. She triangulates these findings against news coverage, policy reports, evaluations, and interviews with e-safety experts. The book raises questions about the legitimacy of expecting companies to balance the tension between free speech and child protection without publicly revealing their decision-making processes. In an environment where e-safety is part of the corporate business model, it unveils the process through which established social media companies receive less government scrutiny than start-ups. The importance of this research for law and policy argues for an OA edition to ensure the work is widely and globally accessible to scholars and decision makers.

    Computational propaganda : exploring mitigation strategies for political parties in online brand contexts

    Abstract: This research delves into the phenomenon of computational propaganda on social media, and draws on social media specialists from some of South Africa's best-performing brands to explore potential strategies political parties can employ to mitigate crises that occur as a result of computational propaganda. This research is important given that South Africa is entering its first-ever national elections since the identification of computational propaganda as a threat to electoral processes. To date, no research has explored this within the South African context. The research entailed semi-structured interviews with eight social media managers, selected using purposive non-probability sampling. In addition, the research interviewed a communications head from South Africa's largest political party in order to assess what strategies are already in place. These two sets of data were consolidated, resulting in four potential strategies to mitigate the risk of computational propaganda. The four mitigation strategies are grouped into two approaches: the first relates to preventative measures political parties can take, namely protecting brand identity and aligning communications. The second relates to defensive measures political party brands can take in the event of a computational propaganda incident, namely online reputation management and integration of communication. The research further uncovered contextual considerations political party brands must take into account before employing strategies to mitigate such crises. M.A. (Communication Studies)