82 research outputs found
A Systematic Literature Review on Cyberbullying in Social Media: Taxonomy, Detection Approaches, Datasets, And Future Research Directions
In the area of Natural Language Processing, sentiment analysis, also called opinion mining, aims to extract human thoughts, beliefs, and perceptions from unstructured texts. In light of social media's rapid growth and the influx of individual comments, reviews, and feedback, it has evolved into an attractive, challenging research area. Toxic textual content is one of the most common problems on social media. Anonymity and concealment of identity are common on the Internet among people from a wide range of cultures and beliefs. Freedom of speech, anonymity, and inadequate social media regulation make online toxicity and cyberbullying significant issues that require automatic detection and prevention. Diverse research on this problem spans different approaches and languages, but a comprehensive analysis examining it from all angles is lacking. This systematic literature review is therefore conducted with the aim of surveying the research done to date by the research community on text-based cyberbullying classification. It presents the definition, taxonomy, properties, outcomes, and roles of cyberbullying, along with other forms of bullying and offensive behavior on social media. This article also presents the latest popular benchmark datasets on cyberbullying, along with their number of classes (binary/multiple), reviews the state-of-the-art methods for detecting cyberbullying and abusive content on social media, and discusses the factors that drive offenders to engage in offensive activity, preventive actions against online toxicity, and cyber laws in different countries. Finally, we identify and discuss the challenges, solutions, and future research directions that serve as a reference for overcoming cyberbullying in social media.
Kek, Cucks, and God Emperor Trump: A Measurement Study of 4chan's Politically Incorrect Forum and its Effects on the Web
The discussion-board site 4chan has been part of the Internet's dark underbelly since its inception, and recent political events have put it increasingly in the spotlight. In particular, /pol/, the “Politically Incorrect” board, has been a central figure in the outlandish 2016 US election season, as it has often been linked to the alt-right movement and its rhetoric of hate and racism. However, 4chan remains relatively unstudied by the scientific community: little is known about its user base, the content it generates, and how it affects other parts of the Web. In this paper, we start addressing this gap by analyzing /pol/ along several axes, using a dataset of over 8M posts we collected over two and a half months. First, we perform a general characterization, showing that /pol/ users are well distributed around the world and that 4chan's unique features encourage fresh discussions. We also analyze content, finding, for instance, that YouTube links and hate speech are predominant on /pol/. Overall, our analysis not only provides the first measurement study of /pol/, but also insight into online harassment and hate speech trends in social media.
Online discussions through the lens of interaction patterns
Computer-mediated communication is arguably prevailing over face-to-face communication. However, many of the subtleties that make in-person communication personal, cues such as an ironic tone of voice or an effortless posture, are inherently impossible to render through a screen. The context vanishes from the conversation; what is left is therefore mostly text, enlivened by occasional multimedia. At least, this seems to be the dominant opinion of both industry and academia, which have recently focused considerable resources on a deeper understanding of natural and visual language.
I argue instead that richer cues are missing from online interaction only because current applications do not acknowledge them -- indeed, communication online is already infused with nonverbal codes, and the effort needed to leverage them is well worth the amount of information they carry. This dissertation therefore focuses on what is left out of the traditional definition of content: I refer to these aspects of communication as content-agnostic. Specifically, this dissertation makes three contributions.
First, I formalize what constitutes content-agnostic information in computer-mediated communication, and prove content-agnostic information is as personal to each user as its offline counterpart. For this reason, I choose as a venue of research the web forum, a supposedly text-based, impersonal communication environment, and show that it is possible to attribute a message to the corresponding author solely on the basis of its content-agnostic features -- in other words, without looking at the content of the message at all.
Next, I show how abundant and how varied the content-agnostic information lying untapped in current applications is. To this end, I analyze the content-agnostic aspects of one type of interaction, the quote, and draw conclusions on how these may support discussion, signal user status, mark relationships between users, and characterize the discussion forum as a community. One interesting implication is that discussion platforms may not need to introduce new features to support social signals, and conversely, social networks may better integrate discussion by enhancing its content-agnostic qualities.
Finally, I demonstrate how content-agnostic information reveals user behavior. I focus specifically on trolls, malicious users that disrupt communities through deceptive or manipulative actions. In fact, the language of trolls blends in with that of civil users in heated discussions, which makes collecting irrefutable evidence of trolling difficult even for human moderators. Nonetheless, I show that a combination of content-agnostic and linguistic features sets apart discussions that will eventually be trolled, and reactions to trolling posts. This provides evidence of how content-agnostic information can offer a point of view on user behavior that is at the same time different from, and complementary to, that offered by the actual content of the contribution.
Popular up-and-coming platforms, such as Snapchat, Tumblr, or Yik Yak, are increasingly abandoning persistent, threaded, text-based discussion in favor of ephemeral, loosely structured, mixed-media content. Although the results of this dissertation are mostly drawn from discussion forums, its research frame and methods should apply directly to these other venues, and to a broad range of communication paradigms. This is, however, only a preliminary step towards a fuller understanding of what additional cues can or should complement content to overcome the limitations of computer-mediated communication.
Mapping (Dis-)Information Flow about the MH17 Plane Crash
Digital media enables fast sharing not only of information, but also of disinformation. One prominent case of an event leading to the circulation of disinformation on social media is the MH17 plane crash. Studies analysing the spread of information about this event on Twitter have focused on small, manually annotated datasets, or used proxies for data annotation. In this work, we examine to what extent text classifiers can be used to label data for subsequent content analysis; in particular, we focus on predicting pro-Russian and pro-Ukrainian Twitter content related to the MH17 plane crash. Even though we find that a neural classifier improves over a hashtag-based baseline, labeling pro-Russian and pro-Ukrainian content with high precision remains a challenging problem. We provide an error analysis underlining the difficulty of the task and identify factors that might help improve classification in future work. Finally, we show how the classifier can facilitate the annotation task for human annotators.
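As a concrete illustration of the kind of hashtag-based baseline the abstract compares against, stance can be assigned by matching a tweet's hashtags against two predefined lists; the hashtags, labels, and example tweet below are invented for illustration and are not from the paper:

```python
# Hypothetical hashtag-based stance baseline. The hashtag lists are
# illustrative assumptions, not the paper's actual lists.
PRO_RUSSIAN_TAGS = {"#kievshotdown", "#ukrainianbuk"}
PRO_UKRAINIAN_TAGS = {"#russiainvadedukraine", "#putinsplane"}

def hashtag_stance(tweet: str) -> str:
    """Label a tweet by counting matches against each hashtag list."""
    tokens = {t.lower() for t in tweet.split()}
    pro_ru = len(tokens & PRO_RUSSIAN_TAGS)
    pro_ua = len(tokens & PRO_UKRAINIAN_TAGS)
    if pro_ru > pro_ua:
        return "pro-russian"
    if pro_ua > pro_ru:
        return "pro-ukrainian"
    return "neutral"

print(hashtag_stance("MH17 was shot down #KievShotDown"))  # pro-russian
print(hashtag_stance("thoughts with the victims"))         # neutral
```

Such a baseline labels only tweets that carry a listed hashtag, which is precisely why a trained classifier can improve on it.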
The Anatomy of Conspirators: Unveiling Traits using a Comprehensive Twitter Dataset
The discourse around conspiracy theories is currently thriving amidst the rampant misinformation prevalent in online environments. Research in this field has focused on detecting conspiracy theories on social media, often relying on limited datasets. In this study, we present a novel methodology for constructing a Twitter dataset that encompasses accounts engaged in conspiracy-related activities throughout the year 2022. Our approach centers on data collection that is independent of specific conspiracy theories and information operations. Additionally, our dataset includes a control group comprising randomly selected users who can be fairly compared to the individuals involved in conspiracy activities. This comprehensive collection effort yielded a total of 15K accounts and 37M tweets extracted from their timelines. We conduct a comparative analysis of the two groups across three dimensions: topics, profiles, and behavioral characteristics. The results indicate that conspiracy and control users are similar in terms of their profile metadata characteristics, but diverge significantly in behavior and activity, particularly regarding the topics discussed, the terminology used, and their stance on trending subjects. Interestingly, there is no significant disparity in the presence of bot users between the two groups, suggesting that conspiracy and automation are orthogonal concepts. Finally, we develop a classifier to identify conspiracy users using 93 features, some of which are commonly employed in the literature for troll identification. The classifier achieves a high accuracy level (an average F1 score of 0.98), enabling us to uncover the most discriminative features associated with conspiracy-related accounts.
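For reference, the F1 score quoted above is the harmonic mean of precision and recall over the positive class. A minimal self-contained computation (with made-up labels for illustration, not the paper's data) looks like this:

```python
# Self-contained F1 computation; labels are invented for illustration.
def f1_score(y_true, y_pred, positive=1):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

y_true = [1, 1, 1, 0, 0, 1]  # hypothetical conspiracy/control labels
y_pred = [1, 1, 0, 0, 1, 1]  # hypothetical classifier output
print(f1_score(y_true, y_pred))  # 0.75
```

An F1 of 0.98 therefore indicates that both false positives and false negatives are rare on the evaluation set.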
Trollthrottle -- Raising the Cost of Astroturfing
Astroturfing, i.e., the fabrication of public discourse by private or state-controlled sponsors via the creation of fake online accounts, has become incredibly widespread in recent years. It gives a disproportionately strong voice to wealthy and technology-savvy actors, permits targeted attacks on public forums, and could in the long run harm the trust users place in the Internet as a communication platform. Countering these efforts without deanonymising the participants has not yet proven effective; however, we can raise the cost of astroturfing. Following the principle "one person, one voice", we introduce Trollthrottle, a protocol that limits the number of comments a single person can post on participating websites. Using direct anonymous attestation and a public ledger, the user is free to choose any nickname, but the number of comments is aggregated over all posts on all websites, no matter which nickname was used. We demonstrate the deployability of Trollthrottle by retrofitting it to the popular news aggregator website Reddit and by evaluating the cost of deployment for the scenarios of a national newspaper (168k comments per day), an international newspaper (268k c/d), and Reddit itself (4.9M c/d).
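The core accounting rule described above, counting comments per person rather than per nickname and across all participating sites, can be sketched as a toy in-memory version. The real protocol relies on direct anonymous attestation and a public ledger; the class, names, and quota below are illustrative assumptions only:

```python
# Toy sketch of Trollthrottle's aggregation rule: comments are counted
# per attested person, regardless of nickname or website. This ignores
# the cryptography (DAA, public ledger) and is purely illustrative.
from collections import defaultdict

class Throttle:
    def __init__(self, quota: int = 3):  # assumed daily per-person limit
        self.quota = quota
        self.counts = defaultdict(int)  # attested identity -> comments today

    def try_post(self, identity: str, nickname: str, site: str) -> bool:
        """Accept a comment only if the person is under quota, no matter
        which nickname or website it is posted under."""
        if self.counts[identity] >= self.quota:
            return False
        self.counts[identity] += 1
        return True

throttle = Throttle()
print(throttle.try_post("person-A", "alice99", "reddit"))    # True
print(throttle.try_post("person-A", "bob_42", "newspaper"))  # True
print(throttle.try_post("person-A", "carol", "reddit"))      # True
print(throttle.try_post("person-A", "dave", "reddit"))       # False: quota hit
```

The key design point carried over from the abstract is that the counter is keyed on the attested identity, not the nickname, so switching pseudonyms or sites does not reset the budget.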
Protecting Children Online?
This book investigates the regulatory and social pressures that social media companies face in the aftermath of high-profile cyberbullying incidents. The author's research evaluates the policies companies develop to protect themselves and their users, drawing on interviews with NGO and social media company representatives in the US and the EU. She triangulates these findings against news, policy reports, evaluations, and interviews with e-safety experts. The book raises questions about the legitimacy of expecting companies to balance the tension between free speech and child protection without publicly revealing their decision-making processes. In an environment where e-safety is part of the corporate business model, this book unveils the process through which established social media companies receive less government scrutiny than start-ups. The importance of this research for law and policy argues for an open-access edition to ensure the work is widely and globally accessible to scholars and decision makers.
Computational propaganda : exploring mitigation strategies for political parties in online brand contexts
Abstract: This research delves into the phenomenon of computational propaganda on social media and draws on social media specialists from some of South Africa's best-performing brands to explore potential strategies political parties can employ to mitigate crises that occur as a result of computational propaganda. This research is important given that South Africa is entering its first National Elections since the identification of computational propaganda as a threat to electoral processes; to date, no research explores this within the South African context. The research entailed semi-structured interviews with eight social media managers, selected using purposive non-probability sampling. In addition, the research interviewed a communications head from South Africa's largest political party in order to assess what strategies are already in place. These two sets of data were consolidated, resulting in four potential strategies to mitigate the risk of computational propaganda, grouped into two approaches. The first approach relates to preventative measures political parties can take, namely protecting brand identity and aligning communications. The second approach relates to defensive measures political party brands could take in the event of a computational propaganda event, namely online reputation management and integration of communication. The research further uncovered contextual considerations political party brands must take into account before employing strategies to mitigate crises that arise as a result of computational propaganda. M.A. (Communication Studies)