
    Investigating TikTok's recommender system with sock puppets

    The content we see is increasingly determined by ever more advanced recommender systems, and the popular social media platform TikTok represents the forefront of this development (See Chapter 1). There has been much speculation about the workings of these recommender systems, but precious little systematic, controlled study (See Chapter 2). To improve our understanding of these systems, I developed sock puppet bots that consume content on TikTok as a normal user would (See Chapter 3). This allowed me to run controlled experiments to see how the TikTok recommender system responds to sock puppets exhibiting different behaviors and preferences in a Finnish context, and how the results differ from those obtained by earlier investigations (See Chapter 4). This research was done as part of a journalistic investigation in collaboration with Long Play. I found that TikTok appears to have adjusted its recommender system to personalize the content users see to a much lesser degree, likely in response to a previous investigation by the WSJ. However, I came to the conclusion that, while sock puppet audits can be useful, they are not a sufficiently scalable solution to algorithm governance, and other types of audits with more internal access are needed (See Chapter 5).
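    A minimal sketch of the audit logic described above, assuming a puppet's behavior is reduced to how long it dwells on each recommended video. The class and function names, topic labels, and dwell-time values are illustrative assumptions, not the thesis's actual implementation, which drives real TikTok sessions:

```python
from dataclasses import dataclass, field

@dataclass
class SockPuppet:
    """A scripted account with a single topic preference."""
    preferred_topic: str
    history: list = field(default_factory=list)  # (topic, dwell) per served video

    def watch(self, video_topic: str) -> float:
        # Dwell on preferred content, skip everything else quickly.
        dwell = 1.0 if video_topic == self.preferred_topic else 0.1
        self.history.append((video_topic, dwell))
        return dwell

def personalization_share(puppet: SockPuppet, window: int = 50) -> float:
    """Fraction of the last `window` served videos matching the puppet's preference."""
    recent = puppet.history[-window:]
    return sum(1 for topic, _ in recent if topic == puppet.preferred_topic) / max(len(recent), 1)
```

    Comparing this share between treatment puppets and a control puppet that watches everything equally gives a simple measure of how strongly the feed is personalized.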

    Characterization and Detection of Malicious Behavior on the Web

    Web platforms enable unprecedented speed and ease in the transmission of knowledge, and allow users to communicate and shape opinions. However, the safety, usability and reliability of these platforms are compromised by the prevalence of online malicious behavior -- for example, 40% of users have experienced online harassment. This behavior takes the form of malicious users, such as trolls, sockpuppets and vandals, and of misinformation, such as hoaxes and fraudulent reviews. This thesis presents research spanning two aspects of malicious behavior: characterization of its behavioral properties, and development of algorithms and models for detecting it. We characterize the behavior of malicious users and misinformation in terms of their activity, the temporal frequency of their actions, their network connections to other entities, the linguistic properties of how they write, and the community feedback they receive from others. We find several striking characteristics of malicious behavior that are very distinct from those of benign behavior. For instance, vandals and fraudulent reviewers are faster in their actions than benign editors and reviewers, respectively. Hoax articles are long pieces of plain text that are less coherent and created by more recent editors, compared to non-hoax articles. Sockpuppets vary in their deceptiveness (i.e., whether they pretend to be different users) and their supportiveness (i.e., whether they support arguments of other sockpuppets controlled by the same user). We create a suite of feature-based and graph-based algorithms to efficiently distinguish malicious from benign behavior. We create the first vandal early-warning system, which accurately predicts vandals using very few edits. Next, based on the properties of Wikipedia articles, we develop a supervised machine learning classifier to predict whether an article is a hoax, and another that predicts whether a pair of accounts belongs to the same user, both with very high accuracy. We develop a graph-based decluttering algorithm that iteratively removes the suspicious edges malicious users use to masquerade as benign users, and which outperforms existing graph algorithms at detecting trolls. Finally, we develop an efficient graph-based algorithm that simultaneously assesses the fairness of all reviewers, the reliability of all ratings, and the goodness of all products in a rating network, incorporating penalties for suspicious behavior. Overall, in this thesis we develop a suite of five models and algorithms to accurately identify and predict several distinct types of malicious behavior -- namely vandals, hoaxes, sockpuppets, trolls and fraudulent reviewers -- across multiple web platforms. The analysis leading to the algorithms develops an interpretable understanding of malicious behavior on the web.
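    The final graph-based rating-network model lends itself to a compact illustration. The sketch below shows one way mutually recursive fairness, reliability, and goodness scores can be computed by plain iteration; it omits the suspicion penalties and efficiency optimizations mentioned above, and the update formulas are a plausible simplification rather than the thesis's exact algorithm (ratings are assumed to be rescaled to [-1, 1]):

```python
def score_rating_network(ratings, n_iter=50):
    """Jointly score reviewers, ratings, and products in a bipartite rating graph.

    ratings: list of (reviewer, product, value) triples, value rescaled to [-1, 1].
    Returns fairness per reviewer, reliability per rating, goodness per product.
    """
    reviewers = {u for u, _, _ in ratings}
    products = {p for _, p, _ in ratings}
    fairness = {u: 1.0 for u in reviewers}
    goodness = {p: 0.0 for p in products}
    reliability = {(u, p): 1.0 for u, p, _ in ratings}

    for _ in range(n_iter):
        # A product is as good as the reliability-weighted average of its ratings.
        for p in products:
            rated_by = [(u, v) for u, q, v in ratings if q == p]
            total = sum(reliability[(u, p)] for u, _ in rated_by)
            goodness[p] = sum(reliability[(u, p)] * v for u, v in rated_by) / max(total, 1e-9)
        # A rating is reliable if its author is fair and it agrees with the product's goodness.
        for u, p, v in ratings:
            reliability[(u, p)] = 0.5 * fairness[u] + 0.5 * (1.0 - abs(v - goodness[p]) / 2.0)
        # A reviewer is fair if the ratings they give tend to be reliable.
        for u in reviewers:
            given = [reliability[(u, p)] for x, p, _ in ratings if x == u]
            fairness[u] = sum(given) / len(given)
    return fairness, reliability, goodness
```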

    What the future holds for social media data analysis

    The dramatic rise in the use of Social Media (SM) platforms such as Facebook and Twitter provides access to an unprecedented amount of user data. Users may post reviews of products and services they bought, write about their interests, share ideas, or give their opinions and views on political issues. Organisations have a growing interest in analysing SM data to detect new trends, obtain user opinions on their products and services, or find out about their online reputations. A recent research trend in SM analysis is making predictions based on sentiment analysis of SM. Often, indicators derived from historic SM data are represented as time series and correlated with a variety of real-world phenomena such as the outcome of elections, the development of financial indicators, box office revenue, and disease outbreaks. This paper examines the current state of research in the area of SM mining and predictive analysis and gives an overview of the analysis methods using opinion mining and machine learning techniques.
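    As a concrete illustration of the time-series approach described above, the sketch below aggregates per-post sentiment into a daily series and correlates it with a real-world indicator at a chosen lag. The column names, and the assumption that a per-post sentiment score has already been computed (e.g. by an off-the-shelf opinion-mining model), are illustrative rather than taken from any specific study surveyed here:

```python
import pandas as pd

def lagged_sentiment_correlation(posts: pd.DataFrame, indicator: pd.Series, lag_days: int = 1) -> float:
    """Correlate daily mean sentiment with a real-world indicator `lag_days` later.

    posts: DataFrame with a 'timestamp' datetime column and a per-post 'sentiment' score.
    indicator: daily series indexed by date (election polls, a stock index, case counts, ...).
    """
    daily_sentiment = posts.set_index("timestamp")["sentiment"].resample("D").mean()
    # Shift the indicator backwards so today's sentiment lines up with the future value
    # (assumes the indicator has one row per day).
    aligned = pd.concat(
        {"sentiment": daily_sentiment, "indicator": indicator.shift(-lag_days)}, axis=1
    ).dropna()
    return aligned["sentiment"].corr(aligned["indicator"])
```

    A strong positive correlation at a positive lag would suggest that aggregate sentiment leads the indicator, which is the premise behind the predictive studies surveyed in the paper.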

    Man vs machine – Detecting deception in online reviews

    This study focused on three main research objectives: analyzing the methods used to identify deceptive online consumer reviews, evaluating the insights provided by multi-method automated approaches based on individual and aggregated review data, and formulating a review interpretation framework for identifying deception. The theoretical framework is based on two critical deception-related models: information manipulation theory and self-presentation theory. The findings confirm the interchangeable characteristics of the various automated text analysis methods in drawing insights about review characteristics and underline their significant complementary aspects. An integrative multi-method model that approaches the data at the individual and aggregate level provides more complex insights regarding the quantity and quality of review information, its sentiment, cues about its relevance and contextual information, perceptual aspects, and cognitive material.
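    The individual-versus-aggregate distinction drawn above can be made concrete with a small feature sketch: per-review cues (length, self-reference, emphasis) are combined with reviewer-level aggregates before training a classifier. The field names, the specific cues, and the choice of logistic regression are assumptions for illustration, not the study's own operationalization:

```python
from collections import defaultdict
import numpy as np
from sklearn.linear_model import LogisticRegression

def featurize(reviews):
    """reviews: list of dicts with 'text' and 'reviewer_id' keys (names are assumed)."""
    per_reviewer = defaultdict(list)
    for r in reviews:
        per_reviewer[r["reviewer_id"]].append(len(r["text"].split()))

    rows = []
    for r in reviews:
        words = r["text"].split()
        lengths = per_reviewer[r["reviewer_id"]]
        rows.append([
            len(words),                                          # individual: review length
            sum(w.lower() in {"i", "me", "my"} for w in words),  # individual: self-reference
            r["text"].count("!"),                                # individual: emphasis
            len(lengths),                                        # aggregate: reviewer volume
            float(np.mean(lengths)),                             # aggregate: reviewer's typical length
        ])
    return np.array(rows)

# X = featurize(reviews); y = np.array([r["label"] for r in reviews])
# clf = LogisticRegression(max_iter=1000).fit(X, y)  # labels from a deception-annotated corpus
```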

    Factitious disorder and its online variant Munchausen by Internet: understanding motivation and its impact on online users to develop a detection method

    The overarching aim of the research in this thesis was to develop a method of detecting Munchausen by Internet (MbI) and to gain an understanding of the dynamics of online communities faced with MbI. Groundwork studies were required to learn more about the disorder and to decide exactly what method of detection would be most appropriate. This involved a review of the existing literature on MbI (Paper 1: Munchausen by Internet). It also involved two studies focusing on experiences from the perspective of those with Factitious Disorder (FD) (Paper 2: When the lie is the truth: Grounded theory analysis of an online support group for factitious disorder) and on MbI from the perspective of victims (Paper 3: Claiming someone else’s pain: A grounded theory analysis of online community users’ experiences of Munchausen by Internet). Both studies were necessary because FD and its online variant MbI are among the most poorly understood and under-researched pathologies, primarily because of the difficulty in obtaining and retaining participants who have experience of the disorder; what was previously known about the disorder was therefore largely speculative. The research presented in this thesis overcame the problem of recruiting and retaining participants by analysing the first-hand accounts written online by those who have experience of the disorders. The information obtained from the two groundwork studies was used in the third study to decide on and develop an appropriate method of detecting MbI and to interpret the discriminant attributes (Paper 4: Detecting Munchausen by Internet: Development of a Text Classifier through Machine Learning). Beyond informing the development of the classifier, these studies also made new theoretical contributions to the existing literature on FD and MbI. The first two studies provide the first large-scale studies of FD and MbI, using first-hand accounts from those they directly affect rather than speculative observations. Grounded Theory was used to analyse the text because it does not require an a priori theoretical framework but lets the data build the theoretical framework itself, resulting in more innovative findings. The findings offer a new perspective on FD, one which contrasts with traditional theories and indicates that FD may be closely aligned with addiction. The second study examined the dynamics within an online community faced with MbI. The primary findings were that MbI users targeted an ‘ideal victim’ persona, which offered protection from suspicion and increased the attention and sympathy they could receive. The presence or possible presence of MbI also led members of online communities to use strategies to avoid false accusations or being duped. These strategies had the unfortunate consequence of potentially eroding the therapeutic benefits of online communities, in particular personal empowerment, by restricting opportunities to confer normality and cultivate interpersonal support. In addition, the methods used by online community members and their moderators to detect MbI were uncovered: detection typically involved high-level deception cues, which raised suspicions, and the checking of authoritative references to confirm or refute those suspicions.
The findings from studies one and two, as well as the literature review in Paper 1, offered no overt cues that could be consistently attributed to MbI and offered no support for the feasibility of psychometric testing to detect MbI. Therefore, it was decided that covert deception required a covert method of detection. To this end, the Social Language Processing (SLP) framework, which integrates psychology and computer science, was applied to develop a text classifier through machine learning algorithms; this covert method has already been used successfully to detect written deception online. Two text classifiers were developed in study three, using Linguistic Inquiry and Word Count (LIWC2015) dimensions and n-grams obtained from a bag-of-words model, with respective prediction accuracies of 81.11% and 81.67%. These classifiers add practical application value to the research conducted in this thesis by providing a method of detecting MbI that can be used by moderators, and as a vetting and investigative tool for internet-mediated researchers. Study three also yielded theoretical contributions. Some of the discriminant attributes used by the classifiers appeared to be unique to Munchausen’s and were associated with the motivation for the behaviour, which supports the growing move towards domain specificity when interpreting Linguistic-Based Cues (LBCs) of deception. The remaining LBCs of deception concurred with established deception theory, particularly the reduction of cognitive complexity. Overall, the research described in this thesis has made new contributions to the existing theories surrounding Factitious Disorder (FD), MbI, and Linguistic-Based Cues of deception. It also has practical application value in producing a classifier that differentiates between text written by genuine people and text written by those exhibiting Munchausen’s.
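    The n-gram side of the approach described above can be sketched with standard tooling. The exact learning algorithm, feature thresholds, and cross-validation setup reported in the thesis are not reproduced here, and the LIWC2015 dimensions are omitted because LIWC is a proprietary dictionary; a labelled corpus of posts (1 = MbI, 0 = genuine) is assumed:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

def train_mbi_classifier(texts, labels):
    """Train an n-gram bag-of-words classifier and report cross-validated accuracy."""
    model = make_pipeline(
        CountVectorizer(ngram_range=(1, 2), min_df=2),  # unigram + bigram features
        LogisticRegression(max_iter=1000),
    )
    scores = cross_val_score(model, texts, labels, cv=10, scoring="accuracy")
    model.fit(texts, labels)
    return model, scores.mean()
```

    A moderator or researcher could then score new posts with the fitted model, treating high-probability flags as prompts for further checking rather than as accusations.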

    Sticks and Stones May Break My Bones but Words Will Never Hurt Me...Until I See Them: A Qualitative Content Analysis of Trolls in Relation to the Gricean Maxims and (IM)Polite Virtual Speech Acts

    The troll is one of the most obtrusive and disruptive bad actors on the internet. Unlike other bad actors, the troll interacts on a more personal and intimate level with other internet users. Social media platforms, online communities, comment boards, and chatroom forums provide them with this opportunity. What distinguishes these social provocateurs from other bad actors are their virtual speech acts and online behaviors. These acts aim to incite anger, shame, or frustration in others through the weaponization of words, phrases, and other rhetoric. Online trolls come in all forms and use various speech tactics to insult and demean their target audiences. The goal of this research is to investigate trolls' virtual speech acts and the impact of troll-like behaviors on online communities. Using Gricean maxims and politeness theory, this study seeks to identify common vernacular, word usage, and other language behaviors that trolls use to divert the conversation, insult others, and possibly affect fellow internet users’ mental health and well-being.

    Advanced analytical methods for fraud detection: a systematic literature review

    The developments of the digital era demand new ways of producing goods and rendering services. This fast-paced evolution in companies requires a new approach from auditors, who must keep up with the constant transformation. With the dynamic dimensions of data, it is important to seize the opportunity to add value to companies, and the need to apply more robust methods to detect fraud is evident. In this thesis, the use of advanced analytical methods for fraud detection is investigated through an analysis of the existing literature on the topic. Both a systematic review of the literature and a bibliometric approach are applied to the most appropriate database to measure scientific production and current trends. This study intends to contribute to the academic research that has been conducted, in order to centralize the existing information on this topic.

    Judgment Sieve: Reducing Uncertainty in Group Judgments through Interventions Targeting Ambiguity versus Disagreement

    When groups of people are tasked with making a judgment, the issue of uncertainty often arises. Existing methods to reduce uncertainty typically focus on iteratively improving specificity in the overall task instruction. However, uncertainty can arise from multiple sources, such as ambiguity of the item being judged due to limited context, or disagreement among the participants due to different perspectives and an under-specified task. A one-size-fits-all intervention may be ineffective if it is not targeted at the right source of uncertainty. In this paper, we introduce a new workflow, Judgment Sieve, to reduce uncertainty in group-judgment tasks in a targeted manner. By using measurements that separate different sources of uncertainty during an initial round of judgment elicitation, we can select a targeted intervention, adding context or deliberation, to most effectively reduce uncertainty for each item being judged. We test our approach on two tasks, rating word-pair similarity and the toxicity of online comments, and show that targeted interventions reduced uncertainty for the most uncertain cases. In the top 10% of cases, we saw ambiguity reductions of 21.4% and 25.7%, and disagreement reductions of 22.2% and 11.2%, for the two tasks respectively. We also found through a simulation that our targeted approach reduced the average uncertainty scores for both sources of uncertainty, whereas with uniform approaches reductions in average uncertainty from one source came with an increase for the other.
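    One toy way to make the two sources of uncertainty measurable, not necessarily the paper's own instruments, is to have each rater give several plausible ratings per item: within-rater spread then proxies ambiguity, between-rater spread of the per-rater means proxies disagreement, and each item is routed to the intervention targeting its dominant source:

```python
import statistics

def decompose_uncertainty(judgments):
    """judgments: dict mapping rater -> list of ratings for one item,
    where each rater gives several plausible ratings rather than one point.
    Returns (ambiguity, disagreement) as within-rater vs. between-rater spread."""
    within = [statistics.pstdev(r) for r in judgments.values() if len(r) > 1]
    ambiguity = statistics.mean(within) if within else 0.0
    means = [statistics.mean(r) for r in judgments.values()]
    disagreement = statistics.pstdev(means) if len(means) > 1 else 0.0
    return ambiguity, disagreement

def choose_intervention(ambiguity, disagreement, threshold=0.5):
    # Route each item to the intervention targeting its dominant source of uncertainty.
    if ambiguity < threshold and disagreement < threshold:
        return "none"
    return "add context" if ambiguity >= disagreement else "deliberation"
```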