Characterizing and Mitigating Threats to Trust and Safety Online

Abstract

In the past decade, social media platforms have become increasingly important in people's lives. However, the services they provide are constantly abused by some of their users, creating real human harm. Such abusive activities include online harassment, the spread of mis- and disinformation, hate speech, and many others. These harmful behaviors undermine public trust and may discourage users from engaging with the platforms, with consequences for the online information ecosystem and for society as a whole. It is therefore critical to understand abuse and to design solutions that mitigate these challenges and support trust and safety online. In this dissertation, I discuss my work on characterizing and mitigating abusive behaviors online.

To understand such behaviors at the scale of modern social media, we need scalable and robust detection methods. However, such methods often fail to capture the subtlety of abuse. Taking online harassment as an example, adversaries may use target-specific attacks that are difficult for automatic detection algorithms to spot, because these algorithms are trained on general harassment corpora in which such attacks do not appear. We address this issue with contextually aware analysis, using adversarial interactions with U.S. political candidates on Twitter in 2018 as a case study. Further, by combining qualitative and quantitative methods, we analyze the users who engage in these adversarial interactions and show that some tend to seek out conflict.

While abuse mitigation on public platforms has received growing attention from both the research community and industry practitioners, the same mitigation strategies do not carry over to private settings. For example, a common practice on public platforms is to scan user communications for known policy-violating content so that violations can be addressed in a timely manner. Applying this practice directly in private settings is not viable, as it violates user privacy. Yet abuse in private communications should not be left unmitigated. To this end, we propose mitigation solutions that enable privacy-preserving, client-side detection of content that is similar to known bad content. The proposed protocol reveals the detection result to the client without notifying the server. The goal is to improve users' agency when facing abuse such as mis/disinformation campaigns: users can obtain more context about the content they receive without sacrificing privacy, and make informed decisions on their own. To realize this protocol, we present and formalize the concept of similarity-based bucketization, which allows efficient computation over large datasets of known misinformation images.
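To make the bucketization idea concrete, the sketch below simulates a highly simplified similarity-based bucketization flow with both parties in one process: the server groups known-bad perceptual hashes into buckets keyed by a short hash prefix, and the client looks up only the bucket matching its own hash and compares within it locally, so the match result never leaves the client. The specific parameters (a 64-bit hash, a 16-bit prefix, a Hamming-distance threshold) and the direct dictionary lookup standing in for the actual network protocol are illustrative assumptions, not the dissertation's implementation.

```python
# Minimal sketch of similarity-based bucketization (SBB), under assumed
# simplifications: images are represented by precomputed 64-bit perceptual
# hashes, buckets are keyed by a short hash prefix, and "similar" means a
# small Hamming distance. All names and constants here are illustrative.

from collections import defaultdict

PREFIX_BITS = 16           # coarse bucket key: the only thing the client reveals
SIMILARITY_THRESHOLD = 10  # max Hamming distance to count as "similar"

def bucket_key(phash: int) -> int:
    """Coarse key derived from the top PREFIX_BITS of a 64-bit perceptual hash."""
    return phash >> (64 - PREFIX_BITS)

def build_server_index(known_bad_hashes):
    """Server side: group known-bad hashes into buckets by their coarse key."""
    index = defaultdict(list)
    for h in known_bad_hashes:
        index[bucket_key(h)].append(h)
    return index

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two 64-bit hashes."""
    return bin(a ^ b).count("1")

def client_check(local_hash: int, server_index) -> bool:
    """Client side: fetch only the matching bucket and compare locally,
    so the match/no-match result never leaves the client. In a real
    deployment the client would send only bucket_key(local_hash) over the
    network and receive the bucket contents (or run a private protocol
    within the bucket); the dict lookup below stands in for that exchange."""
    candidates = server_index.get(bucket_key(local_hash), [])
    return any(hamming(local_hash, h) <= SIMILARITY_THRESHOLD for h in candidates)

# Example: the client's image hash is one bit away from a known-bad hash.
known_bad = [0xDEADBEEFCAFEF00D, 0x0123456789ABCDEF]
index = build_server_index(known_bad)
print(client_check(0xDEADBEEFCAFEF00F, index))  # True
```

The bucket key is deliberately coarse: many unrelated hashes share it, so the server learns little about the client's content, while the client only has to download and scan one small bucket rather than the entire dataset of known misinformation images.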
