Revisiting Contextual Toxicity Detection in Conversations
Understanding toxicity in user conversations is undoubtedly an important
problem. Addressing "covert" or implicit cases of toxicity is particularly hard
and requires context. Very few previous studies have analysed the influence of
conversational context on human perception or on automated detection models. We
dive deeper into both directions. We start by analysing existing contextual
datasets and conclude that human toxicity labelling is generally influenced by
the conversational structure, polarity, and topic of the context. We then
propose to bring these findings into
computational detection models by introducing and evaluating (a) neural
architectures for contextual toxicity detection that are aware of the
conversational structure, and (b) data augmentation strategies that help models
capture contextual toxicity. Our results show the encouraging potential of
neural architectures that are aware of the conversational structure. We also
demonstrate that such models can benefit from synthetic data, especially in the
social media domain.
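To illustrate the kind of structure-aware architecture this abstract points to, below is a minimal sketch of a classifier that encodes the parent comment (the conversational context) alongside the target comment before classifying. This is not the authors' implementation: the class name, the GRU encoder, the toy vocabulary, and all hyperparameters are illustrative assumptions.

# Hedged sketch: a context-aware toxicity classifier. All names and
# hyperparameters are illustrative assumptions, not the paper's model.
import torch
import torch.nn as nn

class ContextAwareClassifier(nn.Module):
    """Encodes the parent comment (context) and the target comment with a
    shared GRU encoder, then classifies their joint representation."""

    def __init__(self, vocab_size: int, embed_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # The classifier sees [context ; target], a simple form of
        # structure-aware fusion of the two utterances.
        self.classifier = nn.Linear(2 * hidden_dim, 2)  # toxic vs. non-toxic

    def encode(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Use the final hidden state as a fixed-size utterance representation.
        _, last_hidden = self.encoder(self.embedding(token_ids))
        return last_hidden.squeeze(0)  # (batch, hidden_dim)

    def forward(self, context_ids: torch.Tensor, target_ids: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.encode(context_ids), self.encode(target_ids)], dim=-1)
        return self.classifier(fused)

# Toy usage with random token ids (batch of 2, sequences of length 10).
model = ContextAwareClassifier(vocab_size=1000)
context = torch.randint(1, 1000, (2, 10))
target = torch.randint(1, 1000, (2, 10))
logits = model(context, target)
print(logits.shape)  # torch.Size([2, 2])

In practice the shared GRU would likely be replaced by a pretrained transformer encoder, but the fusion of context and target representations is the structural idea the abstract describes.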
SoK: Content Moderation in Social Media, from Guidelines to Enforcement, and Research to Practice
To counter online abuse and misinformation, social media platforms have been
establishing content moderation guidelines and employing various moderation
policies. The goal of this paper is to study these community guidelines and
moderation practices, as well as the relevant research publications, to identify
research gaps, differences in moderation techniques, and challenges that should
be tackled by social media platforms and the research community at large. To
this end, we study, analyze, and consolidate the content moderation guidelines
and practices of the fourteen most popular social media platforms in the US
jurisdiction. We then introduce three taxonomies drawn from this analysis and
from a review of over one hundred interdisciplinary research papers on
moderation strategies. We identify the differences between the content
moderation employed by mainstream social media platforms and by fringe
platforms. We also highlight the implications of Section 230, the tension
between transparency and opacity in content moderation, and why platforms
should shift from a one-size-fits-all model to a more inclusive one. Lastly, we
argue that there is a need for a collaborative human-AI system.
Pathways to Online Hate: Behavioural, Technical, Economic, Legal, Political & Ethical Analysis
The Alfred Landecker Foundation seeks to create a safer digital space for all. The work of the Foundation helps to develop research, convene stakeholders to share
valuable insights, and support entities that combat online harms, specifically online hate, extremism, and disinformation. Overall, the Foundation seeks to reduce hate and harm in the digital space tangibly and measurably by using its resources in the most impactful way. It also aims to help build an ecosystem that can prevent, minimise, and mitigate online harms while preserving open societies and healthy democracies. A non-exhaustive literature review was undertaken to explore the main facets of harm and hate speech in the evolving online landscape and to analyse behavioural, technical, economic, legal, political, and ethical drivers; key findings are detailed in this report.
SurrogatePrompt: Bypassing the Safety Filter of Text-To-Image Models via Substitution
Advanced text-to-image models such as DALL-E 2 and Midjourney possess the
capacity to generate highly realistic images, raising significant concerns
regarding the potential proliferation of unsafe content. This includes adult,
violent, or deceptive imagery of political figures. Despite claims of rigorous
safety mechanisms implemented in these models to restrict the generation of
not-safe-for-work (NSFW) content, we successfully devise and exhibit the first
prompt attacks on Midjourney, resulting in the production of abundant
photorealistic NSFW images. We reveal the fundamental principles of such prompt
attacks and suggest strategically substituting high-risk sections within a
suspect prompt to evade closed-source safety measures. Our novel framework,
SurrogatePrompt, systematically generates attack prompts, utilizing large
language models, image-to-text, and image-to-image modules to automate attack
prompt creation at scale. Evaluation results show an 88% success rate in
bypassing Midjourney's proprietary safety filter with our attack prompts,
leading to the generation of counterfeit images depicting political figures in
violent scenarios. Both subjective and objective assessments validate that the
images generated from our attack prompts present considerable safety hazards.