Racialised harm on social media affects the psychological wellbeing, participation, and online safety of marginalised communities. Yet automated moderation systems often function as opaque classifiers and frequently misinterpret expressions of racial trauma, especially justified anger, as toxicity. This paper presents an interpretable NLP framework for analysing racial harm in UK social media contexts, using survey data from 809 participants, including 408 narrative accounts of racially motivated abuse. The hybrid pipeline integrates domain-specific lexical sentiment scoring, TF–IDF topic modelling, and zero-shot transformer emotion inference, with token-level attribution to ensure transparency. The model achieves balanced performance (F1 = 0.79) while preserving contextual interpretability. Thematic and emotional analyses reveal clusters of anti-Black abuse, Islamophobia, COVID-related anti-Asian hostility, and xenophobic rhetoric, with anger, sadness, and fear emerging as dominant and legitimate responses to harm. Qualitative insights indicate low trust in reporting systems and limited platform accountability. Overall, the framework demonstrates that accuracy and interpretability can be achieved jointly, supporting transparent and accountable approaches to online harm analysis.
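As a rough illustration of the first two pipeline stages named above, the following minimal sketch pairs lexicon-based sentiment scoring (with per-token attributions) with TF–IDF term weighting. It is not the authors' implementation: the lexicon entries, weights, function names, and example sentences are all hypothetical, the TF–IDF step produces term weights only rather than topic clusters, and the zero-shot transformer stage is left out because it depends on a pretrained model.

```python
import math
from collections import Counter

# Hypothetical toy lexicon; the paper's domain-specific lexicon is not shown here.
LEXICON = {"abuse": -1.0, "angry": -0.8, "afraid": -0.7,
           "ignored": -0.5, "supported": 0.6}

PUNCT = ".,!?;:"


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    stripped = (t.strip(PUNCT).lower() for t in text.split())
    return [t for t in stripped if t]


def lexicon_score(tokens):
    """Return the mean lexicon weight and the (token, weight) pairs behind it."""
    hits = [(t, LEXICON[t]) for t in tokens if t in LEXICON]
    score = sum(w for _, w in hits) / len(hits) if hits else 0.0
    return score, hits


def tfidf(docs):
    """TF-IDF per document, using the smoothed idf log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    weights = []
    for d in docs:
        tf = Counter(d)
        weights.append({
            t: (c / len(d)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return weights


# Hypothetical example narratives, for illustration only.
texts = [
    "I reported the abuse and was ignored.",
    "I felt angry and afraid after the abuse.",
    "My friends supported me.",
]
docs = [tokenize(t) for t in texts]
scores = [lexicon_score(d) for d in docs]
weights = tfidf(docs)
```

Returning the matched `(token, weight)` pairs alongside each score is what keeps this stage inspectable: a reader can see which words drove a given rating rather than receiving only a number.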