
    ALONE: A Dataset for Toxic Behavior among Adolescents on Twitter

    The convenience of social media has also enabled its misuse, potentially resulting in toxic behavior. Nearly 66% of internet users have observed online harassment, 41% report personal experience, and 18% have faced severe forms of online harassment. This toxic communication has a significant impact on the well-being of young individuals, harming mental health and, in some cases, resulting in suicide. Such communications exhibit complex linguistic and contextual characteristics that make recognizing these narratives challenging. In this paper, we provide ALONE (AdoLescents ON twittEr), a multimodal dataset of toxic social media interactions between confirmed high school students, along with descriptive explanations. Each interaction instance includes tweets, images, emojis, and related metadata. Our observations show that individual tweets do not provide sufficient evidence of toxic behavior, and that meaningful use of interaction context can help highlight or exonerate tweets with purported toxicity. Comment: Accepted: Social Informatics 202
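The abstract's emphasis on interaction-level context over single tweets suggests a natural data layout for each instance. A minimal sketch in Python, with hypothetical field names (the dataset's actual schema is not given here):

```python
from dataclasses import dataclass, field

# Hypothetical sketch of one ALONE-style interaction instance; the field
# names are illustrative assumptions, not the dataset's real schema.
@dataclass
class Tweet:
    text: str
    emojis: list[str] = field(default_factory=list)
    image_urls: list[str] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)  # e.g. timestamp, reply ids

@dataclass
class Interaction:
    participants: tuple[str, str]  # anonymized student identifiers
    tweets: list[Tweet]

    def context_window(self, i: int, k: int = 2) -> list[Tweet]:
        """Return tweet i with up to k tweets of surrounding context,
        since a single tweet rarely suffices to judge toxicity."""
        lo, hi = max(0, i - k), min(len(self.tweets), i + k + 1)
        return self.tweets[lo:hi]
```

Grouping tweets under an interaction, rather than storing them flat, is what lets a classifier exonerate a tweet that only looks toxic in isolation.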

    Cyberbullying through intellect - related insults

    Unrestricted use of digital devices and online platforms promulgates cyberbullying, which is typically identified by the presence of potentially profane or offensive words that can cause aggravation to others. Previous studies have shown that detecting abusive language on social media, especially on Twitter, poses particular challenges, largely because of the informal language used in tweets. This study examines the abusive language used in Malaysians’ online communication by highlighting the linguistic features of aggressive insults that social media users aim at an individual’s intelligence. Data collection and analysis were conducted in two stages. First, a self-constructed questionnaire elicited salient keywords and phrases to guide the subsequent content-based analysis. Second, Twitter data, streamed using the Twitter API and R statistical software, were explored; thematic analysis in this phase subjected the keywords to qualitative interpretation. Initial results indicate ‘bodoh’ as the most common online insult used to degrade an individual’s intelligence. Twitter users also make use of more abusive insults in Malay than in English for degrading purposes, through a variety of intelligence-related insults such as ‘bebal’, ‘sengal’, ‘gila’, ‘bodoh’, ‘bangang’, ‘bengap’, ‘semak’ and ‘bongok’. Likewise, linguistic realisations such as spelling alteration, word repetition, laughing remarks, punctuation, animal imagery, dialect interference, code-mixing, and Malaysian English markers are observed in those highlighted insults.
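The spelling alteration and word repetition the study observes mean a naive exact-match keyword search would undercount insults. A rough sketch of a tolerant matcher, assuming invented sample tweets (the insult list itself comes from the abstract):

```python
import re
from collections import Counter

# Insults listed in the study; the sample tweets below are invented.
INSULTS = ["bodoh", "bebal", "sengal", "gila", "bangang", "bengap", "semak", "bongok"]

def count_insults(tweets):
    """Count insult occurrences, tolerating character repetition
    such as 'bodohhh' for 'bodoh' (a spelling alteration the study notes)."""
    counts = Counter()
    for tweet in tweets:
        for insult in INSULTS:
            # Build e.g. 'b+o+d+o+h+' so repeated letters still match.
            pattern = "".join(f"{re.escape(c)}+" for c in insult)
            counts[insult] += len(re.findall(pattern, tweet.lower()))
    return counts

tweets = ["Kau memang bodohhh", "bengap betul la", "bodoh sangat"]
print(count_insults(tweets).most_common(1))  # → [('bodoh', 2)]
```

A production matcher would also need word boundaries and dialect variants; this only illustrates the repetition-tolerant idea.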

    Analyzing and Learning the Language for Different Types of Harassment

    THIS ARTICLE USES WORDS OR LANGUAGE THAT IS CONSIDERED PROFANE, VULGAR, OR OFFENSIVE BY SOME READERS. The presence of a significant amount of harassment in user-generated content, and its negative impact, calls for robust automatic detection approaches, which in turn require identifying different types of harassment. Earlier work has classified harassing language in terms of hurtfulness, abusiveness, sentiment, and profanity. However, to identify and understand harassment more accurately, it is essential to determine the contextual type that captures the interrelated conditions in which harassing language occurs. In this paper we introduce the notion of contextual type in harassment by distinguishing five such types: (i) sexual, (ii) racial, (iii) appearance-related, (iv) intellectual, and (v) political. We utilize an annotated Twitter corpus distinguishing these types of harassment, and study the context of each type to shed light on its linguistic meaning, interpretation, and distribution, with results from two lines of investigation: an extensive linguistic analysis and the statistical distribution of unigrams. We then build type-aware classifiers to automate the identification of type-specific harassment. Our experiments demonstrate that these classifiers provide competitive accuracy for identifying and analyzing harassment on social media. We present extensive discussion and significant observations about the effectiveness of type-aware classifiers using a detailed comparison setup, providing insight into the role of type-dependent features.
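The core idea of type-aware classification, one detector per contextual type rather than a single generic harassment model, can be illustrated with a deliberately simplified keyword scorer. This is a sketch only: the seed terms are mild placeholders and not the paper's lexicon, features, or method.

```python
# Minimal sketch of "type-aware" detection: a separate scorer per contextual
# type, so a tweet can be flagged for multiple types at once. Seed terms are
# mild invented placeholders, not the paper's actual features.
TYPE_LEXICONS = {
    "intellectual": {"stupid", "idiot", "dumb"},
    "appearance": {"ugly", "fat"},
    "political": {"libtard", "fascist"},
}

def classify_types(text, threshold=1):
    """Return every contextual type whose lexicon-hit count meets threshold."""
    tokens = set(text.lower().split())
    hits = {t: len(tokens & lex) for t, lex in TYPE_LEXICONS.items()}
    return [t for t, n in hits.items() if n >= threshold]

print(classify_types("you are a stupid ugly idiot"))  # → ['intellectual', 'appearance']
```

The paper's actual classifiers are learned from annotated data; the point here is only the structural choice of per-type decisions instead of one binary harassment label.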

    A Quality Type-aware Annotated Corpus and Lexicon for Harassment Research

    A quality annotated corpus is essential to research. Despite the recent focus of the Web science community on cyberbullying research, the community lacks standard benchmarks. This paper provides both a quality annotated corpus and an offensive-words lexicon capturing different types of harassment content: (i) sexual, (ii) racial, (iii) appearance-related, (iv) intellectual, and (v) political. We first crawled data from Twitter using this content-tailored offensive lexicon. As the mere presence of an offensive word is not a reliable indicator of harassment, human judges annotated the tweets for the presence of harassment. Our corpus consists of 25,000 annotated tweets covering the five types of harassment content and is available on the Git repository.
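The two-stage pipeline described, lexicon-filtered crawling followed by human judgment, can be sketched as a filter that flags candidate tweets per type and leaves the harassment decision to annotators. Lexicon entries and tweets below are mild invented placeholders, not the paper's data.

```python
# Hedged sketch of the corpus-building pipeline: keep only tweets that hit a
# type-specific offensive lexicon, then pass them to human judges, since an
# offensive word alone is not reliable evidence of harassment.
LEXICON = {
    "intellectual": {"idiot", "stupid"},
    "appearance": {"ugly"},
}

def annotation_candidates(stream, lexicon):
    """Yield (tweet, matched_types) pairs for tweets that hit any lexicon;
    human judges then label each candidate as harassment or not."""
    for tweet in stream:
        words = set(tweet.lower().split())
        matched = sorted(t for t, lex in lexicon.items() if words & lex)
        if matched:
            yield tweet, matched

stream = ["what an idiot", "lovely weather today", "so ugly and stupid"]
print(list(annotation_candidates(stream, LEXICON)))
```

Note how the non-matching tweet is dropped before annotation: the lexicon controls recall cheaply, while precision is delegated to the human judging stage.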
