
    ALONE: A Dataset for Toxic Behavior among Adolescents on Twitter

    The convenience of social media has also enabled its misuse, potentially resulting in toxic behavior. Nearly 66% of internet users have observed online harassment, 41% report personal experience, and 18% have faced severe forms of online harassment. This toxic communication has a significant impact on the well-being of young individuals, harming mental health and, in some cases, resulting in suicide. Such communications exhibit complex linguistic and contextual characteristics that make recognizing these narratives challenging. In this paper, we provide ALONE (AdoLescents ON twittEr), a multimodal dataset of toxic social media interactions between confirmed high school students, along with descriptive explanations. Each interaction instance includes tweets, images, emojis, and related metadata. Our observations show that individual tweets do not provide sufficient evidence of toxic behavior, and that meaningful use of interaction context can help highlight or exonerate tweets with purported toxicity. Comment: Accepted: Social Informatics 202
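The abstract's emphasis on interaction-level context over single tweets suggests a natural data layout for each instance. A minimal sketch in Python, with hypothetical field names (the dataset's actual schema is not given here):

```python
from dataclasses import dataclass, field

# Hypothetical sketch of one ALONE-style interaction instance; the field
# names are illustrative assumptions, not the dataset's real schema.
@dataclass
class Tweet:
    text: str
    emojis: list[str] = field(default_factory=list)
    image_urls: list[str] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)  # e.g. timestamp, reply ids

@dataclass
class Interaction:
    participants: tuple[str, str]  # anonymized student identifiers
    tweets: list[Tweet]

    def context_window(self, i: int, k: int = 2) -> list[Tweet]:
        """Return tweet i with up to k tweets of surrounding context,
        since a single tweet rarely suffices to judge toxicity."""
        lo, hi = max(0, i - k), min(len(self.tweets), i + k + 1)
        return self.tweets[lo:hi]
```

Grouping tweets under an interaction, rather than storing them flat, is what lets a classifier exonerate a tweet that only looks toxic in isolation.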

    Cyberbullying through intellect - related insults

    Unrestricted use of digital devices and online platforms promulgates cyberbullying, which is typically identified by the presence of potentially profane or offensive words that can cause aggravation to others. Previous studies have shown that detecting abusive language on social media, especially on Twitter, poses particular challenges, largely because of the informal language used in tweets. This study examines the abusive language used in Malaysians’ online communication by highlighting the linguistic features of aggressive insults that social media users aim at an individual’s intelligence. Data collection and analysis were conducted in two stages. First, a self-constructed questionnaire elicited salient keywords and phrases to guide the subsequent content-based analysis. Second, Twitter data, streamed using the Twitter API and R statistical software, were explored; thematic analysis in this phase subjected the keywords to qualitative interpretation. Initial results indicate ‘bodoh’ as the most common online insult used to degrade an individual’s intelligence. Twitter users also make use of more abusive insults in Malay than in English for degrading purposes, through a variety of intelligence-related insults such as ‘bebal’, ‘sengal’, ‘gila’, ‘bodoh’, ‘bangang’, ‘bengap’, ‘semak’ and ‘bongok’. Likewise, linguistic realisations such as spelling alteration, word repetition, laughing remarks, punctuation, animal imagery, dialect interference, code-mixing, and Malaysian English markers are observed in those highlighted insults.
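The spelling alteration and word repetition the study observes mean a naive exact-match keyword search would undercount insults. A rough sketch of a tolerant matcher, assuming invented sample tweets (the insult list itself comes from the abstract):

```python
import re
from collections import Counter

# Insults listed in the study; the sample tweets below are invented.
INSULTS = ["bodoh", "bebal", "sengal", "gila", "bangang", "bengap", "semak", "bongok"]

def count_insults(tweets):
    """Count insult occurrences, tolerating character repetition
    such as 'bodohhh' for 'bodoh' (a spelling alteration the study notes)."""
    counts = Counter()
    for tweet in tweets:
        for insult in INSULTS:
            # Build e.g. 'b+o+d+o+h+' so repeated letters still match.
            pattern = "".join(f"{re.escape(c)}+" for c in insult)
            counts[insult] += len(re.findall(pattern, tweet.lower()))
    return counts

tweets = ["Kau memang bodohhh", "bengap betul la", "bodoh sangat"]
print(count_insults(tweets).most_common(1))  # → [('bodoh', 2)]
```

A production matcher would also need word boundaries and dialect variants; this only illustrates the repetition-tolerant idea.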

    Analyzing and Learning the Language for Different Types of Harassment

    THIS ARTICLE USES WORDS OR LANGUAGE THAT IS CONSIDERED PROFANE, VULGAR, OR OFFENSIVE BY SOME READERS. The presence of a significant amount of harassment in user-generated content, and its negative impact, calls for robust automatic detection approaches, which in turn require identifying different types of harassment. Earlier work has classified harassing language in terms of hurtfulness, abusiveness, sentiment, and profanity. However, to identify and understand harassment more accurately, it is essential to determine the contextual type that captures the interrelated conditions in which harassing language occurs. In this paper we introduce the notion of contextual type in harassment by distinguishing five such types: (i) sexual, (ii) racial, (iii) appearance-related, (iv) intellectual, and (v) political. We utilize an annotated Twitter corpus distinguishing these types of harassment, and study the context of each type to shed light on its linguistic meaning, interpretation, and distribution, with results from two lines of investigation: an extensive linguistic analysis and the statistical distribution of unigrams. We then build type-aware classifiers to automate the identification of type-specific harassment. Our experiments demonstrate that these classifiers provide competitive accuracy for identifying and analyzing harassment on social media. We present extensive discussion and significant observations about the effectiveness of type-aware classifiers using a detailed comparison setup, providing insight into the role of type-dependent features.
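The core idea of type-aware classification, one detector per contextual type rather than a single generic harassment model, can be illustrated with a deliberately simplified keyword scorer. This is a sketch only: the seed terms are mild placeholders and not the paper's lexicon, features, or method.

```python
# Minimal sketch of "type-aware" detection: a separate scorer per contextual
# type, so a tweet can be flagged for multiple types at once. Seed terms are
# mild invented placeholders, not the paper's actual features.
TYPE_LEXICONS = {
    "intellectual": {"stupid", "idiot", "dumb"},
    "appearance": {"ugly", "fat"},
    "political": {"libtard", "fascist"},
}

def classify_types(text, threshold=1):
    """Return every contextual type whose lexicon-hit count meets threshold."""
    tokens = set(text.lower().split())
    hits = {t: len(tokens & lex) for t, lex in TYPE_LEXICONS.items()}
    return [t for t, n in hits.items() if n >= threshold]

print(classify_types("you are a stupid ugly idiot"))  # → ['intellectual', 'appearance']
```

The paper's actual classifiers are learned from annotated data; the point here is only the structural choice of per-type decisions instead of one binary harassment label.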

    A Quality Type-aware Annotated Corpus and Lexicon for Harassment Research

    A quality annotated corpus is essential to research. Despite the recent focus of the Web science community on cyberbullying research, the community lacks standard benchmarks. This paper provides both a quality annotated corpus and an offensive-words lexicon capturing different types of harassment content: (i) sexual, (ii) racial, (iii) appearance-related, (iv) intellectual, and (v) political. We first crawled data from Twitter using this content-tailored offensive lexicon. As the mere presence of an offensive word is not a reliable indicator of harassment, human judges annotated the tweets for the presence of harassment. Our corpus consists of 25,000 annotated tweets covering the five types of harassment content and is available on the Git repository.
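The two-stage pipeline described, lexicon-filtered crawling followed by human judgment, can be sketched as a filter that flags candidate tweets per type and leaves the harassment decision to annotators. Lexicon entries and tweets below are mild invented placeholders, not the paper's data.

```python
# Hedged sketch of the corpus-building pipeline: keep only tweets that hit a
# type-specific offensive lexicon, then pass them to human judges, since an
# offensive word alone is not reliable evidence of harassment.
LEXICON = {
    "intellectual": {"idiot", "stupid"},
    "appearance": {"ugly"},
}

def annotation_candidates(stream, lexicon):
    """Yield (tweet, matched_types) pairs for tweets that hit any lexicon;
    human judges then label each candidate as harassment or not."""
    for tweet in stream:
        words = set(tweet.lower().split())
        matched = sorted(t for t, lex in lexicon.items() if words & lex)
        if matched:
            yield tweet, matched

stream = ["what an idiot", "lovely weather today", "so ugly and stupid"]
print(list(annotation_candidates(stream, LEXICON)))
```

Note how the non-matching tweet is dropped before annotation: the lexicon controls recall cheaply, while precision is delegated to the human judging stage.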
