2,942 research outputs found

    Multilingual Cross-domain Perspectives on Online Hate Speech

    In this report, we present a study of eight corpora of online hate speech and demonstrate the NLP techniques that we used to collect and analyze the jihadist, extremist, racist, and sexist content. Analysis of the multilingual corpora shows that the different contexts share certain characteristics in their hateful rhetoric. To expose the main features, we have focused on text classification, text profiling, keyword and collocation extraction, along with manual annotation and qualitative study. Comment: 24 pages

    A Survey of Social Network - Word Embedding Approach for Hate Speeches Detection

    Word embedding is a technique for representing sentences in vector space. The representation is used to build a model suited to a particular task involving the sentence, for example, a model of similarity among sentences or words, a model of Twitter user connectivity, or a model of tweet demographics. Word embedding is useful for sentiment analysis research because it helps build a mathematically tractable model from sentences. The model can then serve as input to other computational processes.

    Detecting Hate Speech in Social Media

    In this paper we examine methods to detect hate speech in social media, while distinguishing this from general profanity. We aim to establish lexical baselines for this task by applying supervised classification methods using a recently released dataset annotated for this purpose. As features, our system uses character n-grams, word n-grams and word skip-grams. We obtain results of 78% accuracy in identifying posts across three classes. Results demonstrate that the main challenge lies in discriminating profanity and hate speech from each other. A number of directions for future work are discussed. Comment: Proceedings of Recent Advances in Natural Language Processing (RANLP). pp. 467-472. Varna, Bulgaria
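The three feature types named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the whitespace tokenization, and the skip-gram definition used here (word pairs separated by at most `k` skipped tokens, adjacent pairs included) are assumptions made for illustration.

```python
def char_ngrams(text, n):
    """Return all contiguous character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(tokens, n):
    """Return all contiguous word n-grams as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def word_skip_bigrams(tokens, k):
    """Return ordered word pairs with at most k tokens skipped between them.

    Assumption: adjacent pairs (zero skips) are included, so k=0 yields
    ordinary bigrams.
    """
    pairs = []
    for i in range(len(tokens)):
        # j ranges over the next k + 1 positions after i.
        for j in range(i + 1, min(i + k + 2, len(tokens))):
            pairs.append((tokens[i], tokens[j]))
    return pairs


# Hypothetical example post, tokenized on whitespace.
tokens = "you are so kind".split()
print(char_ngrams("kind", 2))
print(word_ngrams(tokens, 2))
print(word_skip_bigrams(tokens, 1))
```

In a supervised setup such as the one described, counts of these features would typically be passed to a classifier; that step is omitted here.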

    Using State-of-the-art Emotion Detection Models in a Crisis Communication Context

    Times of crisis are usually associated with highly emotional experiences, which often result in emotionally charged communication. This is especially the case on social media. Identifying the emotional climate on social media is imperative in the context of crisis communication, e.g., in view of shaping crisis response strategies. However, the sheer volume of social media data often makes manual oversight impossible. In this paper, we therefore investigate how automatic methods for emotion detection can aid research on crisis communication and social media. Concretely, we investigate two Dutch emotion detection models (a transformer model and a classical machine learning model based on dictionaries) and apply them to Dutch tweets about four different crisis cases. First, we perform a validation study to assess the performance of these models in the domain of crisis-related tweets. Secondly, we propose a framework for monitoring the emotional climate on social media, and assess whether emotion detection models can be used to address the steps in the framework

    Hate Speech Research: Algorithmic and Qualitative Evaluations. A Case Study of Anti-Gypsy Hate on Twitter

    Hate speech may be the research focus of the interdisciplinary field of hate studies, but it is also a difficult phenomenon to define. Internationally, several studies address the automatic detection of hate speech. They can be grouped according to two approaches: the first relies on machine learning methods alone, while the second combines automatic searching with human classification. The case study on anti-Gypsy hate in Italian on Twitter in the second half of 2020 falls into the second category, and its methods are outlined here. Based on the results (annotation as ‘hate’/‘non-hate’, identification of forms of rhetoric and anti-Gypsyism), the researchers propose classifying online content according to seven indicators called the ‘spectrum of online hate’.

    DALC: the Dutch Abusive Language Corpus

    As socially unacceptable language becomes pervasive on social media platforms, the need for automatic content moderation becomes more pressing. This contribution introduces the Dutch Abusive Language Corpus (DALC v1.0), a new dataset with tweets manually annotated for abusive language. The resource addresses a gap in language resources for Dutch and adopts a multi-layer annotation scheme modeling the explicitness and the target of the abusive messages. Baseline experiments on all annotation layers have been conducted, achieving a macro F1 score of 0.748 for binary classification of the explicitness layer and 0.489 for target classification.
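For readers unfamiliar with the metric reported above, macro F1 is the unweighted mean of the per-class F1 scores, so rare classes count as much as frequent ones. The sketch below is a plain-Python illustration of that definition only; the label names and toy predictions are hypothetical and are not taken from DALC.

```python
def macro_f1(y_true, y_pred):
    """Compute macro-averaged F1: the unweighted mean of per-class F1."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical gold labels and predictions for a binary task.
gold = ["abusive", "abusive", "not", "not", "not", "not"]
pred = ["abusive", "not", "not", "not", "not", "abusive"]
print(macro_f1(gold, pred))  # per-class F1 is 0.5 and 0.75, so the mean is 0.625
```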

    A Review of Hate Speech Detection: Challenges and Innovations

    Hate speech on social media platforms has severe impacts on individuals, online communities, and society. Platforms are criticized for shirking their responsibilities to effectively moderate hate speech on their platforms. However, various challenges, including implicit expressions, complicate the task of detecting hate speech. Consequently, developing and tuning algorithms for improving the automated detection of hate speech has emerged as a crucial research topic. This paper aims to contribute to this rapidly emerging field by outlining how the adoption of natural language processing and machine learning technologies has helped hate speech detection, delving into the latest mainstream detection techniques and their performance, and offering a comprehensive review of the literature on hate speech detection online including the notable challenges and respective mitigating efforts. This paper proposes the integration of interdisciplinary perspectives into deep learning models to enhance the generalization of models, providing a new agenda for future research.