1,517 research outputs found

    Toxic Comment Classification based on Personality Traits Using NLP

    Concerns about the frequency of harmful remarks have been raised by the growth of online communication platforms, making it difficult to create inclusive and safe digital spaces. This study explores the creation of a robust framework that uses machine learning algorithms and natural language processing (NLP) methods to categorise harmful comments. To improve the accuracy and comprehensiveness of categorisation, the study investigates the integration of personality trait analysis alongside the identification of toxic language. The dataset, a wide range of online comments, was put through extensive preprocessing steps such as text cleaning, lemmatization, and feature extraction. To facilitate the training and assessment of machine learning models, the textual data was converted into numerical representations using TF-IDF vectorization and word embeddings. Furthermore, personality traits were inferred from comments using sentiment analysis and linguistic cues, linking linguistic patterns with behavioural inclinations. The study resulted in the development and assessment of classification models that combine features from the textual content with the inferred personality traits. The findings show encouraging associations between specific personality traits and the use of toxic language, providing opportunities to identify subtle differences in toxic comment contexts. This study outlines the methodology, major findings, and implications of incorporating personality trait analysis into the classification of toxic comments, offering insights towards more sophisticated and effective methods of reducing toxicity in online discourse.
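
    The TF-IDF vectorization step this abstract mentions can be sketched in a few lines of plain Python. The smoothed IDF variant below is an illustrative toy, not the study's actual pipeline (which would typically use a library implementation):

    ```python
    import math
    from collections import Counter

    def tfidf_vectorize(docs):
        """Compute smoothed TF-IDF vectors for tokenised documents.

        docs: list of token lists. Returns the sorted vocabulary and
        one dense vector per document.
        """
        vocab = sorted({t for doc in docs for t in doc})
        n = len(docs)
        # document frequency: in how many documents each term appears
        df = {t: sum(1 for doc in docs if t in doc) for t in vocab}
        # smoothed inverse document frequency
        idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
        vectors = []
        for doc in docs:
            counts = Counter(doc)
            # term frequency scaled by idf
            vectors.append([counts[t] / len(doc) * idf[t] for t in vocab])
        return vocab, vectors
    ```

    Terms that appear in every document get the minimum weight, while terms concentrated in few documents (often the discriminative, toxic ones) are up-weighted.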

    Multimodal cyberbullying detection using capsule network with dynamic routing and deep convolutional neural network

    Cyberbullying is the use of information technology networks by individuals to humiliate, tease, embarrass, taunt, defame and disparage a target without any face-to-face contact. Social media is the 'virtual playground' used by bullies, with the upsurge of social networking sites such as Facebook, Instagram, YouTube and Twitter. It is critical to implement models and systems for the automatic detection and resolution of bullying content available online, as the ramifications can lead to a societal epidemic. This paper presents a deep neural model for cyberbullying detection in three different modalities of social data, namely textual, visual and info-graphic (text embedded in an image). The all-in-one architecture, CapsNet–ConvNet, consists of a capsule network (CapsNet) with dynamic routing for predicting the textual bullying content and a convolutional neural network (ConvNet) for predicting the visual bullying content. The info-graphic content is discretized by separating text from the image using Google Lens in the Google Photos app. A perceptron-based decision-level late-fusion strategy for multimodal learning is used to dynamically combine the predictions of the discrete modalities and output the final category, bullying or non-bullying. Experimental evaluation is done on a mixed-modal dataset of 10,000 comments and posts scraped from YouTube, Instagram and Twitter. The proposed model achieves superlative performance with an AUC–ROC of 0.98.
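
    The decision-level late-fusion idea can be illustrated with a single perceptron over the two modality scores. The training data and weights below are made up for illustration; in the paper the fusion layer combines the actual CapsNet and ConvNet predictions:

    ```python
    def fusion_predict(w, b, text_score, image_score):
        """Combine per-modality bullying probabilities into one decision
        (1 = bullying, 0 = non-bullying)."""
        z = w[0] * text_score + w[1] * image_score + b
        return 1 if z >= 0 else 0

    def train_fusion_perceptron(scores, labels, lr=0.1, epochs=50):
        """Classic perceptron rule on (text_score, image_score) pairs."""
        w, b = [0.0, 0.0], 0.0
        for _ in range(epochs):
            for (s_text, s_img), y in zip(scores, labels):
                err = y - fusion_predict(w, b, s_text, s_img)
                w[0] += lr * err * s_text
                w[1] += lr * err * s_img
                b += lr * err
        return w, b
    ```

    Because fusion happens at the decision level, each modality branch can be trained and swapped independently; only the small fusion layer needs retraining.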

    An assessment of deep learning models and word embeddings for toxicity detection within online textual comments

    Today, increasing numbers of people are interacting online, and a large volume of textual comments is being produced due to the explosion of online communication. However, a paramount inconvenience within online environments is that comments shared on digital platforms can hide hazards such as fake news, insults, harassment and, more generally, comments that may hurt someone’s feelings. In this scenario, the detection of this kind of toxicity plays an important role in moderating online communication. Deep learning technologies have recently delivered impressive performance in Natural Language Processing applications encompassing Sentiment Analysis and emotion detection across numerous datasets. Such models do not need any pre-defined, hand-picked features; they learn sophisticated features from the input datasets by themselves. In this domain, word embeddings have been widely used as a way of representing words in Sentiment Analysis tasks, proving to be very effective. Therefore, in this paper, we investigated the use of deep learning and word embeddings to detect six different types of toxicity within online comments. In doing so, the most suitable deep learning layers and state-of-the-art word embeddings for identifying toxicity are evaluated. The results suggest that Long Short-Term Memory layers in combination with mimicked word embeddings are a good choice for this task.
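
    As a reminder of what an LSTM layer computes per token, here is a one-dimensional LSTM cell step in plain Python. The scalar weight layout is a toy illustration of the gating equations, not the multi-dimensional models evaluated in the paper:

    ```python
    import math

    def _sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def lstm_step(x, h_prev, c_prev, W):
        """One step of a scalar LSTM cell.

        W maps each gate name ('i' input, 'f' forget, 'o' output,
        'g' candidate) to a (input weight, recurrent weight, bias) triple.
        """
        i = _sigmoid(W["i"][0] * x + W["i"][1] * h_prev + W["i"][2])
        f = _sigmoid(W["f"][0] * x + W["f"][1] * h_prev + W["f"][2])
        o = _sigmoid(W["o"][0] * x + W["o"][1] * h_prev + W["o"][2])
        g = math.tanh(W["g"][0] * x + W["g"][1] * h_prev + W["g"][2])
        c = f * c_prev + i * g      # new cell state
        h = o * math.tanh(c)        # new hidden state
        return h, c
    ```

    The multiplicative forget gate `f` is what lets the cell state carry information across long comment spans, which is why LSTM layers suit toxicity cues spread over a whole sentence.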

    Tackling Sexist Hate Speech: Cross-Lingual Detection and Multilingual Insights from Social Media

    With the widespread use of social media, the proliferation of online communication presents both opportunities and challenges for fostering a respectful and inclusive digital environment. Due to the anonymity and weak regulations of social media platforms, the rise of hate speech has become a significant concern, particularly against specific individuals or groups based on race, religion, ethnicity, or gender, posing a severe threat to human rights. Sexist hate speech is a prevalent form of online hate that often manifests itself through gender-based violence and discrimination, challenging societal norms and legal systems. Despite the advances in natural language processing techniques for detecting offensive and sexist content, most research still focuses on monolingual (primarily English) contexts, neglecting the multilingual nature of online platforms. This gap highlights the need for effective and scalable strategies to address the linguistic diversity and cultural variations in hate speech. Cross-language transfer learning and state-of-the-art multilingual pre-trained language models provide potential solutions to improve the detection efficiency of low-resource languages by leveraging data from high-resource languages. Additional knowledge is crucial to facilitate the models’ performance in detecting culturally varying expressions of sexist hate speech in different languages. In this thesis, we delve into the complex area of identifying sexist hate speech in social media across diverse languages pertaining to different language families, with a focus on sexism and a broad exploration of datasets, methodologies, and barriers inherent in mitigating online hate speech in cross-lingual and multilingual scenarios. We primarily apply cross-lingual transfer learning techniques to detect sexist hate speech, aiming to leverage knowledge acquired from related linguistic data in order to improve performance in a target language. 
We also investigate the integration of external knowledge to deepen the understanding of sexism in multilingual social media contexts, addressing both the challenges of linguistic diversity and the need for comprehensive, culturally sensitive hate speech detection models. Specifically, the thesis embarks on a comprehensive survey of tackling cross-lingual hate speech online, summarising existing datasets and cross-lingual approaches, as well as highlighting challenges and frontiers in this field. It then presents a first contribution to the field, the creation of the Sina Weibo Sexism Review (Swsr) dataset in Chinese, a pioneering resource that not only fills a crucial gap in limited resources but also lays the foundation for relevant cross-lingual investigations. Additionally, it examines how cross-lingual techniques can be utilised to generate domain-aware word embeddings, and explores the application of these embeddings in a cross-lingual hate speech framework, thereby enhancing the capacity to capture the subtleties of sexist hate speech across diverse languages. Recognising the significance of linguistic nuances in multilingual and cross-lingual settings, another contribution is a series of multilingual and cross-lingual models tailored for detecting sexist hate speech, proposed and evaluated here. By leveraging shared knowledge and features across languages, these models significantly advance the state of the art in identifying online sexist hate speech. As societies continue to deal with the complexities of social media, the findings and methodologies presented in this thesis could effectively help foster more inclusive and respectful online content across languages.
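
One standard building block behind cross-lingual embedding transfer is mapping a source language's word vectors into the target language's space with an orthogonal (Procrustes) projection learned from a small seed translation dictionary. The sketch below assumes that generic technique; it is not the thesis's exact method:

```python
import numpy as np

def align_embeddings(src, tgt):
    """Learn an orthogonal map W such that W @ src[i] ~= tgt[i].

    src, tgt: (n, d) arrays whose rows are embeddings of seed
    translation pairs. Closed-form orthogonal Procrustes via SVD.
    """
    u, _, vt = np.linalg.svd(tgt.T @ src)
    return u @ vt
```

Applied after training monolingual embeddings, `src @ W.T` places source-language words in the target space, so a sexism classifier trained on a high-resource language can score low-resource inputs.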

    RGCL at IDAT: deep learning models for irony detection in Arabic language

    This article describes the system submitted by the RGCL team to the IDAT 2019 Shared Task: Irony Detection in Arabic Tweets. The system detects irony in Arabic tweets using deep learning. The paper evaluates the performance of several deep learning models, as well as how text cleaning and pre-processing influence the accuracy of the system. Several runs were submitted; the highest F1 score achieved by one of the submissions was 0.818, ranking the RGCL team 4th out of 10 in the final results. Overall, we present a system that uses minimal pre-processing but is capable of achieving competitive results.
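
    Minimal tweet pre-processing of the kind evaluated in such systems might look like the following. The exact normalisation rules RGCL used are not stated in the abstract, so these steps are illustrative assumptions:

    ```python
    import re

    def clean_tweet(text):
        """Light tweet normalisation: mask URLs and user mentions,
        keep hashtag words, collapse whitespace."""
        text = re.sub(r"https?://\S+", "<url>", text)
        text = re.sub(r"@\w+", "<user>", text)
        text = re.sub(r"#", "", text)
        return re.sub(r"\s+", " ", text).strip()
    ```

    Masking rather than deleting URLs and mentions keeps the sequence length and positional context intact for downstream deep learning layers.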