Deep Random Forest and AraBert for Hate Speech Detection from Arabic Tweets
Nowadays, hate speech detection from Arabic tweets attracts the attention of many researchers. Numerous systems and techniques have been proposed to address this classification challenge. Nonetheless, three major limitations persist: the use of deep learning models with an excess of hyperparameters, the reliance on hand-crafted features, and the requirement for a huge amount of training data to achieve satisfactory performance. In this study, we propose Contextual Deep Random Forest (CDRF), a hate speech detection approach that combines contextual embedding and Deep Random Forest. In our experiments, the Arabic contextual embedding model proves to be highly effective in hate speech detection, outperforming the static embedding models. Additionally, we show that the proposed CDRF significantly enhances the performance of Arabic hate speech classification.
Profanity and hate speech detection
Profanity, often found in today’s online social media, has been used to detect online hate speech. The aims of this study were to investigate profanity usage on Twitter by different groups of users, and to quantify the effectiveness of using profanity in detecting hate speech. Tweets from three English-speaking countries, Australia, Malaysia, and the United States, were collected for data analysis. Statistical hypothesis tests were performed to assess the difference in profanity usage among the three countries, and a probability estimation procedure was formulated based on Bayes' theorem to quantify the effectiveness of profanity-based methods in hate speech detection. Three deep learning methods, long short-term memory (LSTM), bidirectional LSTM (BLSTM), and bidirectional encoder representations from transformers (BERT), are further used to evaluate the effect of profanity screening on building classification models. Our experimental results show that the effectiveness of using profanity in detecting hate speech is questionable. Nevertheless, the results also show that for Australian tweets, where profanity is more associated with hatred, profanity-based methods in hate speech detection could be effective and profanity screening can address the class imbalance issue in hate speech detection. This is evidenced by the performance of deep learning methods on the profanity-screened Australian data, which achieved a classification F1-score greater than 0.84.
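The abstract above mentions a Bayes-based probability estimate without giving its form. As a minimal sketch only, and not the authors' actual procedure, the probability that a tweet is hateful given that it contains profanity can be written with Bayes' theorem as below. The function name, its parameters, and the example rates are hypothetical and chosen purely for illustration.

```python
def posterior_hate_given_profanity(
    p_profanity_given_hate: float,
    p_hate: float,
    p_profanity_given_not_hate: float,
) -> float:
    """Return P(hate | profanity) using Bayes' theorem.

    All three inputs are probabilities in [0, 1]. They would have to be
    estimated from labelled data; none of them comes from the paper.
    """
    for p in (p_profanity_given_hate, p_hate, p_profanity_given_not_hate):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")

    # Law of total probability: overall chance that a tweet contains profanity.
    p_profanity = (
        p_profanity_given_hate * p_hate
        + p_profanity_given_not_hate * (1.0 - p_hate)
    )
    if p_profanity == 0.0:
        raise ValueError("P(profanity) is zero, so the posterior is undefined")

    return p_profanity_given_hate * p_hate / p_profanity


if __name__ == "__main__":
    # Hypothetical rates: 80% of hateful tweets and 20% of non-hateful
    # tweets contain profanity, and 10% of all tweets are hateful.
    print(posterior_hate_given_profanity(0.8, 0.1, 0.2))
```

With rates like these, most profane tweets are still not hateful, which matches the abstract's remark that profanity alone is a questionable signal unless it is strongly associated with hatred in the population studied.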
Uncovering Political Hate Speech During Indian Election Campaign: A New Low-Resource Dataset and Baselines
The detection of hate speech in political discourse is a critical issue, and
this becomes even more challenging in low-resource languages. To address this
issue, we introduce a new dataset named IEHate, which contains 11,457 manually
annotated Hindi tweets related to the Indian Assembly Election Campaign from
November 1, 2021, to March 9, 2022. We performed a detailed analysis of the
dataset, focusing on the prevalence of hate speech in political communication
and the different forms of hateful language used. Additionally, we benchmark
the dataset using a range of machine learning, deep learning, and
transformer-based algorithms. Our experiments reveal that the performance of
these models can be further improved, highlighting the need for more advanced
techniques for hate speech detection in low-resource languages. In particular,
the relatively higher score of human evaluation over algorithms emphasizes the
importance of utilizing both human and automated approaches for effective hate
speech moderation. Our IEHate dataset can serve as a valuable resource for
researchers and practitioners working on developing and evaluating hate speech
detection techniques in low-resource languages. Overall, our work underscores
the importance of addressing the challenges of identifying and mitigating hate
speech in political discourse, particularly in the context of low-resource
languages. The dataset and resources for this work are made available at
https://github.com/Farhan-jafri/Indian-Election.
Comment: Accepted to ICWSM Workshop (MEDIATE
Do Linguistic Features Help Deep Learning? The Case of Aggressiveness in Mexican Tweets
In recent years, the control of online user-generated content has become a priority because of the increase in online aggressiveness and hate speech legal cases. Considering the complexity and importance of this issue, this paper presents an approach that combines a deep learning framework with linguistic features for the recognition of aggressiveness in Mexican tweets. This approach has been evaluated on a collection of tweets released by the organizers of the shared task on aggressiveness detection in the context of the Ibereval 2018 evaluation campaign. The use of a benchmark corpus allows the results to be compared with those obtained by the Ibereval 2018 participant systems. However, looking at the achieved results, linguistic features do not seem to help the deep learning classification for this task.
The work of Simona Frenda and Paolo Rosso was partially funded by the Spanish MINECO under the research project SomEMBED (TIN2015-71147-C2-1-P).
Frenda, S.; Banerjee, S.; Rosso, P.; Patti, V. (2020). Do Linguistic Features Help Deep Learning? The Case of Aggressiveness in Mexican Tweets. Computación y Sistemas. 24(2):633-643. https://doi.org/10.13053/CyS-24-2-3398