Enhancing Hate Speech Detection in Sinhala Language on Social Media using Machine Learning

Abstract

To counter the harmful dissemination of hate speech on social media, especially abusive outbursts of racism and sexism, automatic and accurate detection is crucial. However, a significant challenge lies in the vast sparsity of available data, hindering accurate classification. This study presents a novel approach to Sinhala hate speech detection on social platforms by coupling a global feature selection process with traditional machine learning, the research scrutinizes hate speech intricacies. A class-based variable feature selection process evaluates significance via global and local scores, identifying optimal values for prevalent classifiers. Utilizing class-based and corpus-based evaluations, we pinpoint optimal feature values for classifiers like SVM, MNB, and RF. Our results reveal notable enhancements in performance, specifically the F1-Score, underscoring how feature selection and parameter tuning work in tandem to boost model efficacy. Furthermore, the study explores nuanced variations in classifier performance across training and testing datasets, emphasizing the importance of model generalization

    Similar works