2 research outputs found

    Enhancing Hate Speech Detection in Sinhala Language on Social Media using Machine Learning

    Get PDF
    To counter the harmful dissemination of hate speech on social media, especially abusive outbursts of racism and sexism, automatic and accurate detection is crucial. However, a significant challenge lies in the vast sparsity of available data, hindering accurate classification. This study presents a novel approach to Sinhala hate speech detection on social platforms by coupling a global feature selection process with traditional machine learning, the research scrutinizes hate speech intricacies. A class-based variable feature selection process evaluates significance via global and local scores, identifying optimal values for prevalent classifiers. Utilizing class-based and corpus-based evaluations, we pinpoint optimal feature values for classifiers like SVM, MNB, and RF. Our results reveal notable enhancements in performance, specifically the F1-Score, underscoring how feature selection and parameter tuning work in tandem to boost model efficacy. Furthermore, the study explores nuanced variations in classifier performance across training and testing datasets, emphasizing the importance of model generalization

    Survey on Publicly Available Sinhala Natural Language Processing Tools and Research

    Full text link
    Sinhala is the native language of the Sinhalese people who make up the largest ethnic group of Sri Lanka. The language belongs to the globe-spanning language tree, Indo-European. However, due to poverty in both linguistic and economic capital, Sinhala, in the perspective of Natural Language Processing tools and research, remains a resource-poor language which has neither the economic drive its cousin English has nor the sheer push of the law of numbers a language such as Chinese has. A number of research groups from Sri Lanka have noticed this dearth and the resultant dire need for proper tools and research for Sinhala natural language processing. However, due to various reasons, these attempts seem to lack coordination and awareness of each other. The objective of this paper is to fill that gap of a comprehensive literature survey of the publicly available Sinhala natural language tools and research so that the researchers working in this field can better utilize contributions of their peers. As such, we shall be uploading this paper to arXiv and perpetually update it periodically to reflect the advances made in the field
    corecore