Multi-channel CNN to classify Nepali COVID-19 related tweets using hybrid features

Abstract

The COVID-19 pandemic, and people's growing fear of it, has triggered several health complications such as depression and anxiety. These complications affect not only developed countries but also developing countries such as Nepal. They can be understood by properly analyzing the tweets and comments people post online and classifying their sentiment. However, because each tweet contains only a small number of tokens (words), it is crucial to capture several kinds of information from it in order to understand it well. In this study, we first represent each tweet by combining syntactic and semantic information, which we call hybrid features. The syntactic information is generated with the bag-of-words method, whereas the semantic information is generated by combining the fastText-based (ft) and domain-specific (ds) methods. Second, we design a novel multi-channel convolutional neural network (MCNN), an ensemble of multiple CNNs, to capture multi-scale information for better classification. Finally, we evaluate the efficacy of both the proposed feature extraction method and the MCNN model in classifying tweets into three sentiment classes (positive, neutral, and negative) on the NepCOV19Tweets dataset, the only public COVID-19 tweet dataset in the Nepali language. The results show that the proposed hybrid features outperform the individual feature extraction methods, reaching a highest classification accuracy of 69.7%, and that the MCNN model outperforms existing methods, reaching a highest classification accuracy of 71.3%.

Comment: This paper is under consideration at the Journal of Ambient Intelligence and Humanized Computing (Springer). This version may be deleted or updated at any time depending on the journal's policy upon acceptance.
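As a rough illustration of the two ideas in the abstract, here is a minimal pure-Python sketch. It builds a hybrid feature vector by concatenating bag-of-words counts with averaged fastText-style (ft) and domain-specific (ds) word embeddings. That vector then goes through a toy multi-channel 1D CNN, where each channel uses its own kernel size, every filter is max-pooled, and the pooled values are concatenated before a softmax over the three sentiment classes. This is not the authors' implementation. The following are all hypothetical illustrations: the English placeholder vocabulary, the random tables standing in for ft and ds vectors, averaging word vectors per tweet, applying the convolutions directly to the hybrid vector, the kernel sizes (2, 3, 4), the filter count, all function names (`hybrid_features`, `mcnn_forward`, etc.), and the untrained weights.

```python
import math
import random

# Hypothetical toy vocabulary; a real system would build it from the Nepali training tweets.
VOCAB = ["covid", "vaccine", "fear", "hospital", "hope"]


def bag_of_words(tokens, vocab):
    """Syntactic part: raw term counts over a fixed vocabulary."""
    return [float(tokens.count(w)) for w in vocab]


def mean_embedding(tokens, table, dim):
    """Semantic part (assumed): average of the word vectors found in `table`."""
    vecs = [table[t] for t in tokens if t in table]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def hybrid_features(tokens, vocab, ft_table, ds_table, dim):
    """Concatenate BoW counts with averaged ft and ds embeddings."""
    return (bag_of_words(tokens, vocab)
            + mean_embedding(tokens, ft_table, dim)
            + mean_embedding(tokens, ds_table, dim))


def conv1d_relu(x, kernel):
    """Valid 1D cross-correlation (as in CNN layers) followed by ReLU."""
    k = len(kernel)
    if len(x) < k:
        raise ValueError("input shorter than kernel")
    return [max(0.0, sum(kernel[j] * x[i + j] for j in range(k)))
            for i in range(len(x) - k + 1)]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def mcnn_forward(x, channels, dense_w, dense_b):
    """Each channel holds filters of one kernel size; every filter's output is
    max-pooled over positions, and all pooled values are concatenated before
    a dense layer and a softmax over the sentiment classes."""
    pooled = [max(conv1d_relu(x, kern)) for kernels in channels for kern in kernels]
    logits = [sum(w * p for w, p in zip(row, pooled)) + b
              for row, b in zip(dense_w, dense_b)]
    return softmax(logits)


# Usage with random, untrained weights (illustration only).
rng = random.Random(0)
DIM = 4
ft = {w: [rng.uniform(-1, 1) for _ in range(DIM)] for w in VOCAB}  # stand-in for fastText vectors
ds = {w: [rng.uniform(-1, 1) for _ in range(DIM)] for w in VOCAB}  # stand-in for domain-specific vectors

tweet = ["covid", "fear", "fear", "hospital"]
x = hybrid_features(tweet, VOCAB, ft, ds, DIM)  # length 5 + 4 + 4 = 13

KERNEL_SIZES = (2, 3, 4)  # hypothetical multi-scale setting
FILTERS_PER_CHANNEL = 2
channels = [[[rng.uniform(-1, 1) for _ in range(k)] for _ in range(FILTERS_PER_CHANNEL)]
            for k in KERNEL_SIZES]
n_pooled = len(KERNEL_SIZES) * FILTERS_PER_CHANNEL
dense_w = [[rng.uniform(-1, 1) for _ in range(n_pooled)] for _ in range(3)]
dense_b = [0.0, 0.0, 0.0]

probs = mcnn_forward(x, channels, dense_w, dense_b)
print(dict(zip(["positive", "neutral", "negative"], probs)))
```

Because the weights are random, the printed probabilities carry no meaning. The sketch only traces the data flow. A hybrid vector goes in, parallel convolutions run at several kernel sizes, their pooled outputs are concatenated, and a three-way softmax comes out. Real use would require pretrained ft vectors, learned ds vectors, and training with a deep learning framework.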
