2 research outputs found
Synonym based feature expansion for Indonesian hate speech detection
Online hate speech is one of the negative impacts of internet-based social media development. Hate speech occurs due to a lack of public understanding of criticism and hate speech. The Indonesian government has regulations regarding hate speech, and most of the existing research about hate speech only focuses on feature extraction and classification methods. Therefore, this paper proposes methods to identify hate speech before a crime occurs. This paper presents an approach to detect hate speech by expanding synonyms in word embedding and shows the classification comparison result between Word2Vec and FastText with bidirectional long short-term memory which are processed using synonym expanding process and without it. The goal is to classify hate speech and non-hate speech. The best accuracy result without the synonym expanding process is 0.90, and the expanding synonym process is 0.93
#REVAL: a semantic evaluation framework for hashtag recommendation
Automatic evaluation of hashtag recommendation models is a fundamental task
in many online social network systems. In the traditional evaluation method,
the recommended hashtags from an algorithm are firstly compared with the ground
truth hashtags for exact correspondences. The number of exact matches is then
used to calculate the hit rate, hit ratio, precision, recall, or F1-score. This
way of evaluating hashtag similarities is inadequate as it ignores the semantic
correlation between the recommended and ground truth hashtags. To tackle this
problem, we propose a novel semantic evaluation framework for hashtag
recommendation, called #REval. This framework includes an internal module
referred to as BERTag, which automatically learns the hashtag embeddings. We
investigate on how the #REval framework performs under different word embedding
methods and different numbers of synonyms and hashtags in the recommendation
using our proposed #REval-hit-ratio measure. Our experiments of the proposed
framework on three large datasets show that #REval gave more meaningful hashtag
synonyms for hashtag recommendation evaluation. Our analysis also highlights
the sensitivity of the framework to the word embedding technique, with #REval
based on BERTag more superior over #REval based on FastText and Word2Vec.Comment: 18 pages, 4 figure