An Empirical Performance Evaluation of Semantic-Based Similarity Measures in Microblogging Social Media

Alnajran, Noufa; Crockett, Keeley; Latham, Annabel; McLean, David

Publication date: 20 December 2018

Abstract

Measuring textual semantic similarity has been a subject of intense discussion in NLP and AI for many years. A new area of research has emerged that applies semantic similarity measures to Twitter. However, developing these measures for the semantic analysis of tweets poses fundamental challenges. The sparsity, ambiguity, and informality of social media text hamper the performance of traditional textual similarity measures, because tweets have distinctive syntactic and semantic characteristics. This paper reviews and evaluates the performance of topological, statistical, and hybrid similarity measures in the context of Twitter analysis. Furthermore, the performance of each measure is compared against a naïve keyword-based similarity computation method to assess the significance of semantic computation in capturing the meaning of tweets. An experiment is designed and conducted to evaluate the different measures on a benchmark dataset using several metrics, including correlation, error rates, and statistical tests. The potential weaknesses of semantic similarity measures for Twitter applications of textual similarity assessment, and the research contributions, are discussed. This research highlights challenges and potential areas of improvement for the semantic similarity of tweets, and serves as a resource for researchers and practitioners.
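To make the keyword-based baseline and the evaluation metrics concrete, here is a minimal sketch. The abstract does not say which keyword method the authors used, so this sketch assumes Jaccard overlap of lowercased word tokens; the helper names (`tokenize`, `keyword_similarity`, `pearson`, `mean_absolute_error`), the example tweets, and the "gold" scores are all hypothetical and illustrative, not taken from the paper or its benchmark dataset.

```python
import math
import re


def tokenize(text):
    """Lowercase a tweet and return its set of word tokens (assumed tokenizer)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def keyword_similarity(a, b):
    """Hypothetical naive keyword baseline: Jaccard overlap |A & B| / |A | B|, in [0, 1]."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def pearson(xs, ys):
    """Pearson correlation between system scores and human (gold) scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov / (sx * sy)


def mean_absolute_error(xs, ys):
    """Average absolute difference between system and gold scores (an error rate)."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    # Made-up tweet pairs with made-up gold similarity scores on a 0-1 scale.
    pairs = [
        ("Traffic jam on the M60 this morning", "Heavy traffic on M60 this morning"),
        ("Loving the sunshine today!", "What a sunny day, loving it"),
        ("Train cancelled again", "Great match last night"),
    ]
    gold = [0.9, 0.8, 0.0]
    system = [keyword_similarity(a, b) for a, b in pairs]
    print("keyword scores:", system)
    print("Pearson r:", pearson(system, gold))
    print("MAE:", mean_absolute_error(system, gold))
```

The second toy pair shows why a purely lexical baseline can fall short on tweets. It shares only the token "loving", so it scores 1/9 even though "sunshine" and "sunny day" are close in meaning. That gap is what semantic measures aim to close.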

Available versions

E-space: Manchester Metropolitan University's Research Repository (oai:e-space.mmu.ac.uk:621809)

Crossref