Web Replica Hosting Systems

Authors: Pierre G., Sivasubramanian S., Szymaniak M.
Publication date: 01/01/2004

Replication for Web Hosting Systems

Replication is a well-known technique to improve the accessibility of Web sites. It generally offers reduced client latencies and increases a site’s availability. However, applying replication techniques is not trivial, and various Content Delivery Networks (CDNs) have been created to facilitate replication for digital content providers. Th

CiteSeerX

Clustering Arabic Tweets for Sentiment Analysis

Authors: Abuaiadah Diab, Dileep Rajendran, Mustafa Jarrar
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 30/10/2017

This study evaluates the impact of linguistic preprocessing and similarity functions on clustering Arabic tweets. The experiments apply an optimized version of the standard K-Means algorithm to assign tweets to positive and negative categories. The results show that root-based stemming has a significant advantage over light stemming in all settings. The Averaged Kullback-Leibler Divergence similarity function clearly outperforms the Cosine, Pearson Correlation, Jaccard Coefficient and Euclidean functions. The combination of the Averaged Kullback-Leibler Divergence and root-based stemming achieved the highest purity of 0.764, while the second-best purity was 0.719. These results are important because they run contrary to findings for normal-sized documents, where, in many information retrieval applications, light stemming performs better than root-based stemming and the Cosine function is commonly used.

Wintec Research Archive
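
The purity scores quoted in the clustering abstract above (0.764 and 0.719) follow a standard clustering evaluation measure: each cluster is credited with its majority class, and the credited items are divided by the total number of items. The sketch below illustrates that general definition only. It is not the authors' code, and the function name `purity` and the toy data are assumptions made for illustration.

```python
from collections import Counter


def purity(cluster_ids, true_labels):
    """Fraction of items that belong to the majority class of their cluster."""
    if len(cluster_ids) != len(true_labels):
        raise ValueError("inputs must have the same length")
    if not cluster_ids:
        raise ValueError("inputs must be non-empty")

    # Count how often each true label occurs inside each cluster.
    per_cluster = {}
    for cid, label in zip(cluster_ids, true_labels):
        per_cluster.setdefault(cid, Counter())[label] += 1

    # Each cluster contributes the size of its largest label group.
    majority_total = sum(max(c.values()) for c in per_cluster.values())
    return majority_total / len(cluster_ids)


# Hypothetical toy data: 8 tweets assigned to 2 clusters.
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
labels = ["pos", "pos", "pos", "neg", "neg", "neg", "pos", "neg"]
score = purity(clusters, labels)
print(score)
```

In this toy case each cluster holds three items of its majority class, so six of the eight items are credited. A purity of 1.0 would mean every cluster contains a single class.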