    Clustering Arabic Tweets for Sentiment Analysis

    This study evaluates the impact of linguistic preprocessing and similarity functions on clustering Arabic tweets. The experiments apply an optimized version of the standard K-Means algorithm to assign tweets to positive and negative categories. The results show that root-based stemming has a significant advantage over light stemming in all settings, and that the Averaged Kullback-Leibler Divergence similarity function clearly outperforms the Cosine, Pearson Correlation, Jaccard Coefficient and Euclidean functions. The combination of Averaged Kullback-Leibler Divergence and root-based stemming achieved the highest purity, 0.764, while the second-best setting reached 0.719. These results are notable because they run contrary to findings for normal-sized documents, where light stemming performs better than root-based stemming in many information retrieval applications and the Cosine function is commonly used.
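
    As a rough illustration rather than the paper's implementation, the sketch below (in Python) shows one common symmetrised "averaged Kullback-Leibler" formulation over smoothed term-frequency vectors, together with the purity measure used to score the clusterings. The function names, the smoothing constant, and the use of the mean distribution are illustrative assumptions, not details taken from the paper.

        import numpy as np

        def avg_kl_divergence(p, q, eps=1e-10):
            # p, q: term-frequency vectors for two tweets (numpy arrays).
            # Smooth and renormalise so zero term counts do not blow up the log.
            p = (p + eps) / (p + eps).sum()
            q = (q + eps) / (q + eps).sum()
            m = 0.5 * (p + q)
            kl = lambda a, b: np.sum(a * np.log(a / b))
            # Average the two directed divergences against the mean distribution.
            return 0.5 * (kl(p, m) + kl(q, m))

        def purity(cluster_labels, true_labels):
            # Purity: each cluster votes for its majority class; the score is
            # the fraction of tweets that agree with their cluster's vote.
            # true_labels are non-negative ints (e.g. 0 = negative, 1 = positive).
            cluster_labels = np.asarray(cluster_labels)
            true_labels = np.asarray(true_labels)
            total = 0
            for c in np.unique(cluster_labels):
                members = true_labels[cluster_labels == c]
                total += np.bincount(members).max()
            return total / len(true_labels)

    Under this reading, a purity of 0.764 means that 76.4% of tweets land in a cluster whose majority sentiment matches their own.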

    Semi supervised relevance learning for feature selection on high dimensional data

    Advances in technology are making data grow at a fast pace, and in many application fields this trend particularly concerns the dimensionality of the data: features number in the thousands or tens of thousands, while the number of instances is much smaller. This phenomenon, known as the curse of dimensionality, results in modest classification performance and unstable feature selection. To deal with this issue, we propose a new feature selection approach that uses background knowledge about dimensions known to be more relevant to direct the feature selection process. In this approach, prior knowledge about some features is used to learn new relevant features in a semi-supervised manner. Experiments on three high-dimensional data sets show promising results on both classification performance and the stability of feature selection.
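
    The abstract does not spell out the algorithm, so the sketch below is only a hedged illustration of the general idea: features known a priori to be relevant act as a seed, and the remaining features are ranked by how strongly they co-vary with that seed. The correlation-based scoring, the function name, and the parameters are assumptions for illustration, not the authors' method.

        import numpy as np

        def select_features(X, known_relevant, k):
            # X: (n_instances, n_features) data matrix, n_features >> n_instances.
            # known_relevant: indices of features given as prior knowledge (the seed).
            known_relevant = np.asarray(known_relevant)
            corr = np.corrcoef(X, rowvar=False)        # feature-feature correlations
            # Score each feature by its mean absolute correlation with the seed.
            scores = np.abs(corr[:, known_relevant]).mean(axis=1)
            scores[known_relevant] = -np.inf           # keep the seed out of the ranking
            ranked = np.argsort(scores)[::-1][:k]      # k strongest co-varying features
            return np.concatenate([known_relevant, ranked])

    A classifier would then be trained and evaluated on the returned feature subset; repeating the selection across resampled data would gauge the stability the abstract refers to.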