    Clustering Arabic Tweets for Sentiment Analysis

    This study evaluates the impact of linguistic preprocessing and similarity functions on clustering Arabic tweets. The experiments apply an optimized version of the standard K-Means algorithm to assign tweets to positive and negative categories. The results show that root-based stemming has a significant advantage over light stemming in all settings. The Averaged Kullback-Leibler Divergence similarity function clearly outperforms the Cosine, Pearson Correlation, Jaccard Coefficient and Euclidean functions. The combination of the Averaged Kullback-Leibler Divergence and root-based stemming achieved the highest purity, 0.764, while the second-best purity was 0.719. These results are important because they run contrary to what is observed for normal-sized documents, where, in many information retrieval applications, light stemming performs better than root-based stemming and the Cosine function is commonly used.
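
The purity scores quoted in the abstract can be illustrated with a minimal sketch, assuming the standard definition of clustering purity: each cluster is credited with its most frequent gold label, and the credited counts are summed and divided by the number of items. The function name `purity` and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter


def purity(cluster_ids, true_labels):
    """Return clustering purity under the standard definition (assumed here).

    cluster_ids: cluster assignment for each item.
    true_labels: gold label (e.g. "pos" / "neg") for each item.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")
    if not cluster_ids:
        raise ValueError("at least one item is required")

    # Count how often each gold label occurs inside each cluster.
    label_counts = {}
    for cluster, label in zip(cluster_ids, true_labels):
        label_counts.setdefault(cluster, Counter())[label] += 1

    # Credit each cluster with the size of its majority label.
    majority_total = sum(max(counts.values()) for counts in label_counts.values())
    return majority_total / len(cluster_ids)


# Toy data (hypothetical): two clusters of four tweets each.
toy_clusters = [0, 0, 0, 1, 1, 1, 1, 0]
toy_labels = ["pos", "pos", "neg", "neg", "neg", "pos", "neg", "pos"]
toy_purity = purity(toy_clusters, toy_labels)
```

In this toy case each cluster contains three items of its majority label, so six of the eight items are credited. With two clusters and two classes, purity never falls below 0.5, which is useful context when reading scores such as 0.764 and 0.719.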

    Fostering Sustainability through Visualization Techniques for Real-Time IoT Data: A Case Study Based on Gas Turbines for Electricity Production

    Improving sustainability is a key concern for industrial development. Industry has recently been benefiting from the rise of IoT technologies, leading to improvements in the monitoring and breakdown prevention of industrial equipment. Visualization techniques are of paramount importance for achieving this monitoring and prevention. However, the visualization of real-time IoT sensor data has always been challenging, especially when the data originate from sensors of different kinds. To tackle this issue, we propose a methodology that aims to help users visually locate and understand the failures that could arise in a production process. This methodology collects, in a guided manner, user goals and the requirements of the production process, analyzes the incoming data from IoT sensors and automatically derives the most suitable visualization type for each context. This approach will help users identify whether the production process is running as well as expected; thus, it will enable them to make the most sustainable decision in each situation. Finally, to assess the suitability of our proposal, a case study based on gas turbines for electricity generation is presented. This work has been co-funded by the ECLIPSE-UA (RTI2018-094283-B-C32) project funded by the Spanish Ministry of Science, Innovation, and Universities and the DQIoT (INNO-20171060) project funded by the Spanish Center for Industrial Technological Development, approved with a EUREKA quality seal (E!11737DQIOT). Ana Lavalle holds an Industrial PhD Grant (I-PI 03-18) co-funded by the University of Alicante and the Lucentia Lab Spin-off Company.