    Large Scale and Parallel Sentiment Analysis Based on Label Propagation in Twitter Data
    Sentiment analysis is a promising branch of natural language processing, but it becomes challenging on Twitter data because of the large volume, the rapidly changing language style, and the lack of training data. As a result, traditional lexicon-based approaches and supervised learning methods are difficult to apply. In this paper, we propose a graph-based label propagation algorithm to address the last two problems, and we apply GraphX, the Spark API for graph-parallel computing, to address the first. The results show that the label propagation algorithm is robust and scalable in our parallel implementation. Moreover, our approach, which uses a lexicon and noisy labels such as emoticons, outperforms the baseline significantly. In future work, we plan to test more algorithms on clusters and to make better use of the social network by adding a community detection step before classification to improve accuracy.
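
To make the idea of label propagation from noisy seeds concrete, here is a minimal single-machine sketch in plain Python. It is not the paper's GraphX implementation, and no full text is available to check against. The graph layout (tweets linked to the tokens they contain), the edge weights, the seed scores, and the function name `propagate_labels` are all assumptions made for illustration. Seed nodes such as emoticons keep their scores fixed, and every other node repeatedly takes the weighted average of its neighbours' scores.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, iterations=20):
    """Spread sentiment scores in [-1, 1] from seed nodes over an undirected graph.

    edges: iterable of (node_a, node_b, weight) with positive weights (assumed).
    seeds: dict mapping seed nodes to fixed scores, e.g. emoticons or lexicon words.
    """
    neighbors = defaultdict(dict)
    for a, b, weight in edges:
        neighbors[a][b] = weight
        neighbors[b][a] = weight

    # Unlabeled nodes start neutral; seeds start at their given score.
    scores = {node: seeds.get(node, 0.0) for node in neighbors}

    for _ in range(iterations):
        updated = {}
        for node, adjacent in neighbors.items():
            if node in seeds:
                updated[node] = seeds[node]  # clamp seeds to their noisy label
                continue
            total = sum(adjacent.values())
            if total == 0:
                updated[node] = scores[node]  # no usable edges, keep the old score
            else:
                updated[node] = sum(w * scores[n] for n, w in adjacent.items()) / total
        scores = updated
    return scores


# Hypothetical toy graph: tweets connected to the tokens they contain.
edges = [
    ("tweet_a", ":)", 1.0),
    ("tweet_a", "awesome", 1.0),
    ("tweet_b", "awesome", 1.0),
    ("tweet_c", ":(", 1.0),
    ("tweet_c", "awful", 1.0),
    ("tweet_d", "awful", 1.0),
]
seeds = {":)": 1.0, ":(": -1.0}
scores = propagate_labels(edges, seeds)
```

In this toy graph, `tweet_b` contains no emoticon but ends up with a positive score because it shares the word "awesome" with `tweet_a`. A distributed version would express the same neighbour-averaging step as message passing over graph partitions.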