    Tripartite Graph Clustering for Dynamic Sentiment Analysis on Social Media

    The growing popularity of social media (e.g., Twitter) allows users to easily share information with each other and influence others by expressing their own sentiments on various subjects. In this work, we propose an unsupervised \emph{tri-clustering} framework, which analyzes both user-level and tweet-level sentiments through co-clustering of a tripartite graph. A compelling feature of the proposed framework is that the quality of sentiment clustering of tweets, users, and features can be mutually improved by joint clustering. We further investigate the evolution of user-level sentiments and latent feature vectors in an online setting and devise an efficient online algorithm that sequentially updates the clustering of tweets, users, and features as new data arrive. The online framework not only improves the quality of both dynamic user-level and tweet-level sentiment analysis, but also improves computational and storage efficiency. We verified the effectiveness and efficiency of the proposed approaches on the November 2012 California ballot Twitter data. Comment: A short version appears in Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data.
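The co-clustering idea in this abstract can be illustrated, in much-simplified form, by nonnegative matrix tri-factorization: a tweet-feature matrix X is approximated as U S Vᵀ, where U and V give soft cluster memberships for rows and columns that improve jointly. The sketch below (plain NumPy, with invented dimensions and toy data) shows the standard multiplicative updates; it is not the paper's tripartite algorithm, which additionally couples a user dimension and an online update scheme.

```python
import numpy as np

def tri_factorize(X, k_rows, k_cols, n_iter=200, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix X as U @ S @ V.T using the
    standard multiplicative updates for nonnegative matrix
    tri-factorization (simplified illustration)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k_rows))
    S = rng.random((k_rows, k_cols))
    V = rng.random((n, k_cols))
    for _ in range(n_iter):
        U *= (X @ V @ S.T) / (U @ S @ (V.T @ V) @ S.T + eps)
        S *= (U.T @ X @ V) / ((U.T @ U) @ S @ (V.T @ V) + eps)
        V *= (X.T @ U @ S) / (V @ S.T @ (U.T @ U) @ S + eps)
    return U, S, V

# Toy "tweet x feature" matrix with two obvious row/column blocks.
X = np.array([[5, 4, 0, 0],
              [4, 5, 0, 0],
              [0, 0, 3, 4],
              [0, 0, 4, 3]], dtype=float)
U, S, V = tri_factorize(X, k_rows=2, k_cols=2)
err = np.linalg.norm(X - U @ S @ V.T)
row_clusters = U.argmax(axis=1)  # soft memberships -> hard assignment
```

On this block-structured toy matrix the factorization recovers the two row clusters; the joint updates are the simplest analogue of the "mutual improvement" the abstract describes.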

    Analysis of Students' Emotions for Twitter Data using Naïve Bayes and Non-Linear Support Vector Machine Approaches

    Students' informal discussions on social media (e.g., Twitter, Facebook) shed light on their educational experiences: opinions, feelings, and concerns about the learning process. Data from such environments can provide valuable knowledge about student learning. Examining such data, however, can be challenging: the complexity of students' experiences reflected in social media content calls for human interpretation, while the growing scale of the data demands automated analysis techniques. This work explores engineering students' informal conversations on Twitter in order to understand the issues and problems they encounter in their learning experiences. Problems evident in the tweets, such as heavy study load, lack of social engagement, and sleep deprivation, are used as labels, and multi-label classification algorithms are implemented to classify tweets reflecting these problems. Non-Linear Support Vector Machine, Naïve Bayes, and Linear Support Vector Machine methods are used as multi-label classifiers and compared in terms of accuracy; the Non-Linear SVM shows higher accuracy than the Naïve Bayes and Linear SVM classifiers. The trained algorithms serve as a detector of student problems in tweets. DOI: 10.17762/ijritcc2321-8169.150515
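The multi-label setup described above can be sketched with binary relevance: one binary classifier per problem label, so a tweet may receive several labels at once. The sketch below uses a from-scratch multinomial Naïve Bayes on invented toy tweets; the paper's dataset and its SVM classifiers are not reproduced here.

```python
import math
from collections import Counter

# Invented tweet-like training data with toy problem labels (illustration only).
TRAIN = [
    ("so many assignments due tonight", {"study_load"}),
    ("exams and homework are piling up", {"study_load"}),
    ("no sleep again before the exam", {"sleep", "study_load"}),
    ("pulled an all nighter no sleep", {"sleep"}),
    ("wish i had time to hang out with friends", {"social"}),
    ("missing parties because of homework", {"social", "study_load"}),
]
LABELS = {"study_load", "sleep", "social"}
VOCAB = {w for text, _ in TRAIN for w in text.split()}

def nb_score(words, docs):
    """Log-likelihood of a word list under a class's multinomial
    model with Laplace smoothing."""
    counts = Counter(w for text in docs for w in text.split())
    total = sum(counts.values())
    return sum(math.log((counts[w] + 1) / (total + len(VOCAB)))
               for w in words)

def predict(text):
    """Binary relevance: run one Naive Bayes per label and keep the
    labels whose positive-class score beats the negative-class score."""
    words = [w for w in text.split() if w in VOCAB]
    predicted = set()
    for label in LABELS:
        pos = [t for t, ls in TRAIN if label in ls]
        neg = [t for t, ls in TRAIN if label not in ls]
        pos_score = math.log(len(pos) / len(TRAIN)) + nb_score(words, pos)
        neg_score = math.log(len(neg) / len(TRAIN)) + nb_score(words, neg)
        if pos_score > neg_score:
            predicted.add(label)
    return predicted
```

For example, `predict("no sleep tonight")` flags the sleep-deprivation label because "sleep" only occurs in sleep-labeled training tweets; the per-label decisions are independent, which is what makes the scheme multi-label.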

    Social media content for business and user engagement on Facebook

    Facebook is regularly used by businesses to present themselves to users and communicate with them. Most users act passively, simply reading and viewing a company's official homepage. Few followers adopt a more active role, such as commenting and interacting with each other and with the company; fewer still are reactive and proactive, becoming co-creators of content. This study examines the type of content posted by businesses to stimulate user engagement, and how participation and activism are stimulated, through the creation of appropriate indexes. The results uncover previously overlooked aspects of conversation and content setting that encourage user engagement.

    Representation of functions on big data associated with directed graphs

    This paper extends the previous work of Chui et al. (2015) [4], not only from numeric data to non-numeric data, but also from undirected graphs to directed graphs (called digraphs, for simplicity). Besides theoretical development, this paper introduces effective mathematical tools, in terms of certain data-dependent orthogonal systems, for function representation and analysis directly on digraphs. In addition, it includes algorithmic development and discussion of various experimental results on such datasets as CORA, Proposition, and Wiki-votes.

    Cost-Aware Partitioning for Efficient Large Graph Processing in Geo-Distributed Datacenters

    Graph processing is an emerging computation model for a wide range of applications, and graph partitioning is important for optimizing the cost and performance of graph processing jobs. Recently, many graph applications store their data on geo-distributed datacenters (DCs) to provide services worldwide with low latency. This raises new challenges for existing graph partitioning methods, due to the multi-level heterogeneities in network bandwidth and communication prices in geo-distributed DCs. In this paper, we propose an efficient graph partitioning method named Geo-Cut, which takes both cost and performance objectives into consideration for large graph processing in geo-distributed DCs. Geo-Cut adopts two optimization stages. First, we propose a cost-aware streaming heuristic and utilize the one-pass streaming graph partitioning method to quickly assign edges to different DCs while minimizing inter-DC data communication cost. Second, we propose two partition refinement heuristics which identify the performance bottlenecks of geo-distributed graph processing and refine the partitioning result obtained in the first stage to reduce the inter-DC data transfer time while satisfying the budget constraint. Geo-Cut can also be applied to partition dynamic graphs thanks to its lightweight runtime overhead. We evaluate the effectiveness and efficiency of Geo-Cut using real-world graphs with both real geo-distributed DCs and simulations. Evaluation results show that Geo-Cut can reduce the inter-DC data transfer time by up to 79% (42% as the median) and reduce the monetary cost by up to 75% (26% as the median) compared to state-of-the-art graph partitioning methods, with a low overhead.
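The first stage described above, one-pass cost-aware streaming assignment, can be sketched as a greedy heuristic: each edge goes to the DC minimizing a score that combines current load (for balance) with an inter-DC transfer price, paid only when an endpoint is not yet replicated there. All numbers below (prices, the weight `alpha`, the toy graph) are invented for illustration; this is not the actual Geo-Cut algorithm or its refinement stage.

```python
from collections import defaultdict

# Invented per-unit transfer prices between three datacenters (illustration).
PRICE = {("dc0", "dc1"): 2.0, ("dc0", "dc2"): 5.0, ("dc1", "dc2"): 3.0}
DCS = ["dc0", "dc1", "dc2"]

def price(a, b):
    """Symmetric inter-DC transfer price; zero within a DC."""
    return 0.0 if a == b else PRICE[tuple(sorted((a, b)))]

def stream_partition(edges, alpha=1.0):
    """Greedily assign each edge to the DC with the lowest score:
    replication cost (syncing each unseen endpoint from its cheapest
    existing replica) plus alpha * current load for balance."""
    load = {dc: 0 for dc in DCS}
    replicas = defaultdict(set)          # vertex -> DCs holding a copy
    assignment = {}
    for u, v in edges:
        def score(dc):
            cost = 0.0
            for x in (u, v):
                if replicas[x] and dc not in replicas[x]:
                    cost += min(price(dc, d) for d in replicas[x])
            return cost + alpha * load[dc]
        dc = min(DCS, key=score)
        assignment[(u, v)] = dc
        replicas[u].add(dc)
        replicas[v].add(dc)
        load[dc] += 1
    return assignment, load

edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
assignment, load = stream_partition(edges)
```

Raising `alpha` favors balance over communication cost, which mirrors the cost/performance trade-off the abstract describes; the second-stage refinement heuristics would then adjust this initial assignment under a budget constraint.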