    Assessing the reTweet proneness of tweets: predictive models for retweeting

    UT-DB: an experimental study on sentiment analysis in twitter

    This paper describes our system for participating in SemEval-2013 Task 2-B (Kozareva et al., 2013): Sentiment Analysis in Twitter. Given a message, our system classifies whether it expresses positive, negative, or neutral sentiment. It uses a co-occurrence rate model. The training data are limited to the data provided by the task organizers (no other tweet data are used). We consider 9 types of features and use a subset of them in our submitted system. To measure the contribution of each feature type, we run an ablation study that leaves one feature type out at a time. Results suggest that unigrams are the most important features, bigrams and POS tags do not seem helpful, and stopwords should be retained to achieve the best results. The overall results of our system are promising given the constrained features and data we use.
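
    The leave-one-feature-type-out study can be sketched as a generic ablation loop. This is a minimal sketch, not the authors' code: the `evaluate` callback is a hypothetical stand-in for training and scoring the co-occurrence rate model on a given set of feature types, and the toy scorer and its per-type numbers are made-up placeholders, not results from the paper.

```python
from typing import Callable, Dict, FrozenSet, Iterable


def leave_one_out_ablation(
    feature_types: Iterable[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """For each feature type, return how much the score drops when it is removed.

    A positive value means the type helps; zero or negative means it does not.
    """
    all_types = frozenset(feature_types)
    full_score = evaluate(all_types)
    return {
        ft: full_score - evaluate(all_types - {ft})
        for ft in sorted(all_types)
    }


# Hypothetical stand-in for "train the classifier and measure its score".
# These per-type contributions are invented toy numbers for illustration only.
TOY_CONTRIBUTION = {"unigrams": 0.20, "bigrams": 0.0, "pos_tags": -0.01, "stopwords": 0.03}


def toy_evaluate(types: FrozenSet[str]) -> float:
    """Toy scorer: a base score plus the contribution of each included type."""
    return 0.40 + sum(TOY_CONTRIBUTION[t] for t in types)


if __name__ == "__main__":
    for ft, drop in leave_one_out_ablation(TOY_CONTRIBUTION, toy_evaluate).items():
        print(f"{ft:10s} drop when removed: {drop:+.3f}")
```

    In a real study, `evaluate` would retrain the classifier on each reduced feature set, so the loop costs one extra training run per feature type.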

    Why Do Cascade Sizes Follow a Power-Law?

    We introduce a random directed acyclic graph and use it to model the information diffusion network. We then analyze the cascade generation model (CGM) introduced by Leskovec et al. [19]. Until now, this model has been studied only empirically. In this paper, we present the first theoretical proof that the sizes of cascades generated by the CGM follow a power-law distribution, which is consistent with multiple empirical analyses of large social networks. We compared the assumptions of our model with the Twitter social network and tested the goodness of the approximation.
    Comment: 8 pages, 7 figures, accepted to WWW 201
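
    To get an empirical feel for the claim, cascades can be simulated on a random DAG. This sketch is an illustration under stated assumptions, not the paper's construction: it builds a DAG by adding each edge i -> j (i < j) independently with probability `p`, and spreads a cascade in independent-cascade style, where each reached node passes the information along each out-edge with probability `beta`. Whether this matches every detail of the CGM of Leskovec et al., and for which parameters a power law appears, is not established here; `p`, `beta`, and all function names are hypothetical.

```python
import random
from collections import Counter
from typing import List


def random_dag(n: int, p: float, rng: random.Random) -> List[List[int]]:
    """Adjacency lists of a random DAG on nodes 0..n-1.

    Edges only go from a lower to a higher index, which guarantees acyclicity.
    """
    return [[j for j in range(i + 1, n) if rng.random() < p] for i in range(n)]


def cascade_size(adj: List[List[int]], start: int, beta: float, rng: random.Random) -> int:
    """Spread from `start`; each out-edge of a reached node is tried once with prob. beta."""
    reached = {start}
    frontier = [start]
    while frontier:
        node = frontier.pop()  # each node is pushed, and so processed, at most once
        for nbr in adj[node]:
            if nbr not in reached and rng.random() < beta:
                reached.add(nbr)
                frontier.append(nbr)
    return len(reached)


def cascade_size_histogram(n: int = 500, p: float = 0.01, beta: float = 0.3,
                           trials: int = 2000, seed: int = 0) -> Counter:
    """Count how often each cascade size occurs, starting each cascade at a random node."""
    rng = random.Random(seed)
    adj = random_dag(n, p, rng)
    return Counter(cascade_size(adj, rng.randrange(n), beta, rng) for _ in range(trials))


if __name__ == "__main__":
    hist = cascade_size_histogram()
    for size in sorted(hist):
        print(size, hist[size])
```

    Plotting the histogram on log-log axes is a quick way to eyeball whether the tail looks roughly linear; a proper power-law fit needs more careful estimation than that.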

    Social Media and Electoral Predictions: A Meta-Analytic Review

    Can social media data be used to make reasonably accurate estimates of electoral outcomes? We conducted a meta-analytic review to examine how well different features of social media posts, namely (1) content features and (2) structural features, and different methods predict political elections. Across 45 published studies, we find significant variance in the quality of predictions, which on average still lag behind those of traditional survey research. More specifically, our findings show that machine learning-based approaches generally outperform lexicon-based analyses, while combining structural and content features yields the most accurate predictions.

    Design, implementation and experiment of a YeSQL Web Crawler

    We describe a novel "focusable", scalable, distributed web crawler based on GNU/Linux and PostgreSQL, which we designed to be easily extensible and have released under a GNU public licence. We also report a first use case: an analysis of Twitter streams about the 2012 French presidential elections and the URLs they contain.

    Sentiment Analysis in Twitter for Spanish

    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-07983-7_27.
    This paper describes an SVM-based approach to Sentiment Analysis (SA) in Twitter for Spanish. This task was part of the TASS2013 workshop, a framework for SA focused on the Spanish language. We describe the approach used and present an experimental comparison of the approaches presented by the different teams that took part in the competition. We also describe the improvements added to our system after our participation in the competition. With these improvements, we obtained accuracies of 62.88% and 70.25% on the SA test set for the 5-level and 3-level tasks, respectively. To our knowledge, these are the best results published to date for the SA tasks of the TASS2013 workshop.
    This work has been funded by the projects DIANA (MEC TIN2012-38603-C02-01) and Tímpano (MEC TIN2011-28169-C05-01).
    Pla Santamaría, F.; Hurtado Oliver, LF. (2014). Sentiment Analysis in Twitter for Spanish. In: Natural Language Processing and Information Systems. Springer Lecture Notes in Computer Science, Volume 8455, 2014, pp. 208-213. https://doi.org/10.1007/978-3-319-07983-7_27

    Mining the Demographics of Political Sentiment from Twitter Using Learning from Label Proportions

    Opinion mining and demographic attribute inference have many applications in social science. In this paper, we propose models to infer daily joint probabilities of multiple latent attributes from Twitter data, such as political sentiment and demographic attributes. Since annotating data for traditional supervised classification is costly and time-consuming, we instead propose scalable Learning from Label Proportions (LLP) models for demographic and opinion inference, using the U.S. Census, national and state political polls, and the Cook Partisan Voting Index as population-level data. In LLP classification settings, the training data are divided into a set of unlabeled bags, where only the label distribution of each bag is known, removing the requirement for instance-level annotations. Our proposed LLP model, Weighted Label Regularization (WLR), provides a scalable generalization of prior work on label regularization that supports weights for samples inside bags, making it applicable to this setting, where bags are arranged hierarchically (e.g., county-level bags are nested inside state-level bags). We apply our model to Twitter data collected in the year leading up to the 2016 U.S. presidential election, producing estimates of the relationships among political sentiment and demographics over time and place. We find that our approach closely tracks traditional polling data stratified by demographic category, resulting in error reductions of 28-44% over baseline approaches. We also provide descriptive evaluations showing how the model may be used to estimate interactions among many variables and to identify linguistic temporal variation, capabilities that are typically not feasible with traditional polling methods.
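
    The bag-level signal that LLP models learn from can be illustrated with a label-regularization-style loss. This sketch assumes, rather than reproduces, the WLR objective: for each bag it takes a weighted average of the model's per-tweet class probabilities and measures its KL divergence from the bag's known label proportions (e.g., from a poll or the Census). The data layout and all function names are hypothetical. Nested bags, such as a county inside a state, simply contribute separate terms, so one tweet can appear in several bags.

```python
import math
from typing import List, Sequence, Tuple

# One bag: (known label proportions, per-instance predicted class probabilities,
#           per-instance weights). This layout is a hypothetical choice for the sketch.
Bag = Tuple[Sequence[float], Sequence[Sequence[float]], Sequence[float]]


def weighted_bag_distribution(probs: Sequence[Sequence[float]],
                              weights: Sequence[float]) -> List[float]:
    """Weighted mean of the instances' predicted class distributions."""
    total = sum(weights)
    n_classes = len(probs[0])
    return [sum(w * p[c] for w, p in zip(weights, probs)) / total
            for c in range(n_classes)]


def kl_divergence(target: Sequence[float], predicted: Sequence[float],
                  eps: float = 1e-12) -> float:
    """KL(target || predicted); classes with zero target mass contribute nothing."""
    return sum(t * math.log(t / max(q, eps))
               for t, q in zip(target, predicted) if t > 0)


def weighted_label_regularization_loss(bags: Sequence[Bag]) -> float:
    """Sum over bags of KL(known proportions || weighted mean prediction)."""
    return sum(kl_divergence(target, weighted_bag_distribution(probs, weights))
               for target, probs, weights in bags)


if __name__ == "__main__":
    # Toy bag: two tweets, known 50/50 split, model fully confident on each.
    bag = ([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0])
    print(weighted_label_regularization_loss([bag]))  # 0.0: proportions matched
```

    During training, the model parameters would be adjusted to reduce this loss, typically alongside a parameter regularizer; that optimization step is omitted here.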