UT-DB: an experimental study on sentiment analysis in twitter
This paper describes our system for SemEval-2013 Task 2-B (Kozareva et al., 2013): Sentiment Analysis in Twitter. Given a message, our system classifies its sentiment as positive, negative, or neutral. It uses a co-occurrence rate model. The training data are restricted to the data provided by the task organizers (no other tweet data are used). We consider nine types of features and use a subset of them in our submitted system. To measure the contribution of each feature type, we run an ablation study that leaves out one feature type at a time. Results suggest that unigrams are the most important features, that bigrams and POS tags do not seem helpful, and that stopwords should be retained to achieve the best results. The overall results of our system are promising given the constrained features and data we use.
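The leave-one-feature-type-out study mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' system: the function names, the feature-type labels, and the placeholder numbers in `toy_evaluate` are all assumptions made for the example, and a real `evaluate` callback would train and score the classifier on the given feature subset.

```python
from typing import Callable, Dict, FrozenSet, Iterable

def leave_one_out_ablation(
    feature_types: Iterable[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """For each feature type, return the score change when it is left out.

    A negative value means the score dropped without that feature type,
    i.e. the feature type was helping.
    """
    all_types = frozenset(feature_types)
    baseline = evaluate(all_types)
    return {t: evaluate(all_types - {t}) - baseline for t in sorted(all_types)}

# Hypothetical stand-in for "train and score a model on these feature types".
# The numbers are placeholders for illustration, not results from the paper.
PLACEHOLDER_GAIN = {"unigrams": 0.10, "bigrams": 0.0, "pos_tags": -0.01}

def toy_evaluate(active: FrozenSet[str]) -> float:
    return 0.5 + sum(PLACEHOLDER_GAIN[t] for t in sorted(active))

if __name__ == "__main__":
    for name, delta in leave_one_out_ablation(PLACEHOLDER_GAIN, toy_evaluate).items():
        print(f"without {name}: score change {delta:+.3f}")
```

Passing the scoring step in as a callback keeps the ablation loop independent of the model, so the same loop could wrap any classifier.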
Why Do Cascade Sizes Follow a Power-Law?
We introduce a random directed acyclic graph and use it to model the
information diffusion network. Subsequently, we analyze the cascade generation
model (CGM) introduced by Leskovec et al. [19]. Until now, this model has been
studied only empirically. In this paper, we present the first
theoretical proof that the sizes of cascades generated by the CGM follow a
power-law distribution, which is consistent with multiple empirical analyses of
large social networks. We compare the assumptions of our model with the
Twitter social network and test the goodness of the approximation.
Comment: 8 pages, 7 figures, accepted to WWW 201
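As a rough illustration of what a power-law claim about cascade sizes means in practice, the sketch below estimates the exponent of a power-law tail from a sample of sizes. It uses the standard continuous maximum-likelihood estimator, alpha = 1 + n / sum(ln(x_i / x_min)), which is only an approximation for integer-valued cascade sizes. This is not the paper's proof or its goodness-of-approximation test; the function names and the synthetic data are assumptions for the example.

```python
import math
import random
from typing import List, Sequence

def powerlaw_mle_alpha(sizes: Sequence[float], x_min: float = 1.0) -> float:
    """Continuous MLE of the exponent alpha for p(x) ~ x^(-alpha), x >= x_min."""
    tail = [x for x in sizes if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum <= 0.0:
        raise ValueError("all observations equal x_min; exponent is undefined")
    return 1.0 + len(tail) / log_sum

def sample_powerlaw(alpha: float, x_min: float, n: int, rng: random.Random) -> List[float]:
    """Draw n synthetic values from a continuous power law by inverse transform."""
    # rng.random() lies in [0, 1), so (1 - u) lies in (0, 1] and the power is finite.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

if __name__ == "__main__":
    rng = random.Random(0)
    synthetic_sizes = sample_powerlaw(alpha=2.5, x_min=1.0, n=20000, rng=rng)
    print("estimated alpha:", powerlaw_mle_alpha(synthetic_sizes, x_min=1.0))
```

On real cascade data, the choice of `x_min` matters, since a power law usually describes only the tail of the size distribution.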
Social Media and Electoral Predictions: A Meta-Analytic Review
Can social media data be used to make reasonably accurate estimates of electoral outcomes? We conducted a meta-analytic review to examine how well different features of social media posts, namely (1) content features and (2) structural features, and different methods predict political elections. Across 45 published studies, we find significant variance in the quality of predictions, which on average still lag behind those of traditional survey research. More specifically, we find that machine learning-based approaches generally outperform lexicon-based analyses, while combining structural and content features yields the most accurate predictions.
Design, implementation and experiment of a YeSQL Web Crawler
We describe a novel, "focusable", scalable, distributed web crawler based on
GNU/Linux and PostgreSQL that we designed to be easily extendible and which we
have released under a GNU public licence. We also report a first use case
related to an analysis of Twitter's streams about the French 2012 presidential
elections and the URLs they contain.
Sentiment Analysis in Twitter for Spanish
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-07983-7_27
This paper describes an SVM approach for Sentiment Analysis
(SA) in Twitter for Spanish. This task was part of the TASS2013
workshop, which is a framework for SA that is focused on the Spanish
language. We describe the approach used, and we present an experimental
comparison of the approaches presented by the different teams
that took part in the competition. We also describe the improvements
that were added to our system after our participation in the competition.
With these improvements, we obtained an accuracy of 62.88% and
70.25% on the SA test set for 5-level and 3-level tasks respectively. To
our knowledge, these results are the best results published until now for
the SA tasks of the TASS2013 workshop.
This work has been funded by the projects DIANA (MEC TIN2012-38603-C02-01) and Tímpano (MEC TIN2011-28169-C05-01).
Pla Santamaría, F.; Hurtado Oliver, LF. (2014). Sentiment Analysis in Twitter for Spanish. In Natural Language Processing and Information Systems. Springer Lecture Notes in Computer Science, Volume 8455, 2014, pp. 208-213. https://doi.org/10.1007/978-3-319-07983-7_27
Mining the Demographics of Political Sentiment from Twitter Using Learning from Label Proportions
Opinion mining and demographic attribute inference have many applications in
social science. In this paper, we propose models to infer daily joint
probabilities of multiple latent attributes from Twitter data, such as
political sentiment and demographic attributes. Since it is costly and
time-consuming to annotate data for traditional supervised classification, we
instead propose scalable Learning from Label Proportions (LLP) models for
demographic and opinion inference using U.S. Census, national and state
political polls, and Cook partisan voting index as population level data. In
LLP classification settings, the training data is divided into a set of
unlabeled bags, where only the label distribution of each bag is known,
removing the requirement of instance-level annotations. Our proposed LLP model,
Weighted Label Regularization (WLR), provides a scalable generalization of
prior work on label regularization to support weights for samples inside bags,
which is applicable in this setting where bags are arranged hierarchically
(e.g., county-level bags are nested inside of state-level bags). We apply our
model to Twitter data collected in the year leading up to the 2016 U.S.
presidential election, producing estimates of the relationships among political
sentiment and demographics over time and place. We find that our approach
closely tracks traditional polling data stratified by demographic category,
resulting in error reductions of 28-44% over baseline approaches. We also
provide descriptive evaluations showing how the model may be used to estimate
interactions among many variables and to identify linguistic temporal
variation, capabilities which are typically not feasible using traditional
polling methods.
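The idea of training from bag-level label proportions with per-sample weights, as described in the abstract above, can be illustrated with a small loss function. This is a sketch under stated assumptions, not the paper's WLR objective: it assumes the bag-level prediction is the weighted mean of per-instance class probabilities and that the loss is the KL divergence from the known bag proportions to that prediction. All function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence

def weighted_bag_prediction(
    probs: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Weighted mean of per-instance class probabilities within one bag."""
    if len(probs) != len(weights) or not probs:
        raise ValueError("probs and weights must be non-empty and the same length")
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must sum to a positive value")
    num_classes = len(probs[0])
    return [
        sum(w * p[c] for p, w in zip(probs, weights)) / total
        for c in range(num_classes)
    ]

def bag_kl_loss(
    target: Sequence[float],
    probs: Sequence[Sequence[float]],
    weights: Sequence[float],
    eps: float = 1e-12,
) -> float:
    """KL(target || weighted bag prediction); zero when the proportions match."""
    pred = weighted_bag_prediction(probs, weights)
    # Classes with zero target mass contribute nothing to the KL divergence.
    return sum(t * math.log(t / max(q, eps)) for t, q in zip(target, pred) if t > 0.0)

if __name__ == "__main__":
    instance_probs = [[0.9, 0.1], [0.1, 0.9]]  # model outputs for two instances
    known_proportions = [0.5, 0.5]             # bag-level label distribution
    print(bag_kl_loss(known_proportions, instance_probs, weights=[1.0, 1.0]))
    print(bag_kl_loss(known_proportions, instance_probs, weights=[3.0, 1.0]))
```

No instance-level labels appear anywhere in the loss; only the bag proportions constrain the model, which is what removes the annotation requirement.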