    Scalable Privacy-Compliant Virality Prediction on Twitter

    The digital town hall of Twitter has become a preferred medium of communication for individuals and organizations across the globe. Some of them reach audiences of millions, while others struggle to get noticed. Given the impact of social media, the question remains more relevant than ever: how can we model the dynamics of attention on Twitter? Researchers around the world turn to machine learning to predict the most influential tweets and authors, navigating the volume, velocity, and variety of social big data, with many compromises. In this paper, we revisit content popularity prediction on Twitter. We argue that strict alignment of data acquisition, storage and analysis algorithms is necessary to avoid the common trade-offs between scalability, accuracy and privacy compliance. We propose a new framework for the rapid acquisition of large-scale datasets, a high-accuracy supervisory signal and multilanguage sentiment prediction while respecting every applicable privacy request. We then apply a novel gradient boosting framework to achieve state-of-the-art results in virality ranking, even before including a tweet's visual or propagation features. Our Gradient Boosted Regression Tree is the first to offer explainable, strong ranking performance on benchmark datasets. Since the analysis focused on features available early, the model is immediately applicable to incoming tweets in 18 languages. Comment: AffCon@AAAI-19 Best Paper Award; Presented at AAAI-19 W1: Affective Content Analysi

    How Twitter Has Changed the Way Advertisers Communicate

    Since its inception in 2006 Twitter has become one of the most prevalent social media platforms, with over 330 million active users and over 500 million “tweets” sent daily (Aslam, 2018). This research project will conduct a content analysis of specific tweets from the Wendy’s corporation official Twitter account in addition to qualitatively evaluating scholarly articles on the topic of social media, marketing, and online communication. The key focus is how Twitter creates what is referred to as a “dialogic loop,” a pattern that only develops in online communication. This paper will highlight how Twitter has changed the way advertisers utilize social media to promote their organization’s goals. Wendy’s establishes dialogic loops primarily by using retweets and humor to connect with individuals, by engaging with other organizations positively and negatively, and by utilizing a character spokesperso

    Emergent Influence Networks in Good-Faith Online Discussions

    Town hall-type debates are increasingly moving online, irrevocably transforming public discourse. Yet, we know relatively little about crucial social dynamics that determine which arguments are more likely to be successful. This study investigates the impact of one's position in the discussion network created via responses to others' arguments on one's persuasiveness in unfacilitated online debates. We propose a novel framework for measuring the impact of network position on persuasiveness, using a combination of social network analysis and machine learning. Complementing existing studies investigating the effect of linguistic aspects on persuasiveness, we show that the user's position in a discussion network influences their persuasiveness online. Moreover, the recognition of successful persuasion further increases this dominant network position. Our findings offer important insights into the complex social dynamics of online discourse and provide practical insights for organizations and individuals seeking to understand the interplay between influential positions in a discussion network and persuasive strategies in digital spaces

    About Challenges in Data Analytics and Machine Learning for Social Good

    The large number of new services and applications and, more generally, all our everyday activities result in the mass production of data. These data can become a golden source of information that might be used to improve our lives, wellness and working days. (Interpretable) Machine Learning approaches, which are increasingly ubiquitous in various settings, are among the most effective tools for extracting essential information from data. However, many challenges arise in exploiting them effectively. In this paper, we analyze key scenarios in which large amounts of data and machine learning techniques can be used for social good: social network analytics for enhancing cultural heritage dissemination; game analytics to foster Computational Thinking in education; medical analytics to improve the quality of life of the elderly and reduce health care expenses; and exploration of the potential of work datafication to improve human resources management (HRM). For the first two of these scenarios, we present new results related to previously published research, framing these results in a more general discussion of the challenges that arise when adopting machine learning techniques for social good

    Can Social News Websites Pay for Content and Curation? The SteemIt Cryptocurrency Model

    This is an accepted manuscript of an article published by SAGE Publishing in Journal of Information Science on 15/12/2017, available online: https://doi.org/10.1177/0165551517748290 The accepted version of the publication may differ from the final published version. SteemIt is a Reddit-like social news site that pays members for posting and curating content. It uses micropayments backed by a tradeable currency, exploiting the Bitcoin cryptocurrency generation model to finance content provision in conjunction with advertising. If successful, this paradigm might change the way in which volunteer-based sites operate. This paper investigates 925,092 new members’ first posts for insights into what drives financial success in the site. Initial blog posts on average received $0.01, although the maximum accrued was $20,680.83. Longer, more sentiment-rich or more positive comments with personal information received the greatest financial reward in contrast to more informational or topical content. Thus, there is a clear financial value in starting with a friendly introduction rather than immediately attempting to provide useful content, despite the latter being the ultimate site goal. Follow-up posts also tended to be more successful when more personal, suggesting that interpersonal communication rather than quality content provision has driven the site so far. It remains to be seen whether the model of small typical rewards and the possibility that a post might generate substantially more are enough to incentivise long-term participation or a greater focus on informational posts in the long term

    Sentiment Analysis of Public Opinion on Television Programs on Twitter with Retweet Analysis and Naïve Bayes Classifier

    Twitter is a communication medium commonly used to express opinions or comments about a product, individual, public figure or television program, and to share information. Information on Twitter takes the form of questions, comments or opinions, which may be positive or negative. Comments obtained from Twitter can complement the assessment of television programs, which has so far relied on ratings that cannot fully serve as a reference for evaluating a program. Sentiment analysis is a branch of text mining research that classifies documents. The method used in this final project is the Naïve Bayes Classifier (NBC) with the addition of retweets. Based on the test results, NBC with retweets can be applied to analyze sentiment about television programs, reaching an average accuracy of 65%, whereas the average accuracy of NBC without retweets is 61%

    Social networks and open innovation: business academic productivity

    Is there any type of relationship between the academic productivity of business researchers and their social networking activity? What does this mean in terms of open innovation? To address these questions, in this paper we focus on the Technology Acceptance Model and the concept of performativity, filling a gap in the current scientific literature. At the empirical level, we carried out a review of 211 articles from the Web of Science (SSCI), obtaining a total set of 12,939 data points. Our statistical model showed a clear symbiotic relationship between productivity in Google Scholar and presence in ResearchGate. Furthermore, researchers with a greater presence on LinkedIn or Twitter have low Google Scholar or Web of Science h-indices. We conclude that there is currently a dissociation between academic and professional online networks, something that does not help the applicability of research in business and society, the enduring aim of any search for knowledge. Information Science can play an important role in helping to bridge the gap between academia and the real world. Furthermore, in order to enhance the role of universities in open innovation practices, it is essential to design and implement new tools such as online communities that stimulate interaction and facilitate network effects

    Combining Likes-Retweet Analysis and Naive Bayes Classifier within Twitter for Sentiment Analysis

    Sentiment analysis is a field of research that aims to extract the subjectivity of opinions. With the massive growth of user-generated content in social media, Twitter has become one of the most popular microblogging applications, in which users can freely discuss and share opinions about a specific topic or entity. Twitter has several features, such as likes and retweets, that can potentially be used to improve sentiment analysis. Likes and retweets are mechanisms in Twitter for propagating or sharing other users' posts and for showing appreciation of them. This paper proposes a combination of textual and non-textual features to improve the performance of sentiment prediction. In this research we apply Naïve Bayes for textual classification and the Fisher Score to determine non-textual (like and retweet) features. By combining the two kinds of features, our experiments find the optimal values of α and β. Evaluation using the F1-measure yields a score of 0.838, with α and β equal to 0.6 and 0.4 respectively
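
The abstract above does not state how the textual and non-textual scores are combined. As a minimal sketch, assuming a simple weighted linear combination of two scores in [0, 1], the role of α and β could look as follows. The function names, the linear form and the 0.5 decision threshold are hypothetical and are not taken from the paper; only the weights 0.6 and 0.4 come from the abstract.

```python
def combine_scores(text_prob: float, nontext_score: float,
                   alpha: float = 0.6, beta: float = 0.4) -> float:
    """Weighted sum of a textual and a non-textual positivity score.

    text_prob:     positive-class probability from a text classifier
                   (for example Naive Bayes), in [0, 1].
    nontext_score: score derived from likes and retweets, in [0, 1].
    The linear form is an assumption made for illustration only.
    """
    return alpha * text_prob + beta * nontext_score


def predict_sentiment(text_prob: float, nontext_score: float,
                      threshold: float = 0.5) -> str:
    """Label a tweet by thresholding the combined score (hypothetical rule)."""
    combined = combine_scores(text_prob, nontext_score)
    return "positive" if combined >= threshold else "negative"
```

With α + β = 1 the combined score stays in [0, 1], so the same threshold can be applied regardless of how the weight is split between the two feature types.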

    Predicting Information Diffusion on Social Media

    Social media has become a part of the everyday life of modern society. A lot of information is created and shared with the world continuously. Information diffusion has been studied in the past by many researchers, since it has applications in various domains such as viral marketing and news propagation. Some information spreads faster than other information, depending on what interests people. In this thesis, using supervised machine learning algorithms, we studied information diffusion in a social network and predicted content popularity. Three datasets from Twitter were collected and analysed for building and testing various models based on different machine learning algorithms. We defined tweet popularity as the number of retweets an original message received and stated our research problems as binary and multiclass prediction tasks. We investigated how the initial retweeting behaviour of a message affects the predictive power of a model. We also analysed whether the retweeting behaviour of the most recent hour can help to predict a tweet's popularity in the following hour. In addition, particular attention was given to finding the features important for the prediction. For binary prediction, the models showed performance of AUC (area under curve) up to 95% and F1 up to 87%. For multiclass prediction, the models achieved up to 60% overall accuracy and an F1 score of up to 67%, with more accurate predictions for classes of messages with very low or very high retweet counts than for the others. We created our models using one dataset and tested our approach on the other two datasets, which showed that the models are robust enough to deal with multiple topics
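
To make the task setup in the abstract above concrete, here is a minimal sketch of how retweet counts could be turned into binary and multiclass popularity labels. The threshold of 100 and the bin edges (1, 10, 100) are hypothetical values chosen for illustration; the abstract does not give the cut-offs used in the thesis.

```python
from bisect import bisect_right
from typing import Sequence


def binary_label(retweets: int, threshold: int = 100) -> int:
    """Return 1 if the tweet counts as popular, else 0 (threshold is assumed)."""
    return int(retweets >= threshold)


def multiclass_label(retweets: int,
                     edges: Sequence[int] = (1, 10, 100)) -> int:
    """Map a retweet count to a class index using sorted bin edges.

    With the assumed edges (1, 10, 100):
      class 0: 0 retweets
      class 1: 1 to 9 retweets
      class 2: 10 to 99 retweets
      class 3: 100 or more retweets
    """
    return bisect_right(edges, retweets)
```

These labels would then serve as targets for any supervised classifier; the choice of bin edges determines how balanced the classes are.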