
    A meta-analysis of state-of-the-art electoral prediction from Twitter data

    Electoral prediction from Twitter data is an appealing research topic. It seems relatively straightforward, and the prevailing view is overly optimistic. This is problematic because, while simple approaches are assumed to be good enough, core problems are not addressed. Thus, this paper aims to (1) provide a balanced and critical review of the state of the art; (2) cast light on the presumed predictive power of Twitter data; and (3) depict a roadmap to push the field forward. To that end, a scheme to characterize Twitter prediction methods is proposed. It covers every aspect from data collection to performance evaluation, through data processing and vote inference. Using that scheme, prior research is analyzed and organized to explain the main approaches taken to date as well as their weaknesses. This is the first meta-analysis of the whole body of research regarding electoral prediction from Twitter data. It reveals that the presumed predictive power of Twitter data has been rather exaggerated: although social media may provide a glimpse of electoral outcomes, current research does not provide strong evidence that it can replace traditional polls. Finally, future lines of research, along with a set of requirements they must fulfill, are provided. Comment: 19 pages, 3 tables

    Towards Real-Time, Country-Level Location Classification of Worldwide Tweets

    In contrast to much previous work that has focused on location classification of tweets restricted to a specific country, here we undertake the task in a broader context by classifying global tweets at the country level, which is so far unexplored in a real-time scenario. We analyse the extent to which a tweet's country of origin can be determined by making use of eight tweet-inherent features for classification. Furthermore, we use two datasets, collected a year apart from each other, to analyse the extent to which a model trained on historical tweets can still be leveraged for classification of new tweets. With classification experiments on all 217 countries in our datasets, as well as on the top 25 countries, we offer some insights into the best use of tweet-inherent features for accurate country-level classification of tweets. We find that the use of a single feature, such as tweet content alone (the most widely used feature in previous work), leaves much to be desired. Choosing an appropriate combination of both tweet content and metadata can actually lead to substantial improvements of between 20% and 50%. We observe that tweet content, the user's self-reported location and the user's real name, all of which are inherent in a tweet and available in a real-time scenario, are particularly useful for determining the country of origin. We also experiment on the applicability of a model trained on historical tweets to classify new tweets, finding that the choice of a particular combination of features whose utility does not fade over time can actually lead to comparable performance, avoiding the need to retrain. However, the difficulty of achieving accurate classification increases slightly for countries with multiple commonalities, especially for English- and Spanish-speaking countries. Comment: Accepted for publication in IEEE Transactions on Knowledge and Data Engineering (IEEE TKDE)

    The ISIS Twitter census: defining and describing the population of ISIS supporters on Twitter

    Presents a demographic snapshot of ISIS supporters on Twitter by analysing a sample of 20,000 ISIS-supporting Twitter accounts, mapping the locations, preferred languages, and the number and type of followers of these accounts. Overview: Although much ink has been spilled on ISIS’s activity on Twitter, very basic questions about the group’s social media strategy remain unanswered. In a new analysis paper, J.M. Berger and Jonathon Morgan answer fundamental questions about how many Twitter users support ISIS, who and where they are, and how they participate in its highly organized online activities. Previous analyses of ISIS’s Twitter reach have relied on limited segments of the overall ISIS social network. The small, cellular nature of that network, and the focus on particular subsets within the network such as foreign fighters, may create misleading conclusions. This information vacuum extends to discussions of how the West should respond to the group’s online campaigns. Berger and Morgan present a demographic snapshot of ISIS supporters on Twitter by analyzing a sample of 20,000 ISIS-supporting Twitter accounts. Using a sophisticated and innovative methodology, the authors map the locations, preferred languages, and the number and type of followers of these accounts. Among the key findings: From September through December 2014, the authors estimate that at least 46,000 Twitter accounts were used by ISIS supporters, although not all of them were active at the same time. Typical ISIS supporters were located within the organization’s territories in Syria and Iraq, as well as in regions contested by ISIS. Hundreds of ISIS-supporting accounts sent tweets with location metadata embedded. Almost one in five ISIS supporters selected English as their primary language when using Twitter. Three quarters selected Arabic. ISIS-supporting accounts had an average of about 1,000 followers each, considerably more than an ordinary Twitter user has. ISIS-supporting accounts were also considerably more active than non-supporting users. A minimum of 1,000 ISIS-supporting accounts were suspended by Twitter between September and December 2014. Accounts that tweeted most often and had the most followers were most likely to be suspended. Much of ISIS’s social media success can be attributed to a relatively small group of hyperactive users, numbering between 500 and 2,000 accounts, which tweet in concentrated bursts of high volume. Based on their key findings, the authors recommend that social media companies and the U.S. government work together to devise appropriate responses to extremism on social media. Approaches to the problem of extremist use of social media, Berger and Morgan contend, are most likely to succeed when they are mainstreamed into wider dialogues among the broad range of community, private, and public stakeholders.

    Computational Sociolinguistics: A Survey

    Language is a social phenomenon and variation is inherent to its social nature. Recently, there has been a surge of interest within the computational linguistics (CL) community in the social dimension of language. In this article we present a survey of the emerging field of "Computational Sociolinguistics" that reflects this increased interest. We aim to provide a comprehensive overview of CL research on sociolinguistic themes, featuring topics such as the relation between language and social identity, language use in social interaction and multilingual communication. Moreover, we demonstrate the potential for synergy between the research communities involved, by showing how the large-scale data-driven methods that are widely used in CL can complement existing sociolinguistic studies, and how sociolinguistics can inform and challenge the methods and assumptions employed in CL studies. We hope to convey the possible benefits of a closer collaboration between the two communities and conclude with a discussion of open challenges. Comment: To appear in Computational Linguistics. Accepted for publication: 18th February, 201

    Measuring relative opinion from location-based social media: A case study of the 2016 U.S. presidential election

    Social media has emerged as an alternative to opinion polls for collecting public opinion, but as a passive data source it still poses many challenges, such as structurelessness, quantifiability, and representativeness. Social media data with geotags provide new opportunities to unveil the geographic locations of users expressing their opinions. This paper aims to answer two questions: 1) whether a quantifiable measurement of public opinion can be obtained from social media and 2) whether it can produce better or complementary measures compared to opinion polls. This research proposes a novel approach to measure the relative opinion of Twitter users towards public issues in order to accommodate more complex opinion structures and take advantage of the geography pertaining to the public issues. To ensure that this new measure is technically feasible, a modeling framework is developed, which includes building a training dataset by adopting a state-of-the-art approach and devising a new deep learning method called Opinion-Oriented Word Embedding. With a case study of the tweets selected for the 2016 U.S. presidential election, we demonstrate the predictive superiority of our relative opinion approach and we show how it can aid visual analytics and support opinion predictions. Although the relative opinion measure is shown to be more robust than polling, our study also suggests that the former can advantageously complement the latter in opinion prediction.