
    Topicality and Social Impact: Diverse Messages but Focused Messengers

    Are users who comment on a variety of matters more likely to achieve high influence than those who delve into one focused field? Do general Twitter hashtags, such as #lol, tend to be more popular than novel ones, such as #instantlyinlove? Questions like these demand a way to detect topics hidden behind messages associated with an individual or a hashtag, and a gauge of similarity among these topics. Here we develop such an approach to identify clusters of similar hashtags by detecting communities in the hashtag co-occurrence network. Then the topical diversity of a user's interests is quantified by the entropy of her hashtags across different topic clusters. A similar measure is applied to hashtags, based on co-occurring tags. We find that high topical diversity of early adopters or co-occurring tags implies high future popularity of hashtags. In contrast, low diversity helps an individual accumulate social influence. In short, diverse messages and focused messengers are more likely to gain impact.

    Comment: 9 pages, 7 figures, 6 tables
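    The diversity measure the abstract describes can be sketched as Shannon entropy over a user's hashtag usage across topic clusters. This is a minimal illustration, not the authors' code; the cluster labels below are hypothetical.

```python
from collections import Counter
from math import log

def topical_diversity(hashtag_clusters):
    """Shannon entropy of a user's hashtags across topic clusters.

    hashtag_clusters: one cluster label per hashtag occurrence.
    Returns 0.0 for a perfectly focused user; higher values mean
    more diverse interests.
    """
    counts = Counter(hashtag_clusters)
    total = sum(counts.values())
    return -sum((c / total) * log(c / total) for c in counts.values())

# A focused user (all hashtags in one cluster) vs. a diverse one.
focused = topical_diversity(["music"] * 10)  # 0.0
diverse = topical_diversity(["music", "sports", "politics", "tech"])
```

    The same function applies to a hashtag's diversity by feeding it the cluster labels of its co-occurring tags.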

    The State of the Art in Cartograms

    Cartograms combine statistical and geographical information in thematic maps, where areas of geographical regions (e.g., countries, states) are scaled in proportion to some statistic (e.g., population, income). Cartograms make it possible to gain insight into patterns and trends in the world around us and have been very popular visualizations for geo-referenced data for over a century. This work surveys cartogram research in visualization, cartography and geometry, covering a broad spectrum of different cartogram types: from the traditional rectangular and table cartograms, to Dorling and diffusion cartograms. A particular focus is the study of the major cartogram dimensions: statistical accuracy, geographical accuracy, and topological accuracy. We review the history of cartograms, describe the algorithms for generating them, and consider task taxonomies. We also review quantitative and qualitative evaluations, and we use these to arrive at design guidelines and research challenges.
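    The core scaling rule behind every cartogram type surveyed here is simple: each region's target area is proportional to its statistic. A minimal sketch (the region names and values are hypothetical):

```python
def target_areas(stats, total_area):
    """Target cartogram area per region, proportional to its statistic.

    stats: dict mapping region name -> statistic (e.g., population).
    The 'statistical accuracy' dimension measures how closely a
    generated cartogram's realized areas match these targets.
    """
    total = sum(stats.values())
    return {region: total_area * value / total for region, value in stats.items()}

# Hypothetical populations (millions); target areas sum to the map's area.
pops = {"A": 10.0, "B": 30.0, "C": 60.0}
areas = target_areas(pops, total_area=100.0)  # {'A': 10.0, 'B': 30.0, 'C': 60.0}
```

    The hard part, and the subject of the algorithms surveyed, is realizing these target areas while also preserving geographical shape and topology.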

    Who Voted in 2016? Using Fuzzy Forests to Understand Voter Turnout

    Objective: What can machine learning tell us about who voted in 2016? There are numerous competing voter turnout theories, and a large number of covariates are required to assess which theory best explains turnout. This article is a proof of concept that machine learning can help overcome this curse of dimensionality and reveal important insights in studies of political phenomena. Methods: We use fuzzy forests, an extension of random forests, to screen variables for a parsimonious but accurate prediction. Fuzzy forests achieve accurate variable importance measures in the face of high‐dimensional and highly correlated data. The data that we use are from the 2016 Cooperative Congressional Election Study. Results: Fuzzy forests chose only a small number of covariates as major correlates of 2016 turnout and still boasted high predictive performance. Conclusion: Our analysis provides three important conclusions about turnout in 2016: registration and voting procedures were important, political issues were important (especially Obamacare, climate change, and fiscal policy), but few demographic variables other than age were strongly associated with turnout. We conclude that fuzzy forests is an important methodology for studying overdetermined questions in social sciences.
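    The screening idea behind fuzzy forests can be sketched as follows: correlated covariates are first grouped into modules, recursive elimination is run within each module, and the survivors then compete in a final round. This is a simplified stand-in, not the `fuzzyforest` R package; the covariate names, modules, and importance scores are hypothetical, and in the real method importances are re-estimated by a random forest at each elimination step rather than fixed.

```python
def fuzzy_forest_screen(modules, importance, keep_fraction=0.5, final_k=3):
    """Simplified sketch of fuzzy-forest variable screening.

    modules: dict mapping module name -> list of correlated covariates.
    importance: callable scoring a covariate (here a fixed lookup;
        in the real method, a forest-based estimate recomputed per round).
    """
    survivors = []
    for covariates in modules.values():
        # Within-module elimination: keep only the top fraction.
        ranked = sorted(covariates, key=importance, reverse=True)
        n_keep = max(1, int(len(ranked) * keep_fraction))
        survivors.extend(ranked[:n_keep])
    # Final round: survivors from all modules compete.
    return sorted(survivors, key=importance, reverse=True)[:final_k]

# Hypothetical scores standing in for forest-based importance estimates.
scores = {"age": 0.9, "income": 0.4, "education": 0.5,
          "registration": 0.8, "obamacare": 0.7, "region": 0.1}
modules = {"demographics": ["age", "income", "education"],
           "politics": ["registration", "obamacare", "region"]}
top = fuzzy_forest_screen(modules, scores.get)  # ['age', 'registration']
```

    Grouping correlated covariates into modules before eliminating is what keeps the importance rankings honest when predictors are highly correlated.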

    Network design meets in silico evolutionary biology

    Cell fate is programmed through gene regulatory networks that perform several calculations to take the appropriate decision. In silico evolutionary optimization mimics the way Nature has designed such gene regulatory networks. In this review we discuss the basic principles of these evolutionary approaches and how they can be applied to engineer synthetic networks. We summarize the basic guidelines to implement an in silico evolutionary design method, the operators for mutation and selection that iteratively drive the network architecture towards a specified dynamical behavior. Interestingly, as it happens in natural evolution, we show the existence of patterns of punctuated evolution. In addition, we highlight several examples of models that have been designed using automated procedures, together with different objective functions to select for the proper behavior. Finally, we briefly discuss the modular designability of gene regulatory networks and its potential application in biotechnology.

    Supported by fellowships from Generalitat Valenciana and the European Molecular Biology Organization to G. R. and by grants from the Spanish Ministerio de Ciencia e Innovación to J.C. and S.F.E. Peer reviewed.
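    The mutation-and-selection loop the review describes can be sketched generically: candidates (e.g., vectors of network interaction strengths) are repeatedly mutated, and an objective function scoring how well each candidate's dynamics match the target behavior drives selection. This is a minimal toy, not any specific method from the review; the two-parameter objective below is hypothetical.

```python
import random

def evolve(initial, objective, mutate, generations=200, pop_size=20, seed=0):
    """Minimal in silico evolutionary loop: mutate candidates, keep the best.

    objective: scores a candidate against the target behavior (lower is better).
    mutate: returns a perturbed copy of a candidate.
    """
    rng = random.Random(seed)
    population = [initial]
    for _ in range(generations):
        # Mutation operator: perturb randomly chosen members of the population.
        offspring = [mutate(rng.choice(population), rng) for _ in range(pop_size)]
        # Selection operator: keep only the fittest candidates (with elitism).
        population = sorted(population + offspring, key=objective)[:pop_size]
    return population[0]

# Toy objective: tune two "interaction strengths" toward a target response.
target = (0.3, 0.8)
objective = lambda p: (p[0] - target[0]) ** 2 + (p[1] - target[1]) ** 2
mutate = lambda p, rng: (p[0] + rng.gauss(0, 0.05), p[1] + rng.gauss(0, 0.05))
best = evolve((0.0, 0.0), objective, mutate)
```

    In real applications the objective evaluates simulated network dynamics (e.g., by integrating ODEs) rather than a closed-form distance, which is where the punctuated-evolution patterns emerge.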

    Why polls fail to predict elections

    In the past decade we have witnessed the failure of traditional polls in predicting presidential election outcomes across the world. To understand the reasons behind these failures we analyze the raw data of a trusted pollster which failed to predict, along with the rest of the pollsters, the surprising 2019 presidential election in Argentina which has led to a major market collapse in that country. Analysis of the raw and re-weighted data from longitudinal surveys performed before and after the elections reveals clear biases (beyond well-known low-response rates) related to mis-representation of the population and, most importantly, to social-desirability biases, i.e., the tendency of respondents to hide their intention to vote for controversial candidates. We then propose a longitudinal opinion tracking method based on big-data analytics from social media, machine learning, and network theory that overcomes the limits of traditional polls. The model achieves accurate results in the 2019 Argentina elections predicting the overwhelming victory of the candidate Alberto Fernández over the president Mauricio Macri; a result that none of the traditional pollsters in the country was able to predict. Beyond predicting political elections, the framework we propose is more general and can be used to discover trends in society; for instance, what people think about economics, education or climate change.

    Comment: 47 pages, 10 tables, 15 figures

    Bayesian stochastic blockmodeling

    This chapter provides a self-contained introduction to the use of Bayesian inference to extract large-scale modular structures from network data, based on the stochastic blockmodel (SBM), as well as its degree-corrected and overlapping generalizations. We focus on nonparametric formulations that allow their inference in a manner that prevents overfitting, and enables model selection. We discuss aspects of the choice of priors, in particular how to avoid underfitting via increased Bayesian hierarchies, and we contrast the task of sampling network partitions from the posterior distribution with finding the single point estimate that maximizes it, while describing efficient algorithms to perform either one. We also show how inferring the SBM can be used to predict missing and spurious links, and shed light on the fundamental limitations of the detectability of modular structures in networks.

    Comment: 44 pages, 16 figures. Code is freely available as part of graph-tool at https://graph-tool.skewed.de. See also the HOWTO at https://graph-tool.skewed.de/static/doc/demos/inference/inference.htm
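    The model at the core of the chapter can be illustrated with a minimal likelihood comparison: under a plain (non-degree-corrected) Bernoulli SBM, a partition is scored by how well block-pair edge densities explain the observed edges. This sketch omits the chapter's Bayesian machinery (priors, description-length penalties, hierarchies) entirely; for actual inference, use graph-tool as the authors note. The six-node example network is hypothetical.

```python
from math import log
from itertools import combinations

def sbm_log_likelihood(edges, partition, n_nodes):
    """Log-likelihood of a non-degree-corrected Bernoulli SBM, with each
    block-pair edge probability set to its maximum-likelihood value.

    edges: set of frozensets {u, v}; partition: dict node -> block label.
    """
    m = {}  # (block a, block b) -> observed edges between the blocks
    n = {}  # (block a, block b) -> possible node pairs between the blocks
    for u, v in combinations(range(n_nodes), 2):
        key = tuple(sorted((partition[u], partition[v])))
        n[key] = n.get(key, 0) + 1
        m[key] = m.get(key, 0) + (frozenset((u, v)) in edges)
    ll = 0.0
    for key, pairs in n.items():
        p = m[key] / pairs  # ML estimate of the block-pair edge probability
        if 0 < p < 1:  # p in {0, 1} contributes exactly 0 to the sum
            ll += m[key] * log(p) + (pairs - m[key]) * log(1 - p)
    return ll

# Two planted blocks: dense inside, no edges across.
edges = {frozenset(e) for e in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]}
good = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
bad = {0: "A", 1: "B", 2: "A", 3: "B", 4: "A", 5: "B"}
# The planted partition scores strictly higher than a mismatched one.
```

    Maximizing this likelihood alone would overfit (more blocks always score at least as well); the nonparametric Bayesian formulation the chapter develops is precisely what penalizes that and enables model selection.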