10,297 research outputs found

Validation of Twitter opinion trends with national polling aggregates: Hillary Clinton vs Donald Trump

Authors: Alexandre Bovet, Hernan A. Makse, Flaviano Morone
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 26/04/2017

Measuring and forecasting opinion trends from real-time social media is a long-standing goal of big-data analytics. Despite its importance, there has been no conclusive scientific evidence so far that social media activity can capture the opinion of the general population. Here we develop a method to infer the opinion of Twitter users regarding the candidates of the 2016 US Presidential Election by combining the statistical physics of complex networks with machine learning based on hashtag co-occurrence, which yields an in-domain training set approaching 1 million tweets. We investigate the social networks formed by the interactions among millions of Twitter users and infer each user's support for the presidential candidates. The resulting Twitter trends follow the New York Times National Polling Average, which represents an aggregate of hundreds of independent traditional polls, with remarkable accuracy. Moreover, the Twitter opinion trend precedes the aggregated NYT polls by 10 days, showing that Twitter can be an early signal of global opinion trends. Our analytics unleash the power of Twitter to uncover social trends, from elections and brands to political movements, at a fraction of the cost of national polls.

arXiv.org e-Print Archive

Oxford University Research Archive

Geotagging One Hundred Million Twitter Accounts with Total Variation Minimization

Authors: David Allen, Ryan Compton, David Jurgens
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014

Geographically annotated social media is extremely valuable for modern information retrieval. However, researchers who can access only publicly visible data quickly find that social media users rarely publish location information. In this work, we provide a method which can geolocate the overwhelming majority of active Twitter users, independent of their location sharing preferences, using only publicly visible Twitter data. Our method infers an unknown user's location by examining their friends' locations. We frame the geotagging problem as an optimization over a social network with a total variation-based objective and provide a scalable and distributed algorithm for its solution. Furthermore, we show how a robust estimate of the geographic dispersion of each user's ego network can be used as a per-user accuracy measure which is effective at removing outlying errors. Leave-many-out evaluation shows that our method is able to infer location for 101,846,236 Twitter users at a median error of 6.38 km, allowing us to geotag over 80% of public tweets.

Comment: 9 pages, 8 figures, accepted to IEEE BigData 2014. Compton, Ryan, David Jurgens, and David Allen. "Geotagging one hundred million twitter accounts with total variation minimization." Big Data (Big Data), 2014 IEEE International Conference on. IEEE, 201
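The idea of inferring a user's location from friends' locations, together with a dispersion-based confidence measure, can be illustrated with a small sketch. This is not the authors' distributed algorithm: it assumes a single update step for one user, picks the friend location that minimizes the summed great-circle distance to the other friends (a medoid, used here as a simple stand-in for a robust spatial median), and uses the median distance to friends as a hypothetical dispersion score. The function names, the Earth-radius constant, and the sample coordinates are all illustrative assumptions.

```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def estimate_location(friend_locations):
    """Return the friend location with the smallest total distance to all friends.

    Hypothetical helper: a medoid is robust to a few far-away friends,
    unlike a plain coordinate average.
    """
    return min(
        friend_locations,
        key=lambda c: sum(haversine_km(c, other) for other in friend_locations),
    )


def dispersion_km(estimate, friend_locations):
    """Median distance from the estimate to the friends (hypothetical score)."""
    return statistics.median(haversine_km(estimate, f) for f in friend_locations)


# Illustrative coordinates only: three friends near New York, one in Los Angeles.
friends = [(40.70, -74.00), (40.72, -74.00), (40.74, -74.00), (34.05, -118.24)]
guess = estimate_location(friends)
spread = dispersion_km(guess, friends)
```

A small `spread` suggests the friends cluster tightly around the estimate; a user whose friends are scattered across continents would receive a large value and could be filtered out.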

arXiv.org e-Print Archive

CiteSeerX

Approximating the Spectrum of a Graph

Authors: David Cohen-Steiner, Weihao Kong, Christian Sohler, Gregory Valiant
Publication date: 05/12/2017

The spectrum of a network or graph $G=(V,E)$ with adjacency matrix $A$ consists of the eigenvalues of the normalized Laplacian $L = I - D^{-1/2} A D^{-1/2}$. This set of eigenvalues encapsulates many aspects of the structure of the graph, including the extent to which the graph possesses community structures at multiple scales. We study the problem of approximating the spectrum $\lambda = (\lambda_1,\dots,\lambda_{|V|})$, $0 \le \lambda_1 \le \dots \le \lambda_{|V|} \le 2$, of $G$ in the regime where the graph is too large to explicitly calculate the spectrum. We present a sublinear time algorithm that, given the ability to query a random node in the graph and select a random neighbor of a given node, computes a succinct representation of an approximation $\widetilde\lambda = (\widetilde\lambda_1,\dots,\widetilde\lambda_{|V|})$, $0 \le \widetilde\lambda_1 \le \dots \le \widetilde\lambda_{|V|} \le 2$, such that $\|\widetilde\lambda - \lambda\|_1 \le \epsilon |V|$. Our algorithm has query complexity and running time $\exp(O(1/\epsilon))$, independent of the size of the graph, $|V|$. We demonstrate the practical viability of our algorithm on 15 different real-world graphs from the Stanford Large Network Dataset Collection, including social networks, academic collaboration graphs, and road networks. For the smallest of these graphs, we are able to validate the accuracy of our algorithm by explicitly calculating the true spectrum; for the larger graphs, such a calculation is computationally prohibitive. In addition, we study the implications of our algorithm for property testing in the bounded degree graph model.
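The access model described in this abstract (sample a random node, step to a random neighbor) can be illustrated with a small sketch. It is not the paper's algorithm; it only shows one standard identity that such queries make available: the $\ell$-th moment of the eigenvalues of $D^{-1/2} A D^{-1/2}$ equals the probability that an $\ell$-step random walk started at a uniformly random node returns to its start, and the eigenvalues of $L$ are one minus those eigenvalues. The function names, the adjacency-list representation, and the triangle test graph are assumptions made for illustration, and the graph is assumed to have no isolated nodes.

```python
import random


def exact_return_probability(adj, steps):
    """Average, over all start nodes, of the probability that a
    `steps`-step random walk ends where it started (exact computation)."""
    total = 0.0
    for start in adj:
        dist = {start: 1.0}  # probability distribution of the walk's position
        for _ in range(steps):
            nxt = {}
            for node, p in dist.items():
                share = p / len(adj[node])  # move to a uniform random neighbor
                for nb in adj[node]:
                    nxt[nb] = nxt.get(nb, 0.0) + share
            dist = nxt
        total += dist.get(start, 0.0)
    return total / len(adj)


def sampled_return_probability(adj, steps, walks, rng):
    """Monte Carlo estimate of the same quantity, using only the two
    queries: pick a random node, pick a random neighbor of a node."""
    nodes = list(adj)
    hits = 0
    for _ in range(walks):
        start = rng.choice(nodes)
        node = start
        for _ in range(steps):
            node = rng.choice(adj[node])
        hits += node == start
    return hits / walks


# Triangle graph (illustrative): normalized adjacency eigenvalues are 1, -1/2, -1/2.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
second_moment = exact_return_probability(triangle, 2)
third_moment = exact_return_probability(triangle, 3)
estimate = sampled_return_probability(triangle, 3, 20000, random.Random(0))
```

The number of sampled walks needed for a given accuracy depends on the walk length and target error, not on the number of nodes, which is the intuition behind a running time independent of $|V|$.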

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server