Geo-located Twitter as the proxy for global mobility patterns
With the advent of pervasive location-sharing services, researchers have gained unprecedented access to direct records of human activity in space and time. This paper analyses geo-located Twitter messages in order to uncover global patterns of human mobility. Based on a dataset of almost a billion tweets recorded in 2012, we estimate the volumes of international travelers with respect to their country of residence. We examine the mobility profiles of different nations, looking at characteristics such as mobility rate, radius of gyration, diversity of destinations, and the balance of inflows and outflows. The temporal patterns disclose universal seasons of increased international mobility and the peculiar national nature of overseas travel. Our analysis of the community structure of the Twitter mobility network, obtained with iterative network partitioning, reveals spatially cohesive regions that follow the regional division of the world. Finally, we validate our results against global tourism statistics and mobility models provided by other authors, and argue that Twitter is a viable source for understanding and quantifying global mobility patterns.
Comment: 17 pages, 13 figures
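The abstract lists radius of gyration among the national mobility characteristics without stating the formula used. A minimal sketch, assuming the common definition r_g = sqrt((1/n) Σ d(r_i, r_cm)²) applied to one user's geotagged points, with great-circle distance on a spherical Earth of radius 6371 km and the centre of mass taken as the normalised mean of 3-D unit vectors (to avoid averaging raw longitudes across the antimeridian). The function names are hypothetical, not the authors' code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth


def to_unit_vector(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a 3-D unit vector."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))


def great_circle_km(u, v):
    """Great-circle distance between two unit vectors, in kilometres."""
    dot = sum(a * b for a, b in zip(u, v))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding slightly outside [-1, 1]
    return EARTH_RADIUS_KM * math.acos(dot)


def radius_of_gyration_km(points):
    """Root-mean-square great-circle distance from the centre of mass.

    points: list of (lat, lon) pairs, e.g. the geotagged tweets of one user.
    """
    vecs = [to_unit_vector(lat, lon) for lat, lon in points]
    n = len(vecs)
    # Centre of mass: mean of the unit vectors, projected back onto the sphere.
    cx, cy, cz = (sum(v[i] for v in vecs) / n for i in range(3))
    norm = math.sqrt(cx * cx + cy * cy + cz * cz)
    if norm == 0.0:
        raise ValueError("centre of mass is undefined (points cancel out)")
    centre = (cx / norm, cy / norm, cz / norm)
    return math.sqrt(sum(great_circle_km(v, centre) ** 2 for v in vecs) / n)
```

For two points on the equator at 10°W and 10°E, the centre of mass is (0°, 0°) and r_g equals the great-circle distance spanned by 10°, roughly 1,112 km.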
Forecasting audience increase on YouTube
User profiles constructed on Social Web platforms are often motivated by the need to maximise user reputation within a community. Subscriber, or follower, counts are an indicator of the influence and standing that the user has, with greater values indicating greater regard for what the user has to say or share. However, there is at present a lack of understanding of the factors that lead to an increase in such audience levels, and of how a user's behaviour can affect their reputation. In this paper we attempt to fill this gap by examining data collected from YouTube at regular time intervals. We explore the correlation between subscriber counts and several behaviour features extracted from both the user's profile and the content they have shared. Using a Multiple Linear Regression model, we are able to forecast the audience levels that users will attain based on their observed behaviour. Combining such a model with an exhaustive feature selection process, we achieve statistically significant improvements over a baseline model containing all features.
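The abstract pairs a multiple linear regression with an exhaustive feature selection step but does not say how candidate feature subsets were scored. The sketch below assumes ordinary least squares with an intercept and keeps the subset with the lowest mean squared error on a held-out validation split; the feature matrix and the function names are hypothetical.

```python
import itertools

import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept column; returns [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply fitted coefficients (intercept first) to a feature matrix."""
    return coef[0] + X @ coef[1:]


def exhaustive_selection(X_train, y_train, X_val, y_val):
    """Fit every non-empty feature subset; return (best_subset, validation_mse)."""
    n_features = X_train.shape[1]
    best_subset, best_mse = None, np.inf
    for k in range(1, n_features + 1):
        for subset in itertools.combinations(range(n_features), k):
            cols = list(subset)
            coef = fit_ols(X_train[:, cols], y_train)
            mse = np.mean((predict(coef, X_val[:, cols]) - y_val) ** 2)
            if mse < best_mse:
                best_subset, best_mse = subset, mse
    return best_subset, best_mse
```

Exhaustive search fits 2^p - 1 models for p features, so it is only practical for a modest number of behaviour features; beyond that, greedy forward or backward selection is the usual substitute.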
Scaling-laws of human broadcast communication enable distinction between human, corporate and robot Twitter users.
Human behaviour is highly individual by nature, yet statistical structures are emerging which seem to govern the actions of human beings collectively. Here we search for universal statistical laws dictating the timing of human actions in communication decisions. We focus on the distribution of the time interval between messages in human broadcast communication, as documented in Twitter, and study a collection of over 160,000 tweets for three user categories: personal (controlled by one person), managed (typically controlled by a PR agency) and bot-controlled (automated system). To test our hypothesis, we investigate whether it is possible to differentiate between user types based on tweet timing behaviour, independently of the content of messages. For this purpose, we developed a system to process a large number of tweets for reality mining and implemented two simple probabilistic inference algorithms: (1) a naive Bayes classifier, which distinguishes between two and three account categories with classification performance of 84.6% and 75.8%, respectively, and (2) a prediction algorithm to estimate the time of a user's next tweet with an R² ≈ 0.7. Our results show that we can reliably distinguish between the three user categories as well as predict the distribution of a user's inter-message time with reasonable accuracy. More importantly, we identify a characteristic power-law decrease in the tail of the inter-message time distribution of human users, which is different from that obtained for managed and automated accounts. This result is evidence of a universal law that permeates the timing of human decisions in broadcast communication and extends the findings of several previous studies of peer-to-peer communication. © 2013 Tavares, Faisal
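To make the notion of a power-law tail in inter-message times concrete, the sketch below applies the standard continuous maximum-likelihood estimator of the exponent, α̂ = 1 + n [Σ ln(x_i / x_min)]⁻¹, to the intervals at or above a chosen cutoff x_min. The abstract does not say how the authors fitted the tail, so both the estimator and the choice of x_min are assumptions; `power_law_alpha` is a hypothetical helper.

```python
import math


def power_law_alpha(intervals, x_min):
    """Continuous MLE of alpha for a tail p(x) ∝ x^(-alpha), x >= x_min.

    intervals: inter-message times (e.g. seconds) for one account.
    Only observations at or above x_min enter the estimate.
    """
    tail = [x for x in intervals if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0.0:
        raise ValueError("all tail observations equal x_min; alpha is undefined")
    return 1.0 + len(tail) / log_sum
```

Choosing x_min (for example by minimising a goodness-of-fit distance between the fitted and empirical tails) matters in practice and is left out of this sketch.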
Discovering Periodic Patterns in Historical News
We address the problem of observing periodic changes in the behaviour of a large population, by analysing the daily contents of newspapers published in the United States and United Kingdom from 1836 to 1922. This is done by analysing the daily time series of the relative frequency of the 25K most frequent words for each country, resulting in the study of 50K time series for 31,755 days. Behaviours that are found to be strongly periodic include seasonal activities, such as hunting and harvesting. A strong connection with natural cycles is found, with a pronounced presence of fruits, vegetables, flowers and game. Periodicities dictated by religious or civil calendars are also detected and show a different wave-form than those provoked by weather. States that can be revealed include the presence of infectious disease, with clear annual peaks for fever, pneumonia and diarrhoea. Overall, 2% of the words are found to be strongly periodic, and the period most frequently found is 365 days. Comparisons between UK and US, and between modern and historical news, reveal how the fundamental cycles of life are shaped by the seasons, but also how this effect has been reduced in modern times.
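The abstract does not describe how periodicity was detected. One common approach, assumed here, is to remove the mean from a daily word-frequency series, compute its periodogram with the FFT, and report the period of the strongest peak together with the share of spectral power it carries. `dominant_period` and its power-share score are illustrative, not the paper's criterion for "strongly periodic".

```python
import numpy as np


def dominant_period(series):
    """Return (period_in_days, power_share) of the strongest periodic component.

    series: daily relative frequencies of one word.
    power_share is the fraction of non-zero-frequency spectral power in the
    peak bin, used here as a crude strength-of-periodicity score.
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                         # remove the constant (DC) component
    power = np.abs(np.fft.rfft(x)) ** 2      # one-sided periodogram
    freqs = np.fft.rfftfreq(len(x), d=1.0)   # frequencies in cycles per day
    power, freqs = power[1:], freqs[1:]      # drop the zero-frequency bin
    peak = int(np.argmax(power))
    return 1.0 / freqs[peak], power[peak] / power.sum()
```

A synthetic series with exactly ten yearly cycles over 3,650 days peaks at a period of 365 days. Real series whose length is not a whole number of cycles spread power across neighbouring frequency bins, so a practical detector would also need a significance threshold.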
Early Prediction of Movie Box Office Success based on Wikipedia Activity Big Data
Use of socially generated "big data" to access information about the collective states of mind of human societies has become a new paradigm in the emerging field of computational social science. A natural application of this would be the prediction of a society's reaction to a new product in terms of popularity and adoption rate. However, bridging the gap between "real-time monitoring" and "early prediction" remains a big challenge. Here we report on an endeavor to build a minimalistic predictive model for the financial success of movies based on the collective activity data of online users. We show that the popularity of a movie can be predicted well before its release by measuring and analyzing the activity level of editors and viewers of the movie's corresponding entry in Wikipedia, the well-known online encyclopedia.
Comment: 13 pages, including Supporting Information, 7 figures. Download the dataset from: http://wwm.phy.bme.hu/SupplementaryDataS1.zi
Analyzing the Language of Food on Social Media
We investigate the predictive power behind the language of food on social media. We collect a corpus of over three million food-related posts from Twitter and demonstrate that many latent population characteristics can be directly predicted from this data: overweight rate, diabetes rate, political leaning, and home geographical location of authors. For all tasks, our language-based models significantly outperform the majority-class baselines. Performance is further improved with more complex natural language processing, such as topic modeling. We analyze which textual features have the most predictive power for these datasets, providing insight into the connections between the language of food, geographic locale, and community characteristics. Lastly, we design and implement an online system for real-time query and visualization of the dataset. Visualization tools, such as geo-referenced heatmaps, semantics-preserving wordclouds and temporal histograms, allow us to discover more complex, global patterns mirrored in the language of food.
Comment: An extended abstract of this paper will appear in IEEE Big Data 201
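The abstract reports that language-based models beat majority-class baselines but does not name the classifiers. As an illustration only, the sketch below computes a majority-class baseline accuracy and trains a bag-of-words multinomial naive Bayes model with add-one smoothing, a swapped-in technique rather than the authors' model. `BagOfWordsNB`, the toy posts and the "low"/"high" labels (standing in for, say, a binarised overweight rate) are hypothetical.

```python
import math
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


class BagOfWordsNB:
    """Multinomial naive Bayes over whitespace tokens, add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        labels = list(labels)
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = set().union(*self.word_counts.values())
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        def log_posterior(c):
            denom = self.totals[c] + len(self.vocab)
            # Words never seen in training are skipped rather than smoothed.
            return self.log_prior[c] + sum(
                math.log((self.word_counts[c][w] + 1) / denom)
                for w in doc.lower().split()
                if w in self.vocab
            )

        return max(self.classes, key=log_posterior)
```

On five toy posts labelled low, low, high, high, high, the majority baseline scores 0.6, while the fitted model assigns "kale quinoa bowl" to "low" and "fried bacon burger" to "high"; a real comparison would of course use held-out data.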