Tracking Dengue Epidemics using Twitter Content Classification and Topic Modelling

Author: Cedrim Diego
Daniilakis Michael
Garcia Alessandro
Missier Paolo
Miu Tudor
Pal Atinder
Romanovsky Alexander
Sousa Leonardo da Silva
Publication date: 01/01/2016

Detecting and preventing outbreaks of mosquito-borne diseases such as Dengue and Zika in Brazil and other tropical regions has long been a priority for governments in affected areas. Streaming social media content, such as Twitter, is increasingly being used for health vigilance applications such as flu detection. However, previous work has not addressed the complexity of drastic seasonal changes in Twitter content across multiple epidemic outbreaks. To address this gap, this paper contrasts two complementary approaches to detecting Twitter content that is relevant for Dengue outbreak detection, namely supervised classification and unsupervised clustering using topic modelling. Each approach has benefits and shortcomings. Our classifier achieves a prediction accuracy of about 80% based on a small training set of about 1,000 instances, but the need for manual annotation makes it hard to track seasonal changes in the nature of the epidemics, such as the emergence of new types of virus in certain geographical locations. In contrast, LDA-based topic modelling scales well, generating cohesive and well-separated clusters from larger samples. However, while clusters can be easily regenerated following changes in epidemics, this approach makes it hard to clearly segregate relevant tweets into well-defined clusters. Comment: Procs. SoWeMine - co-located with ICWE 2016. 2016, Lugano, Switzerland

arXiv.org e-Print Archive

University of Birmingham Research Portal

Tracing the German Centennial Flood in the Stream of Tweets: First Lessons Learned

Author: Andrienko G.
Andrienko N.
Bothe S.
Fuchs G.
Stange H.
Publication date: 01/01/2013

Social microblogging services such as Twitter result in massive streams of georeferenced messages and geolocated status updates. This real-time source of information is invaluable for many application areas, in particular for disaster detection and response scenarios. Consequently, a considerable number of works have dealt with issues of their acquisition, analysis and visualization. Most of these works assume not only an appropriate percentage of georeferenced messages that allows for detecting relevant events for a specific region and time frame, but also that these geolocations are reasonably correct in representing places and times of the underlying spatio-temporal situation. In this paper, we review these two key assumptions based on the results of applying a visual analytics approach to a dataset of georeferenced Tweets from Germany over eight months witnessing several large-scale flooding situations throughout the country. Our results confirm the potential of Twitter as a distributed 'social sensor' but at the same time highlight some caveats in interpreting immediate results. To overcome these limits we explore incorporating evidence from other data sources, including further social media and mobile phone network metrics, to detect, confirm and refine events with respect to location and time. We summarize the lessons learned from our initial analysis by proposing recommendations and outline possible future work directions.

City Research Online

Fraunhofer-ePrints

From Relational Data to Graphs: Inferring Significant Links using Generalized Hypergeometric Ensembles

Author: A Fog
A Vidmer
B Karrer
C Aicher
D Liben-Nowell
G Robins
I Scholtes
J Jacod
JD Wilson
K Anand
M De Domenico
M Kivelä
M Molloy
M Rosvall
M Szell
MEJ Newman
N Eagle
P Erdös
P Holme
TP Peixoto
WW Zachary
Y Zhang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 07/07/2017

The inference of network topologies from relational data is an important problem in data analysis. Exemplary applications include the reconstruction of social ties from data on human interactions, the inference of gene co-expression networks from DNA microarray data, and the learning of semantic relationships based on co-occurrences of words in documents. Solving these problems requires techniques to infer significant links in noisy relational data. In this short paper, we propose a new statistical modeling framework to address this challenge. It builds on generalized hypergeometric ensembles, a class of generative stochastic models that give rise to analytically tractable probability spaces of directed, multi-edge graphs. We show how this framework can be used to assess the significance of links in noisy relational data. We illustrate our method on two data sets capturing spatio-temporal proximity relations between actors in a social system. The results show that our analytical framework provides a new approach to infer significant links from relational data, with interesting perspectives for the mining of data on social systems. Comment: 10 pages, 8 figures, accepted at SocInfo201

arXiv.org e-Print Archive

Crossref