Infodemiology for Syndromic Surveillance of Dengue and Typhoid Fever in the Philippines
Finding the determinants of disease outbreaks before they occur is necessary to reduce their impact on populations. The supposed advantage of information from automated systems falls short because real-time data cannot be accessed and fragmented systems cannot interoperate, which lengthens the transfer and processing of data. This study therefore presents the use of real-time latent data from social media, particularly Twitter, to complement existing disease surveillance efforts. By classifying infodemiological (health-related) tweets, the study produces a range of possible disease incidences of Dengue and Typhoid Fever within the Western Visayas region of the Philippines. Both diseases showed a strong positive correlation (R > .70) between the number of tweets and surveillance data based on official records of the Philippine Health Agency. Regression equations were derived to determine a numerical range of possible disease incidences given a certain number of tweets. As an example, the study shows that 10 infodemiological tweets represent the presence of 19-25 Dengue Fever incidences at the provincial level.
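The correlation and regression step described in this abstract can be illustrated with a minimal sketch. The weekly counts below are hypothetical placeholders, not the study's data, and the function names `pearson_r` and `fit_line` are assumptions made for illustration. The abstract does not say how the study derives a range of incidences, so the sketch returns only a point estimate.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical weekly counts (NOT taken from the study).
tweets = [2, 5, 8, 10, 14]
cases = [6, 10, 18, 20, 30]

r = pearson_r(tweets, cases)
slope, intercept = fit_line(tweets, cases)
estimate_for_10_tweets = slope * 10 + intercept
print(f"r = {r:.2f}; estimated cases for 10 tweets = {estimate_for_10_tweets:.1f}")
```

A prediction interval around the fitted line would be one way to turn the point estimate into a range, but whether the study does this is not stated.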
Tracking Dengue Epidemics using Twitter Content Classification and Topic Modelling
Detecting and preventing outbreaks of mosquito-borne diseases such as Dengue
and Zika in Brazil and other tropical regions has long been a priority for
governments in affected areas. Streaming social media content, such as Twitter,
is increasingly being used for health vigilance applications such as flu
detection. However, previous work has not addressed the complexity of drastic
seasonal changes on Twitter content across multiple epidemic outbreaks. In
order to address this gap, this paper contrasts two complementary approaches to
detecting Twitter content that is relevant for Dengue outbreak detection,
namely supervised classification and unsupervised clustering using topic
modelling. Each approach has benefits and shortcomings. Our classifier achieves
a prediction accuracy of about 80% based on a small training set of about
1,000 instances, but the need for manual annotation makes it hard to track
seasonal changes in the nature of the epidemics, such as the emergence of new
types of virus in certain geographical locations. In contrast, LDA-based topic
modelling scales well, generating cohesive and well-separated clusters from
larger samples. Although clusters can be easily re-generated following changes in
epidemics, this approach makes it hard to clearly segregate relevant
tweets into well-defined clusters.
Comment: Procs. SoWeMine - co-located with ICWE 2016. 2016, Lugano,
Switzerland
Spatio-temporal analysis of brand interest using social networks
Social networks have become part of many people's lives, and applications like Facebook and Twitter are used on a daily basis by millions of users. In such applications, users share their feelings, opinions, and experiences. Twitter, in particular, is used to talk about diverse topics, including brands and their products and services. In this paper, we analyze how brand interest is reflected on Twitter and how this platform can be used to monitor mentions of specific brands as an indicator of brand interest. Our methodology is based on time, location, and the number of brand-related tweets to perform a spatio-temporal analysis. This type of analysis can provide relevant insights into how brand interest evolves over time and how it might differ from one country to another. We have collected four years' worth of data and report trends, differences, and similarities in terms of brand interest for each brand and for each country.
Global disease monitoring and forecasting with Wikipedia
Infectious disease is a leading threat to public health, economic stability,
and other key social structures. Efforts to mitigate these impacts depend on
accurate and timely monitoring to measure the risk and progress of disease.
Traditional, biologically-focused monitoring techniques are accurate but costly
and slow; in response, new techniques based on social internet data such as
social media and search queries are emerging. These efforts are promising, but
important challenges in the areas of scientific peer review, breadth of
diseases and countries, and forecasting hamper their operational usefulness.
We examine a freely available, open data source for this use: access logs
from the online encyclopedia Wikipedia. Using linear models, language as a
proxy for location, and a systematic yet simple article selection procedure, we
tested 14 location-disease combinations and demonstrate that these data
feasibly support an approach that overcomes these challenges. Specifically, our
proof-of-concept yields models with r² up to 0.92, forecasting value up to
the 28 days tested, and several pairs of models similar enough to suggest that
transferring models from one location to another without re-training is
feasible.
Based on these preliminary results, we close with a research agenda designed
to overcome these challenges and produce a disease monitoring and forecasting
system that is significantly more effective, robust, and globally comprehensive
than the current state of the art.
Comment: 27 pages; 4 figures; 4 tables. Version 2: Cite McIver & Brownstein
and adjust novelty claims accordingly; revise title; various revisions for
clarity
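As a rough illustration of what "forecasting value" can mean for a linear model of this kind, the sketch below computes r² between an access-log series at day t and an incidence series at day t + lag. The series, the function names, and the choice of a single predictor are assumptions made for illustration; the paper's actual models and article selection procedure are not reproduced here.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation; equals R^2 of a one-predictor linear fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


def lagged_r_squared(views, cases, lag):
    """r^2 between views at day t and cases at day t + lag (lag >= 0)."""
    if len(views) != len(cases):
        raise ValueError("series must have the same length")
    if lag < 0 or len(views) - lag < 2:
        raise ValueError("lag must leave at least two overlapping points")
    shifted_views = views[: len(views) - lag]
    shifted_cases = cases[lag:]
    return r_squared(shifted_views, shifted_cases)


# Hypothetical daily series (NOT taken from the paper).
page_views = [120, 135, 160, 210, 260, 240, 200, 170]
case_counts = [10, 11, 12, 14, 19, 25, 23, 18]

for lag in range(0, 4):
    print(lag, round(lagged_r_squared(page_views, case_counts, lag), 3))
```

If r² stays high as the lag grows, the access logs carry information about incidence that many days ahead, which is the sense of forecasting sketched here.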
Enhancement of Epidemiological Models for Dengue Fever Based on Twitter Data
Epidemiological early warning systems for dengue fever rely on up-to-date
epidemiological data to forecast future incidence. However, epidemiological
data typically requires time to become available, due to the application of
time-consuming laboratory tests. This implies that epidemiological models
need to issue predictions further in advance, making their task even more
difficult. On the other hand, online platforms, such as Twitter or Google,
allow us to obtain samples of users' interaction in near real-time and can be
used as sensors to monitor current incidence. In this work, we propose a
framework to exploit online data sources to mitigate the lack of up-to-date
epidemiological data by obtaining estimates of current incidence, which are
then used by traditional epidemiological models. We show that the proposed
framework obtains more accurate predictions than alternative approaches, with
statistically better results for delays greater than or equal to 4 weeks.
Comment: ACM Digital Health 201
360 Quantified Self
Wearable devices with a wide range of sensors have contributed to the rise of
the Quantified Self movement, where individuals log everything ranging from the
number of steps they have taken, to their heart rate, to their sleeping
patterns. Sensors do not, however, typically sense the social and ambient
environment of the users, such as general life style attributes or information
about their social network. This means that the users themselves, and the
medical practitioners privy to the wearable sensor data, only have a narrow
view of the individual, limited mainly to certain aspects of their physical
condition.
In this paper we describe a number of use cases for how social media can be
used to complement the check-up data and those from sensors to gain a more
holistic view on individuals' health, a perspective we call the 360 Quantified
Self. Health-related information can be obtained from sources as diverse as
food photo sharing, location check-ins, or profile pictures. Additionally,
information from a person's ego network can shed light on the social dimension
of wellbeing, which is widely acknowledged to be of utmost importance, even
though such information is currently rarely used for medical diagnosis. We articulate a
long-term vision describing the desirable list of technical advances and
variety of data to achieve an integrated system encompassing Electronic Health
Records (EHR), data from wearable devices, alongside information derived from
social media data.
Comment: QCRI Technical Report
Social media mining for identification and exploration of health-related information from pregnant women
Widespread use of social media has led to the generation of substantial
amounts of information about individuals, including health-related information.
Social media provides the opportunity to study health-related information about
selected population groups who may be of interest for a particular study. In
this paper, we explore the possibility of utilizing social media to perform
targeted data collection and analysis from a particular population group --
pregnant women. We hypothesize that we can use social media to identify cohorts
of pregnant women and follow them over time to analyze crucial health-related
information. To identify potentially pregnant women, we employ simple
rule-based searches that attempt to detect pregnancy announcements with
moderate precision. To further filter out false positives and noise, we employ
a supervised classifier trained on a small amount of hand-annotated data. We then
collect their posts over time to create longitudinal health timelines and
attempt to divide the timelines into different pregnancy trimesters. Finally,
we assess the usefulness of the timelines by performing a preliminary analysis
to estimate drug intake patterns of our cohort at different trimesters. Our
rule-based cohort identification technique collected 53,820 users over thirty
months from Twitter. Our pregnancy announcement classification technique
achieved an F-measure of 0.81 for the pregnancy class, resulting in 34,895 user
timelines. Analysis of the timelines revealed that pertinent health-related
information, such as drug intake and adverse reactions, can be mined from the
data. Our approach to using user timelines in this fashion has produced very
encouraging results and can be employed for other important tasks where
cohorts, for which health-related information may not be available from other
sources, are required to be followed over time to derive population-based
estimates.
Comment: 9 pages
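The rule-based search for pregnancy announcements mentioned in this abstract might look roughly like the sketch below. The patterns are hypothetical examples written for illustration; the abstract does not list the rules the authors used, and a real system would pass the matches to the supervised classifier to filter false positives.

```python
import re

# Hypothetical announcement patterns (assumptions, not the paper's rules).
ANNOUNCEMENT_PATTERNS = [
    re.compile(r"\bi(?:'m| am) (?:\d+ (?:weeks|months) )?pregnant\b", re.IGNORECASE),
    re.compile(r"\bwe(?:'re| are) expecting\b", re.IGNORECASE),
    re.compile(r"\bmy due date\b", re.IGNORECASE),
]


def looks_like_announcement(text):
    """Return True if any hypothetical rule matches the post text."""
    return any(pattern.search(text) for pattern in ANNOUNCEMENT_PATTERNS)


# Made-up example posts.
posts = [
    "I'm 12 weeks pregnant and so happy",
    "My sister is pregnant",
    "We are expecting a baby in June",
]
candidates = [post for post in posts if looks_like_announcement(post)]
print(candidates)
```

Rules like these favor simplicity over precision, which is consistent with the abstract's description of a second, supervised filtering stage.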