Global disease monitoring and forecasting with Wikipedia
Infectious disease is a leading threat to public health, economic stability,
and other key social structures. Efforts to mitigate these impacts depend on
accurate and timely monitoring to measure the risk and progress of disease.
Traditional, biologically-focused monitoring techniques are accurate but costly
and slow; in response, new techniques based on social internet data such as
social media and search queries are emerging. These efforts are promising, but
important challenges in the areas of scientific peer review, breadth of
diseases and countries, and forecasting hamper their operational usefulness.
We examine a freely available, open data source for this use: access logs
from the online encyclopedia Wikipedia. Using linear models, language as a
proxy for location, and a systematic yet simple article selection procedure, we
tested 14 location-disease combinations and demonstrate that these data
feasibly support an approach that overcomes these challenges. Specifically, our
proof-of-concept yields models with $r^2$ up to 0.92, forecasting value up to
the 28 days tested, and several pairs of models similar enough to suggest that
transferring models from one location to another without re-training is
feasible.
Based on these preliminary results, we close with a research agenda designed
to overcome these challenges and produce a disease monitoring and forecasting
system that is significantly more effective, robust, and globally comprehensive
than the current state of the art.
Comment: 27 pages; 4 figures; 4 tables. Version 2: Cite McIver & Brownstein
and adjust novelty claims accordingly; revise title; various revisions for
clarit
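The linear-model idea in the abstract above can be illustrated with a minimal sketch: fit official case counts against article page views shifted by a lag, then report r^2. This is not the authors' code. The function names, the single-predictor form, and the toy numbers are all assumptions made for illustration.

```python
from statistics import mean

def lagged_pairs(views, cases, lag):
    """Pair page views at time t with case counts at time t + lag (lag >= 0)."""
    return views[:len(views) - lag], cases[lag:]

def fit_line(x, y):
    """Ordinary least squares for y ~ slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def r_squared(x, y, slope, intercept):
    """Coefficient of determination of the fitted line."""
    my = mean(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Toy series (invented, not from the paper): cases follow views by one step.
toy_views = [10, 20, 30, 40, 50, 60]
toy_cases = [7, 25, 45, 65, 85, 105]

x, y = lagged_pairs(toy_views, toy_cases, lag=1)
slope, intercept = fit_line(x, y)
fit_quality = r_squared(x, y, slope, intercept)
```

A positive lag makes the page views a leading indicator, which is one simple way to probe forecasting value. The paper's actual article selection and model form may differ.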
Combining Search, Social Media, and Traditional Data Sources to Improve Influenza Surveillance
We present a machine learning-based methodology capable of providing
real-time ("nowcast") and forecast estimates of influenza activity in the US by
leveraging data from multiple data sources including: Google searches, Twitter
microblogs, nearly real-time hospital visit records, and data from a
participatory surveillance system. Our main contribution consists of combining
multiple influenza-like illness (ILI) activity estimates, generated
independently with each data source, into a single prediction of ILI utilizing
machine learning ensemble approaches. Our methodology exploits the information
in each data source and produces accurate weekly ILI predictions for up to four
weeks ahead of the release of CDC's ILI reports. We evaluate the predictive
ability of our ensemble approach during the 2013-2014 (retrospective) and
2014-2015 (live) flu seasons for each of the four weekly time horizons. Our
ensemble approach demonstrates several advantages: (1) our ensemble method's
predictions outperform every prediction using each data source independently,
(2) our methodology can produce predictions one week ahead of Google Flu Trends' (GFT) real-time
estimates with comparable accuracy, and (3) our two- and three-week forecast
estimates have comparable accuracy to real-time predictions using an
autoregressive model. Moreover, our results show that considerable insight is
gained from incorporating disparate data streams, in the form of social media
and crowd-sourced data, into influenza predictions in all time horizon
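As a rough illustration of combining independently generated ILI estimates into one prediction, the sketch below weights each source by the inverse of its historical mean squared error. This is a deliberately simple stand-in, not the machine learning ensemble methods the abstract refers to. The source names, the `eps` guard, and all numbers are hypothetical.

```python
def mse(predictions, truth):
    """Mean squared error of past estimates against reported ILI values."""
    return sum((p - t) ** 2 for p, t in zip(predictions, truth)) / len(truth)

def inverse_mse_weights(history, truth, eps=1e-12):
    """Give each source a weight proportional to 1 / (its historical MSE).

    history maps a source name to its list of past estimates.
    eps avoids division by zero for a source with a perfect record.
    """
    inverse = {name: 1.0 / (mse(preds, truth) + eps)
               for name, preds in history.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}

def combine(weights, current):
    """Weighted average of the current per-source estimates."""
    return sum(weights[name] * current[name] for name in weights)

# Invented past ILI percentages and per-source estimates.
reported = [2.0, 3.0, 4.0]
past_estimates = {
    "search": [2.5, 3.5, 4.5],   # hypothetical source, MSE 0.25
    "twitter": [3.0, 4.0, 5.0],  # hypothetical source, MSE 1.0
}
weights = inverse_mse_weights(past_estimates, reported)
ensemble_estimate = combine(weights, {"search": 3.0, "twitter": 4.0})
```

With these toy inputs the more accurate source receives about four times the weight of the other, so the combined estimate sits closer to its value.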
Emerging Challenges and Opportunities in Infectious Disease Epidemiology.
Much of the intellectual tradition of modern epidemiology stems from efforts to understand and combat chronic diseases persisting through the 20th-century epidemiologic transition of countries such as the United States and United Kingdom. After decades of relative obscurity, infectious disease epidemiology has undergone an intellectual rebirth in recent years amid increasing recognition of the threat posed by both new and familiar pathogens. Here, we review the emerging coalescence of infectious disease epidemiology around a core set of study designs and statistical methods bearing little resemblance to the chronic disease epidemiology toolkit. We offer our outlook on challenges and opportunities facing the field, including the integration of novel molecular and digital information sources into disease surveillance, the assimilation of such data into models of pathogen spread, and the increasing contribution of models to public health practice. We next consider emerging paradigms in causal inference for infectious diseases, ranging from approaches to evaluating vaccines and antimicrobial therapies to the task of ascribing clinical syndromes to etiologic microorganisms, an age-old problem transformed by our increasing ability to characterize human-associated microbiota. These areas represent an increasingly important component of epidemiology training programs for future generations of researchers and practitioners.
Forecasting seasonal influenza fusing digital indicators and a mechanistic disease model
The availability of novel digital data streams that can be used as a proxy for monitoring infectious disease incidence is ushering in a new era for real-time forecast approaches to disease spreading. Here, we propose the first seasonal influenza forecast framework based on a stochastic, spatially structured mechanistic model (individual level microsimulation) initialized with geo-localized microblogging data. The framework provides, for more than 600 census areas in the United States, Italy and Spain, the initial conditions for a stochastic epidemic computational model that generates an ensemble of forecasts for the main indicators of the epidemic season: peak time and intensity. We evaluate the forecast accuracy and reliability by comparing the results from our framework with the data from the official influenza surveillance systems in the US, Italy and Spain in the seasons 2014/15 and 2015/16. In all countries studied, the proposed framework provides reliable results with leads of up to 6 weeks that became more stable and accurate with progression of the season. The results for the United States have been generated in real-time in the context of the Centers for Disease Control and Prevention “Forecasting the Influenza Season Challenge”. A characteristic feature of the mechanistic modeling approach is in the explicit estimate of key epidemiological parameters relevant for public health decision-making that cannot be achieved with statistical models not considering the disease dynamics. Furthermore, the presented framework allows the fusion of multiple data streams in the initialization stage and can be enriched with census, weather and socioeconomic data.
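To make the notion of an ensemble of stochastic forecasts for peak time and intensity concrete, here is a heavily simplified sketch: a single-population, discrete-time stochastic SIR model run many times from the same initial condition. It is not the spatially structured microsimulation described in the abstract. The parameter values are invented, and `i0` merely stands in for an initial condition that the framework would derive from microblogging data.

```python
import math
import random

def simulate_sir(n, i0, beta, gamma, weeks, rng):
    """One stochastic SIR run; returns new infections per week."""
    s, i = n - i0, i0
    incidence = []
    for _ in range(weeks):
        p_inf = 1.0 - math.exp(-beta * i / n)  # weekly infection probability
        p_rec = 1.0 - math.exp(-gamma)         # weekly recovery probability
        new_inf = sum(1 for _ in range(s) if rng.random() < p_inf)
        new_rec = sum(1 for _ in range(i) if rng.random() < p_rec)
        s -= new_inf
        i += new_inf - new_rec
        incidence.append(new_inf)
    return incidence

def peak(incidence):
    """Return (peak week index, peak weekly incidence)."""
    week = max(range(len(incidence)), key=incidence.__getitem__)
    return week, incidence[week]

def forecast_ensemble(i0, runs=20, seed=0, n=1000, beta=1.5, gamma=1.0, weeks=30):
    """Repeat the simulation to obtain a distribution of peak forecasts."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return [peak(simulate_sir(n, i0, beta, gamma, weeks, rng))
            for _ in range(runs)]

peaks = forecast_ensemble(i0=10)
```

The spread of `peaks` across runs gives a crude picture of forecast uncertainty in peak timing and size for this toy model.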