86 research outputs found

Global disease monitoring and forecasting with Wikipedia

Author: Del Valle Sara Y.
Deshpande Alina
Fairchild Geoffrey
Generous Nicholas
Priedhorsky Reid
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 15/07/2014

Infectious disease is a leading threat to public health, economic stability, and other key social structures. Efforts to mitigate these impacts depend on accurate and timely monitoring to measure the risk and progress of disease. Traditional, biologically-focused monitoring techniques are accurate but costly and slow; in response, new techniques based on social internet data such as social media and search queries are emerging. These efforts are promising, but important challenges in the areas of scientific peer review, breadth of diseases and countries, and forecasting hamper their operational usefulness. We examine a freely available, open data source for this use: access logs from the online encyclopedia Wikipedia. Using linear models, language as a proxy for location, and a systematic yet simple article selection procedure, we tested 14 location-disease combinations and demonstrate that these data feasibly support an approach that overcomes these challenges. Specifically, our proof-of-concept yields models with r^2 up to 0.92, forecasting value up to the 28 days tested, and several pairs of models similar enough to suggest that transferring models from one location to another without re-training is feasible. Based on these preliminary results, we close with a research agenda designed to overcome these challenges and produce a disease monitoring and forecasting system that is significantly more effective, robust, and globally comprehensive than the current state of the art. Comment: 27 pages; 4 figures; 4 tables. Version 2: Cite McIver & Brownstein and adjust novelty claims accordingly; revise title; various revisions for clarity
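The abstract above describes linear models that relate Wikipedia access counts to disease incidence and reports r^2 and forecasting value at a lag. A minimal sketch of that idea follows, assuming a single predictor and ordinary least squares. The function names and the two short series are hypothetical illustrations and are not taken from the paper, which may use a different model specification.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(x, y, intercept, slope):
    """Coefficient of determination of the fitted line."""
    my = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def lagged_r_squared(views, cases, lag):
    """Fit cases[t + lag] against views[t]; lag is in time steps."""
    x = views[: len(views) - lag]
    y = cases[lag:]
    intercept, slope = fit_line(x, y)
    return r_squared(x, y, intercept, slope)


# Made-up series for illustration only (not data from the paper).
views = [120, 150, 210, 320, 410, 380, 290, 200, 140, 110]
cases = [10, 11, 14, 20, 31, 40, 37, 28, 19, 13]

for lag in (0, 1, 2):
    print(lag, round(lagged_r_squared(views, cases, lag), 3))
```

A lag whose r^2 stays high suggests that access counts carry information about incidence that many steps ahead, which is the sense of "forecasting value" sketched here.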

arXiv.org e-Print Archive

CiteSeerX

Directory of Open Access Journals

PubMed Central

FigShare

Google Health Trends performance reflecting dengue incidence for the Brazilian states

Author: Generous Nicholas
Manore Carrie A.
Martinez Kaitlyn
Osthus Dave
Parikh Nidhi
Romero-Alvarez Daniel
Valle Sara del
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 18/06/2020

This work is licensed under a Creative Commons Attribution 4.0 International License. Background: Dengue fever is a mosquito-borne infection transmitted by Aedes aegypti and mainly found in tropical and subtropical regions worldwide. Since its re-introduction in 1986, Brazil has become a hotspot for dengue and has experienced yearly epidemics. Because dengue is a notifiable infectious disease, Brazil uses a passive epidemiological surveillance system to collect and report cases; however, the dengue burden is underestimated. Thus, Internet data streams may complement surveillance activities by providing real-time information in the face of reporting lags. Methods: We analyzed 19 terms related to dengue using Google Health Trends (GHT), a free Internet data source, and compared them with weekly dengue incidence between 2011 and 2016. We correlated GHT data with dengue incidence at the national and state levels for Brazil while using the adjusted R squared statistic as primary outcome measure (0/1). We used survey data on Internet access and variables from the official census of 2010 to identify where GHT could be useful in tracking dengue dynamics. Finally, we used a standardized volatility index on dengue incidence and developed models with different variables with the same objective. Results: From the 19 terms explored with GHT, only seven were able to consistently track dengue. From the 27 states, only 12 reported an adjusted R squared higher than 0.8; these states were distributed mainly in the Northeast, Southeast, and South of Brazil. The usefulness of GHT was explained by the logarithm of the number of Internet users in the last 3 months, the total population per state, and the standardized volatility index. Conclusions: The potential contribution of GHT in complementing traditional established surveillance strategies should be analyzed in the context of geographical resolutions smaller than countries. For Brazil, GHT implementation should be analyzed on a case-by-case basis. State variables including total population, Internet usage in the last 3 months, and the standardized volatility index could serve as indicators determining when GHT could complement state-level dengue surveillance in other countries
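The abstract above uses the adjusted R squared statistic as its outcome measure but does not spell it out. The sketch below uses the standard textbook definition, which is assumed here and not confirmed by the abstract. The function name and example numbers are hypothetical.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Standard adjusted R squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1).

    r2: ordinary R squared of the fitted model
    n_obs: number of observations (n)
    n_predictors: number of predictors, excluding the intercept (p)
    """
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical example: same raw fit, increasing number of search terms.
for p in (1, 7, 19):
    print(p, round(adjusted_r_squared(0.85, 300, p), 4))
```

The adjustment penalizes extra predictors, so a model built from many search terms must fit better to reach the same adjusted value as a model built from few.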

KU ScholarWorks

Epidemiological data challenges: planning for a more robust future through data standards

Author: Daughton Ashlynn R.
Deshpande Alina
Fairchild Geoffrey
Generous Nicholas
Khalsa Hari
Priedhorsky Reid
Tasseff Byron
Velappan Nileena
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2018

Accessible epidemiological data are of great value for emergency preparedness and response, understanding disease progression through a population, and building statistical and mechanistic disease models that enable forecasting. The status quo, however, renders acquiring and using such data difficult in practice. In many cases, a primary way of obtaining epidemiological data is through the internet, but the methods by which the data are presented to the public often differ drastically among institutions. As a result, there is a strong need for better data sharing practices. This paper identifies, in detail and with examples, the three key challenges one encounters when attempting to acquire and use epidemiological data: 1) interfaces, 2) data formatting, and 3) reporting. These challenges are used to provide suggestions and guidance for improvement as these systems evolve in the future. If these suggested data and interface recommendations were adhered to, epidemiological and public health analysis, modeling, and informatics work would be significantly streamlined, which can in turn yield better public health decision-making capabilities. Comment: v2 includes several typo fixes; v3 adds a paragraph on backfill; v4 adds 2 new paragraphs to the conclusion that address Frontiers reviewer comments; v5 adds some minor modifications that address additional reviewer comments

arXiv.org e-Print Archive

Directory of Open Access Journals

Frontiers - Publisher Connector

Forecasting the 2013-2014 Influenza Season using Wikipedia

Author: Del Valle Sara Y.
Deshpande Alina
Fairchild Geoffrey
Generous Nicholas
Hickmann Kyle S.
Hyman James M.
Priedhorsky Reid
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 03/11/2014

Infectious diseases are one of the leading causes of morbidity and mortality around the world; thus, forecasting their impact is crucial for planning an effective response strategy. According to the Centers for Disease Control and Prevention (CDC), seasonal influenza affects between 5% and 20% of the U.S. population and causes major economic impacts resulting from hospitalization and absenteeism. Understanding influenza dynamics and forecasting its impact is fundamental for developing prevention and mitigation strategies. We combine modern data assimilation methods with Wikipedia access logs and CDC influenza-like illness (ILI) reports to create a weekly forecast for seasonal influenza. The methods are applied to the 2013-2014 influenza season but are sufficiently general to forecast any disease outbreak, given incidence or case count data. We adjust the initialization and parametrization of a disease model and show that this allows us to determine systematic model bias. In addition, we provide a way to determine where the model diverges from observation and evaluate forecast accuracy. Wikipedia article access logs are shown to be highly correlated with historical ILI records and allow for accurate prediction of ILI data several weeks before it becomes available. The results show that prior to the peak of the flu season, our forecasting method projected the actual outcome with a high probability. However, since our model does not account for re-infection or multiple strains of influenza, the tail of the epidemic is not predicted well after the peak of flu season has passed. Comment: Second version. In the previous version, 2 figure references were compiling wrong due to an error in the LaTeX source

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

The Biosurveillance Analytics Resource Directory (BARD): Facilitating the Use of Epidemiological Models for Infectious Disease Surveillance

Author: Abeyta Esteban
Althouse Ben
Burkom Howard
Castro Lauren
Daughton Ashlynn
Del Valle Sara Y
Deshpande Alina
Fairchild Geoffrey
Generous Nicholas
Hyman James M
Kiang Richard
Margevicius Kristen J
Morse Andrew P
Pancerella Carmen M
Pullum Laura
Ramanathan Arvind
Schlegelmilch Jeffrey
Scott Aaron
Taylor-McCabe Kirsten J
Vespignani Alessandro
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2016

Epidemiological modeling for infectious disease is important for disease management and its routine implementation needs to be facilitated through better description of models in an operational context. A standardized model characterization process that allows selection or making manual comparisons of available models and their results is currently lacking. A key need is a universal framework to facilitate model description and understanding of its features. Los Alamos National Laboratory (LANL) has developed a comprehensive framework that can be used to characterize an infectious disease model in an operational context. The framework was developed through a consensus among a panel of subject matter experts. In this paper, we describe the framework, its application to model characterization, and the development of the Biosurveillance Analytics Resource Directory (BARD; http://brd.bsvgateway.org/brd/), to facilitate the rapid selection of operational models for specific infectious/communicable diseases. We offer this framework and associated database to stakeholders of the infectious disease modeling field as a tool for standardizing model description and facilitating the use of epidemiological models

University of Liverpool Repository

Crossref

Columbia University Academic Commons

Directory of Open Access Journals

PubMed Central

Defending Our Public Biological Databases as a Global Critical Infrastructure

Author: Christina L. Ting
Christopher Oehmen
Corey M. Hudson
Curtis Johnson
Emilie Purvine
Eric Merkley
Gary Xie
Jacob Caswell
Jason D. Gans
Karen Taylor
Kristin Omberg
Murray Wolinsky
Nicholas Generous
Publication venue: 'Frontiers Media SA'
Publication date: 01/04/2019

Progress in modern biology is being driven, in part, by the large amounts of freely available data in public resources such as the International Nucleotide Sequence Database Collaboration (INSDC), the world's primary database of biological sequence (and related) information. INSDC and similar databases have dramatically increased the pace of fundamental biological discovery and enabled a host of innovative therapeutic, diagnostic, and forensic applications. However, as high-value, openly shared resources with a high degree of assumed trust, these repositories share compelling similarities to the early days of the Internet. Consequently, as public biological databases continue to increase in size and importance, we expect that they will face the same threats as undefended cyberspace. There is a unique opportunity, before a significant breach and loss of trust occurs, to ensure they evolve with quality and security as a design philosophy rather than costly “retrofitted” mitigations. This Perspective surveys some potential quality assurance and security weaknesses in existing open genomic and proteomic repositories, describes methods to mitigate the likelihood of both intentional and unintentional errors, and offers recommendations for risk mitigation based on lessons learned from cybersecurity

Directory of Open Access Journals

Results from the centers for disease control and prevention's predict the 2013-2014 Influenza Season Challenge

Author: Allen Christopher
Alper David
Aman Susan
Anil Kumar V. S.
Aslam Anoshã
Bakach Iurii
Barrett Chris
BASAGNI Stefano
Biggerstaff Matthew
Bisset Keith
Broniatowski David
Brooks Logan
Brownstein John S.
Butler Patrick
Chakraborty Prithwish
Chandra Priyadarshini
Chen Jiangzhuo
Del Valle Sara Y.
Deshpande Alina
Dredze Mark
Eggo Rosalind
Eubank Stephen
Fairchild Geoffrey
Farrow David
Finelli Lyn
Fox Spencer
Fung Isaac Chun Hai
Gambhir Manoj
Generous Nicholas
GESUALDO Francesco
Goldstein Ed
Hao Yi
Henderson Jette
Hickman Kyle S.
Hickmann Kyle S.
Hyman James M.
Hyun Sangwon
Karspeck Alicia
Kaup Hemchandra
Khadivi Pejman
Krishnan Ramesh
Laskowski Kathy
Lewis Bryan
Lipsitch Marc
Lum Kristian
Madhavan Satish
Marathe Madhav
Markar Ashirwad
Mekaru Sumiko R.
Meyers Lauren Ancel
Nagel Anna
Nsoesie Elaine O.
Pashley Bryanne
Paul Michael
PERRA NICOLA
Priedhorsky Reid
Ramakrishnan Anurekha
Ramakrishnan Naren
Rosenfeld Roni
Scarpino Sam
Schaible Braydon J.
Scott James
Sexton Jessica K.
Shaman Jeffrey
Singh Bismark
Srinivasan Ravi
STILO GIOVANNI
Tibshirani Ryan J.
Tozzi Alberto E.
Tse Zion Tsz Ho
Tsou Ming Hsiang
VELARDI Paola
Vespignani Alessandro
Yang Wan
Ying Yuchen
Zhang Qian
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Background: Early insights into the timing of the start, peak, and intensity of the influenza season could be useful in planning influenza prevention and control activities. To encourage development and innovation in influenza forecasting, the Centers for Disease Control and Prevention (CDC) organized a challenge to predict the 2013-14 United States influenza season. Methods: Challenge contestants were asked to forecast the start, peak, and intensity of the 2013-2014 influenza season at the national level and at any or all Health and Human Services (HHS) region level(s). The challenge ran from December 1, 2013 to March 27, 2014; contestants were required to submit 9 biweekly forecasts at the national level to be eligible. The selection of the winner was based on expert evaluation of the methodology used to make the prediction and the accuracy of the prediction as judged against the U.S. Outpatient Influenza-like Illness Surveillance Network (ILINet). Results: Nine teams submitted 13 forecasts for all required milestones. The first forecast was due on December 2, 2013; 3/13 forecasts received correctly predicted the start of the influenza season within 1 week, 1/13 predicted the peak within 1 week, 3/13 predicted the peak ILINet percentage within 1%, and 4/13 predicted the season duration within 1 week. For the prediction due on December 19, 2013, the number of forecasts that correctly forecasted the peak week increased to 2/13, the peak percentage to 6/13, and the duration of the season to 6/13. As the season progressed, the forecasts became more stable and were closer to the season milestones. Conclusion: Forecasting has become technically feasible, but further efforts are needed to improve forecast accuracy so that policy makers can reliably use these predictions. CDC and challenge contestants plan to build upon the methods developed during this contest to improve the accuracy of influenza forecasts. © 2016 The Author(s)

Archivio della ricerca- Università di Roma La Sapienza

Parametric Uncertainty in Intra-Herd Foot-and-Mouth Disease Epidemiological Models

Author: Generous Eric Nicholas
Publication venue: 'University of Illinois Libraries'
Publication date: 23/03/2013

OBJECTIVE: The objective of this project is to understand how parametric uncertainty within intra-herd Foot-and-Mouth disease epidemiological models affects the outbreak simulations and what implications this has on surveillance and control strategy and policy. INTRODUCTION: The rapid transmission and poor control policy response during recent Foot-and-Mouth disease (FMD) outbreaks have underscored the need for better decision support tools. At the foundation of these decision support tools are the epidemiological models that are parameterized with the data generated from pathogenesis studies of the FMD virus that contain contact transmission data. These values being used to parameterize the model, contrary to assumption, contain a significant amount of uncertainty, which propagates throughout the model affecting output. To understand how parametric uncertainty might affect output, a variety of disease transmission parameters were generated from contact transmission data and used to parameterize an intra-herd model. METHODS: Data was initially collected and analyzed for papers that could meet several criteria: they must be contact transmission studies, they must measure viremia (the level of virus in the blood), and they must observe clinical signs. For the studies that met the criteria, tables were constructed and the following information from each paper was collected: serotype, strain, animal species, unique animal identifier, unit of measurement utilized by virus quantification, duration and quantity of viremia, and the time to first report of clinical signs. Three different durations of disease states for the latent, sub-clinically infectious, and clinically infectious periods were generated from the viremia data for each individual animal and grouped in three ways: by strain of virus, by similar experimental design, and all together. Gamma, Weibull, and normal distributions were fitted to the data in each group.
The distributions for each group were then used to parameterize a stochastic, state transition intra-herd model. Output from the model was analyzed by examining the uncertainty and variance in time to 50% herd infected, time to 2% herd clinically infected, and percentage of herd infected at 2% herd clinically infected for each distribution and group. RESULTS: There is a lack of a standardized definition for disease state durations of the Foot-and-Mouth Disease virus in the literature. As a result, many different models utilize slightly differing values generated from the same data. This project discovered that depending on the definitions used to determine the disease state durations, the model output varied significantly. Additionally, durations of the disease state periods do not follow a normal distribution as may be assumed by many modelers, and are more accurately described by distributions that allow for non-zero skewness. CONCLUSIONS: The data being used to parameterize intra-herd Foot-and-Mouth disease models contains a significant amount of uncertainty that can cause the model output to vary significantly. This uncertainty needs to be clearly communicated to decision makers who use results generated from FMD intra-herd models and illustrates the need for more resources to be put into addressing the issue of basic parameters such as contact rate and disease state duration. Currently no studies have been conducted on the contact rate of animals on farms and the current values used for disease state durations vary drastically depending on the data and methods used. Without a better understanding of the basic parameters, even the most advanced models will not be accurate
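The abstract above reports that disease state durations are better described by distributions that allow non-zero skewness than by a normal distribution. One simple diagnostic, sketched here as an assumption and not as the author's method, is the moment-based sample skewness, which is 0 for perfectly symmetric data. The function name and the duration values are hypothetical and are not data from the study.

```python
from statistics import mean


def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (0 for symmetric data)."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - m) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Made-up durations in days, for illustration only.
durations_days = [1.0, 1.5, 1.5, 2.0, 2.0, 2.5, 3.0, 4.5, 7.0]
print(round(sample_skewness(durations_days), 2))
```

A clearly positive value indicates a long right tail, the kind of shape that gamma or Weibull distributions can represent and a normal distribution cannot.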

University of Illinois at Chicago: Journals@UIC

Crossref

PubMed Central

Parametric Uncertainty in Intra-Herd Foot-and-Mouth Disease Epidemiological Models

Author: Generous Eric Nicholas
Publication venue: University of Illinois at Chicago Library
Publication date: 23/03/2013

Epidemiological models that simulate the spread of Foot-and-Mouth Disease within a herd are the foundation of decision support tools used by governments to help advise and inform strategy to combat outbreaks. Contact transmission data used to parameterize these models, contrary to assumption, contain a significant amount of variability and uncertainty. The implications of this finding suggest that the resultant model output might not accurately simulate the spread of an outbreak. If this is true, the potential impact due to uncertainty inherent to the decision support tools used by governments might be significant

University of Illinois at Chicago: Journals@UIC

PubMed Central