86 research outputs found

    Global disease monitoring and forecasting with Wikipedia

    Infectious disease is a leading threat to public health, economic stability, and other key social structures. Efforts to mitigate these impacts depend on accurate and timely monitoring to measure the risk and progress of disease. Traditional, biologically-focused monitoring techniques are accurate but costly and slow; in response, new techniques based on social internet data such as social media and search queries are emerging. These efforts are promising, but important challenges in the areas of scientific peer review, breadth of diseases and countries, and forecasting hamper their operational usefulness. We examine a freely available, open data source for this use: access logs from the online encyclopedia Wikipedia. Using linear models, language as a proxy for location, and a systematic yet simple article selection procedure, we tested 14 location-disease combinations and demonstrate that these data feasibly support an approach that overcomes these challenges. Specifically, our proof-of-concept yields models with $r^2$ up to 0.92, forecasting value up to the 28 days tested, and several pairs of models similar enough to suggest that transferring models from one location to another without re-training is feasible. Based on these preliminary results, we close with a research agenda designed to overcome these challenges and produce a disease monitoring and forecasting system that is significantly more effective, robust, and globally comprehensive than the current state of the art. Comment: 27 pages; 4 figures; 4 tables. Version 2: Cite McIver & Brownstein and adjust novelty claims accordingly; revise title; various revisions for clarity.
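The abstract above mentions linear models relating Wikipedia access logs to disease incidence and reports the fit as r^2. As a rough illustration only, and not the authors' actual procedure, the sketch below fits a one-variable least-squares line and computes r^2. The function name `fit_line` and all numbers are hypothetical and chosen for demonstration.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                     # slope
    a = my - b * mx                   # intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


# Hypothetical weekly article views and reported case counts (made-up data).
views = [120, 150, 200, 260, 310, 280, 220]
cases = [10, 14, 19, 27, 30, 29, 21]

a, b, r2 = fit_line(views, cases)
print(f"cases ~ {a:.2f} + {b:.4f} * views, r^2 = {r2:.3f}")
```

To probe forecasting value in the same spirit, one could pair each week's views with the case count some weeks later before fitting; the lag to use is an assumption the sketch leaves open.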

    Google Health Trends performance reflecting dengue incidence for the Brazilian states

    This work is licensed under a Creative Commons Attribution 4.0 International License. Background: Dengue fever is a mosquito-borne infection transmitted by Aedes aegypti and mainly found in tropical and subtropical regions worldwide. Since its re-introduction in 1986, Brazil has become a hotspot for dengue and has experienced yearly epidemics. Because dengue is a notifiable infectious disease, Brazil uses a passive epidemiological surveillance system to collect and report cases; however, the dengue burden is underestimated. Thus, Internet data streams may complement surveillance activities by providing real-time information in the face of reporting lags. Methods: We analyzed 19 terms related to dengue using Google Health Trends (GHT), a free Internet data source, and compared them with weekly dengue incidence between 2011 and 2016. We correlated GHT data with dengue incidence at the national and state level for Brazil, using the adjusted R squared statistic as the primary outcome measure (0/1). We used survey data on Internet access and variables from the official census of 2010 to identify where GHT could be useful in tracking dengue dynamics. Finally, we used a standardized volatility index on dengue incidence and developed models with different variables with the same objective. Results: Of the 19 terms explored with GHT, only seven were able to consistently track dengue. Of the 27 states, only 12 reported an adjusted R squared higher than 0.8; these states were distributed mainly in the Northeast, Southeast, and South of Brazil. The usefulness of GHT was explained by the logarithm of the number of Internet users in the last 3 months, the total population per state, and the standardized volatility index. Conclusions: The potential contribution of GHT in complementing traditional established surveillance strategies should be analyzed in the context of geographical resolutions smaller than countries. For Brazil, GHT implementation should be analyzed on a case-by-case basis. State variables including total population, Internet usage in the last 3 months, and the standardized volatility index could serve as indicators determining when GHT could complement dengue state-level surveillance in other countries.
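The abstract above uses the adjusted R squared statistic as its outcome measure. For readers unfamiliar with it, here is a minimal sketch of the standard textbook formula, which penalizes R squared for the number of predictors. The function name and the example inputs are hypothetical and are not taken from the paper.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Standard adjusted R^2: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof


# Hypothetical example: R^2 of 0.85 from 52 weekly observations and 7 predictors.
value = adjusted_r_squared(0.85, n_obs=52, n_predictors=7)
print(f"adjusted R^2 = {value:.4f}")
```

With a single observation series and few predictors the adjustment is small; it grows as predictors are added without a matching gain in fit.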

    Epidemiological data challenges: planning for a more robust future through data standards

    Accessible epidemiological data are of great value for emergency preparedness and response, understanding disease progression through a population, and building statistical and mechanistic disease models that enable forecasting. The status quo, however, renders acquiring and using such data difficult in practice. In many cases, a primary way of obtaining epidemiological data is through the internet, but the methods by which the data are presented to the public often differ drastically among institutions. As a result, there is a strong need for better data sharing practices. This paper identifies, in detail and with examples, the three key challenges one encounters when attempting to acquire and use epidemiological data: 1) interfaces, 2) data formatting, and 3) reporting. These challenges are used to provide suggestions and guidance for improvement as these systems evolve in the future. If these suggested data and interface recommendations were adhered to, epidemiological and public health analysis, modeling, and informatics work would be significantly streamlined, which can in turn yield better public health decision-making capabilities. Comment: v2 includes several typo fixes; v3 adds a paragraph on backfill; v4 adds 2 new paragraphs to the conclusion that address Frontiers reviewer comments; v5 adds some minor modifications that address additional reviewer comments.

    Forecasting the 2013--2014 Influenza Season using Wikipedia

    Infectious diseases are one of the leading causes of morbidity and mortality around the world; thus, forecasting their impact is crucial for planning an effective response strategy. According to the Centers for Disease Control and Prevention (CDC), seasonal influenza affects between 5% and 20% of the U.S. population and causes major economic impacts resulting from hospitalization and absenteeism. Understanding influenza dynamics and forecasting its impact is fundamental for developing prevention and mitigation strategies. We combine modern data assimilation methods with Wikipedia access logs and CDC influenza-like illness (ILI) reports to create a weekly forecast for seasonal influenza. The methods are applied to the 2013--2014 influenza season but are sufficiently general to forecast any disease outbreak, given incidence or case count data. We adjust the initialization and parametrization of a disease model and show that this allows us to determine systematic model bias. In addition, we provide a way to determine where the model diverges from observation and evaluate forecast accuracy. Wikipedia article access logs are shown to be highly correlated with historical ILI records and allow for accurate prediction of ILI data several weeks before it becomes available. The results show that prior to the peak of the flu season, our forecasting method projected the actual outcome with a high probability. However, since our model does not account for re-infection or multiple strains of influenza, the tail of the epidemic is not predicted well after the peak of flu season has passed. Comment: Second version. In previous version 2 figure references were compiling wrong due to error in latex source.

    The Biosurveillance Analytics Resource Directory (BARD): Facilitating the Use of Epidemiological Models for Infectious Disease Surveillance

    Epidemiological modeling for infectious disease is important for disease management, and its routine implementation needs to be facilitated through better description of models in an operational context. A standardized model characterization process that allows selecting or manually comparing available models and their results is currently lacking. A key need is a universal framework to facilitate model description and understanding of its features. Los Alamos National Laboratory (LANL) has developed a comprehensive framework that can be used to characterize an infectious disease model in an operational context. The framework was developed through a consensus among a panel of subject matter experts. In this paper, we describe the framework, its application to model characterization, and the development of the Biosurveillance Analytics Resource Directory (BARD; http://brd.bsvgateway.org/brd/), to facilitate the rapid selection of operational models for specific infectious/communicable diseases. We offer this framework and associated database to stakeholders of the infectious disease modeling field as a tool for standardizing model description and facilitating the use of epidemiological models.

    Defending Our Public Biological Databases as a Global Critical Infrastructure

    Progress in modern biology is being driven, in part, by the large amounts of freely available data in public resources such as the International Nucleotide Sequence Database Collaboration (INSDC), the world's primary database of biological sequence (and related) information. INSDC and similar databases have dramatically increased the pace of fundamental biological discovery and enabled a host of innovative therapeutic, diagnostic, and forensic applications. However, as high-value, openly shared resources with a high degree of assumed trust, these repositories share compelling similarities to the early days of the Internet. Consequently, as public biological databases continue to increase in size and importance, we expect that they will face the same threats as undefended cyberspace. There is a unique opportunity, before a significant breach and loss of trust occurs, to ensure they evolve with quality and security as a design philosophy rather than costly “retrofitted” mitigations. This Perspective surveys some potential quality assurance and security weaknesses in existing open genomic and proteomic repositories, describes methods to mitigate the likelihood of both intentional and unintentional errors, and offers recommendations for risk mitigation based on lessons learned from cybersecurity.

    Results from the centers for disease control and prevention's predict the 2013-2014 Influenza Season Challenge

    Background: Early insights into the timing of the start, peak, and intensity of the influenza season could be useful in planning influenza prevention and control activities. To encourage development and innovation in influenza forecasting, the Centers for Disease Control and Prevention (CDC) organized a challenge to predict the 2013-14 United States influenza season. Methods: Challenge contestants were asked to forecast the start, peak, and intensity of the 2013-2014 influenza season at the national level and at any or all Health and Human Services (HHS) region level(s). The challenge ran from December 1, 2013 to March 27, 2014; contestants were required to submit 9 biweekly forecasts at the national level to be eligible. The selection of the winner was based on expert evaluation of the methodology used to make the prediction and the accuracy of the prediction as judged against the U.S. Outpatient Influenza-like Illness Surveillance Network (ILINet). Results: Nine teams submitted 13 forecasts for all required milestones. The first forecast was due on December 2, 2013; 3/13 forecasts received correctly predicted the start of the influenza season within 1 week, 1/13 predicted the peak within 1 week, 3/13 predicted the peak ILINet percentage within 1%, and 4/13 predicted the season duration within 1 week. For the prediction due on December 19, 2013, the number of forecasts that correctly predicted the peak week increased to 2/13, the peak percentage to 6/13, and the duration of the season to 6/13. As the season progressed, the forecasts became more stable and were closer to the season milestones. Conclusion: Forecasting has become technically feasible, but further efforts are needed to improve forecast accuracy so that policy makers can reliably use these predictions. CDC and challenge contestants plan to build upon the methods developed during this contest to improve the accuracy of influenza forecasts. © 2016 The Author(s)

    Parametric Uncertainty in Intra-Herd Foot-and-Mouth Disease Epidemiological Models

    OBJECTIVE: The objective of this project is to understand how parametric uncertainty within intra-herd Foot-and-Mouth disease epidemiological models affects the outbreak simulations and what implications this has on surveillance and control strategy and policy. INTRODUCTION: The rapid transmission and poor control policy response during recent Foot-and-Mouth disease (FMD) outbreaks have underscored the need for better decision support tools. At the foundation of these decision support tools are the epidemiological models that are parameterized with the data generated from pathogenesis studies of the FMD virus that contain contact transmission data. The values used to parameterize the model, contrary to assumption, contain a significant amount of uncertainty, which propagates throughout the model and affects its output. To understand how parametric uncertainty might affect output, a variety of disease transmission parameters were generated from contact transmission data and used to parameterize an intra-herd model. METHODS: Data were initially collected and analyzed for papers that could meet several criteria: they must be contact transmission studies, they must measure viremia (the level of virus in the blood), and they must observe clinical signs. For the studies that met the criteria, tables were constructed and the following information from each paper was collected: serotype, strain, animal species, unique animal identifier, unit of measurement utilized by virus quantification, duration and quantity of viremia, and the time to first report of clinical signs. Three different durations of disease states for the latent, sub-clinically infectious, and clinically infectious periods were generated from the viremia data for each individual animal and grouped in three ways: by strain of virus, by similar experimental design, and all together. Gamma, Weibull, and normal distributions were fitted to the data in each group. The distributions for each group were then used to parameterize a stochastic, state transition intra-herd model. Output from the model was analyzed by examining the uncertainty and variance in time to 50% herd infected, time to 2% herd clinically infected, and percentage of herd infected at 2% herd clinically infected for each distribution and group. RESULTS: There is a lack of a standardized definition for disease state durations of the Foot-and-Mouth Disease virus in the literature. As a result, many different models utilize slightly differing values generated from the same data. This project discovered that depending on the definitions used to determine the disease state durations, the model output varied significantly. Additionally, durations of the disease state periods do not follow a normal distribution as may be assumed by many modelers, and are more accurately described by distributions that allow for non-zero skewness. CONCLUSIONS: The data being used to parameterize intra-herd Foot-and-Mouth disease models contain a significant amount of uncertainty that can cause the model output to vary significantly. This uncertainty needs to be clearly communicated to decision makers who use results generated from FMD intra-herd models and illustrates the need for more resources to be put into addressing the issue of basic parameters such as contact rate and disease state duration. Currently no studies have been conducted on the contact rate of animals on farms and the current values used for disease state durations vary drastically depending on the data and methods used. Without a better understanding of the basic parameters, even the most advanced models will not be accurate.
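The abstract above describes a stochastic, state transition intra-herd model parameterized with fitted duration distributions, with time to 50% herd infected as one output. The sketch below is a hypothetical toy version of that idea, not the authors' model: the daily time step, the transmission rule, the function name `simulate_herd`, and every parameter value are assumptions made purely for illustration. Durations are drawn from gamma distributions using Python's standard `random.gammavariate(shape, scale)`.

```python
import math
import random


def simulate_herd(herd_size, beta, latent, subclinical, clinical, rng, max_days=365):
    """Toy daily-step stochastic state-transition model of one herd.

    latent, subclinical, clinical: (shape, scale) pairs for gamma-distributed
    state durations in days (hypothetical values, not fitted to real data).
    Returns the first day on which at least half the herd has ever been
    infected, or None if that does not happen within max_days.
    """
    S, L, B, C, R = range(5)  # susceptible, latent, subclinical, clinical, removed
    durations = {L: latent, B: subclinical, C: clinical}
    state = [S] * herd_size
    timer = [0.0] * herd_size
    state[0] = L                                  # seed one latent animal
    timer[0] = rng.gammavariate(*latent)
    ever_infected = 1
    for day in range(1, max_days + 1):
        infectious = sum(1 for s in state if s in (B, C))
        # Assumed frequency-dependent daily infection probability.
        p_inf = 1.0 - math.exp(-beta * infectious / herd_size)
        for i in range(herd_size):
            if state[i] == S:
                if rng.random() < p_inf:
                    state[i] = L
                    timer[i] = rng.gammavariate(*latent)
                    ever_infected += 1
            elif state[i] in durations:
                timer[i] -= 1.0
                if timer[i] <= 0.0:
                    state[i] += 1                 # L -> B -> C -> R
                    if state[i] in durations:
                        timer[i] = rng.gammavariate(*durations[state[i]])
        if ever_infected >= herd_size / 2:
            return day
    return None


# Repeat the simulation to see how much the output varies between runs.
rng = random.Random(42)
runs = [simulate_herd(100, 1.5, (2.0, 1.5), (2.0, 1.0), (3.0, 1.5), rng)
        for _ in range(20)]
print("days to 50% herd infected across runs:", runs)
```

Swapping the gamma draws for Weibull or normal draws (for example `rng.weibullvariate` or a truncated `rng.gauss`) and comparing the spread of outputs would mimic, in miniature, the sensitivity comparison the abstract describes.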

    Parametric Uncertainty in Intra-Herd Foot-and-Mouth Disease Epidemiological Models

    Epidemiological models that simulate the spread of Foot-and-Mouth Disease within a herd are the foundation of decision support tools used by governments to help advise and inform strategy to combat outbreaks. Contact transmission data used to parameterize these models, contrary to assumption, contain a significant amount of variability and uncertainty. This finding suggests that the resultant model output might not accurately simulate the spread of an outbreak. If this is true, the potential impact due to uncertainty inherent to the decision support tools used by governments might be significant.