523 research outputs found

    Accounting for seasonal patterns in syndromic surveillance data for outbreak detection

    BACKGROUND: Syndromic surveillance (SS) can potentially contribute to outbreak detection capability by providing timely, novel data sources. One SS challenge is that some syndrome counts vary with season in a manner that is not identical from year to year. Our goal is to evaluate the impact of inconsistent seasonal effects on performance assessments (false and true positive rates) in the context of detecting anomalous counts in data that exhibit seasonal variation. METHODS: To evaluate the impact of inconsistent seasonal effects, we injected synthetic outbreaks into real data and into data simulated from each of two models fit to the same real data. Using real respiratory syndrome counts collected in an emergency department from 2/1/94–5/31/03, we varied the length of training data from one to eight years, applied a sequential test to the forecast errors arising from each of eight forecasting methods, and evaluated their detection probabilities (DP) on the basis of 1000 injected synthetic outbreaks. We did the same for each of two corresponding simulated data sets. The less realistic, nonhierarchical model's simulated data set assumed that "one season fits all," meaning that each year's seasonal peak has the same onset, duration, and magnitude. The more realistic simulated data set used a hierarchical model to capture violation of the "one season fits all" assumption. RESULTS: This experiment demonstrated optimistic bias in DP estimates for some of the methods when data simulated from the nonhierarchical model was used for DP estimation, thus suggesting that at least for some real data sets and methods, it is not adequate to assume that "one season fits all." CONCLUSION: For the data we analyze, the "one season fits all" assumption is violated, and DP performance claims based on simulated data that assume "one season fits all," for the forecast methods considered, except for moving average methods, tend to be optimistic. Moving average methods based on relatively short amounts of training data are competitive on all three data sets, but are particularly competitive on the real data and on data from the hierarchical model, which are the two data sets that violate the "one season fits all" assumption.
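The pipeline this abstract describes (forecast each day, form forecast errors, apply a sequential test to them) can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the abstract does not name the sequential test, so a one-sided CUSUM stands in for it, and the window length, the reference value `k`, the threshold `h`, and all function names are assumptions made for the example.

```python
from statistics import mean, pstdev

def moving_average_forecasts(counts, window=7):
    """One-step-ahead forecasts: the mean of the previous `window` counts."""
    return [mean(counts[t - window:t]) for t in range(window, len(counts))]

def cusum_alarms(errors, k=0.5, h=4.0):
    """One-sided CUSUM on standardized forecast errors; returns alarm indices.

    The scale is estimated crudely from all errors (an assumption); a real
    system would estimate it from outbreak-free training data.
    """
    scale = pstdev(errors) or 1.0  # fall back to 1.0 if the errors are constant
    s, alarms = 0.0, []
    for i, e in enumerate(errors):
        s = max(0.0, s + e / scale - k)  # accumulate only upward deviations
        if s > h:
            alarms.append(i)
            s = 0.0  # restart the statistic after signalling
    return alarms

def detect(counts, window=7, k=0.5, h=4.0):
    """Return the day indices (into `counts`) on which the CUSUM signals."""
    forecasts = moving_average_forecasts(counts, window)
    errors = [c - f for c, f in zip(counts[window:], forecasts)]
    return [i + window for i in cusum_alarms(errors, k, h)]

# Hypothetical data: four weeks of baseline counts with a 4-day injected outbreak.
baseline = [20, 22, 19, 21, 20, 18, 21] * 4
outbreak = list(baseline)
for day in range(21, 25):
    outbreak[day] += 15
alarms = detect(outbreak)
```

Estimating detection probability as in the abstract would mean repeating the injection many times at different dates and counting how often an alarm falls inside the outbreak window.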

    Syndromic surveillance of influenza-like illness in Scotland during the influenza A H1N1v pandemic and beyond

    Syndromic surveillance refers to the rapid monitoring of syndromic data to highlight and follow outbreaks of infectious diseases, increasing situational awareness. Such systems are based upon statistical models to describe routinely collected health data. We describe a working exception reporting system (ERS) currently used in Scotland to monitor calls to the NHS telephone helpline, NHS24. We demonstrate the utility of the system to describe the time series data from NHS24 both at an aggregated Scotland level and at the individual health board level for two case studies: first, the initial phase of the 2009 influenza A H1N1v pandemic, and second, the emergence of seasonal influenza in each winter season from 2006/07 to 2010/11. In particular, we focus on a localised cluster of infection in the Highland health board and the ability of the system to highlight this outbreak. Caveats of the system, including the effect of media reporting of the pandemic on the results and the associated statistical issues, will be discussed. We discuss the adaptability and timeliness of the system and how this continues to form part of a suite of surveillance used to give early warnings to public health decision makers.

    Syndromic surveillance: STL for modeling, visualizing, and monitoring disease counts

    BACKGROUND: Public health surveillance is the monitoring of data to detect and quantify unusual health events. Monitoring pre-diagnostic data, such as emergency department (ED) patient chief complaints, enables rapid detection of disease outbreaks. There are many sources of variation in such data; statistical methods need to accurately model them as a basis for timely and accurate disease outbreak detection methods. METHODS: Our new methods for modeling daily chief complaint counts are based on a seasonal-trend decomposition procedure based on loess (STL) and were developed using data from the 76 EDs of the Indiana surveillance program from 2004 to 2008. Square root counts are decomposed into inter-annual, yearly-seasonal, day-of-the-week, and random-error components. Using this decomposition method, we develop a new synoptic-scale (days to weeks) outbreak detection method and carry out a simulation study to compare detection performance to four well-known methods for nine outbreak scenarios. RESULTS: The components of the STL decomposition reveal insights into the variability of the Indiana ED data. Day-of-the-week components tend to peak Sunday or Monday, fall steadily to a minimum Thursday or Friday, and then rise to the peak. Yearly-seasonal components show seasonal influenza, some with bimodal peaks. Some inter-annual components increase slightly due to increasing patient populations. A new outbreak detection method based on the decomposition modeling performs well with 90 days or more of data. Control limits were set empirically so that all methods had a specificity of 97%. STL had the largest sensitivity in all nine outbreak scenarios. The STL method also exhibited a well-behaved false positive rate when run on the data with no outbreaks injected. CONCLUSION: The STL decomposition method for chief complaint counts leads to a rapid and accurate detection method for disease outbreaks, and requires only 90 days of historical data to be put into operation. The visualization tools that accompany the decomposition and outbreak methods provide much insight into patterns in the data, which is useful for surveillance operations.

    A Hidden Markov Model for Analysis of Frontline Veterinary Data for Emerging Zoonotic Disease Surveillance

    Surveillance systems tracking health patterns in animals have potential for early warning of infectious disease in humans, yet many challenges remain before this can be realized. Specifically, there remains the challenge of detecting early warning signals for diseases that are not known or are not part of routine surveillance for named diseases. This paper reports on the development of a hidden Markov model for analysis of frontline veterinary sentinel surveillance data from Sri Lanka. Field veterinarians collected data on syndromes and diagnoses using mobile phones. A model for submission patterns accounts for both sentinel-related and disease-related variability. Models for commonly reported cattle diagnoses were estimated separately. Region-specific weekly average prevalence was estimated for each diagnosis and partitioned into normal and abnormal periods. Visualization of state probabilities was used to indicate areas and times of unusual disease prevalence. The analysis suggests that hidden Markov modelling is a useful approach for surveillance datasets from novel populations and/or having few historical baselines.

    Time series modeling for syndromic surveillance

    BACKGROUND: Emergency department (ED) based syndromic surveillance systems identify abnormally high visit rates that may be an early signal of a bioterrorist attack. For example, an anthrax outbreak might first be detectable as an unusual increase in the number of patients reporting to the ED with respiratory symptoms. Reliably identifying these abnormal visit patterns requires a good understanding of the normal patterns of healthcare usage. Unfortunately, systematic methods for determining the expected number of ED visits on a particular day have not yet been well established. We present here a generalized methodology for developing models of expected ED visit rates. METHODS: Using time-series methods, we developed robust models of ED utilization for the purpose of defining expected visit rates. The models were based on nearly a decade of historical data at a major metropolitan academic, tertiary care pediatric emergency department. The historical data were fit using trimmed-mean seasonal models, and additional models were fit with autoregressive integrated moving average (ARIMA) residuals to account for recent trends in the data. The detection capabilities of the model were tested with simulated outbreaks. RESULTS: Models were built both for overall visits and for respiratory-related visits, classified according to the chief complaint recorded at the beginning of each visit. The mean absolute percentage error of the ARIMA models was 9.37% for overall visits and 27.54% for respiratory visits. A simple detection system based on the ARIMA model of overall visits was able to detect 7-day-long simulated outbreaks of 30 visits per day with 100% sensitivity and 97% specificity. Sensitivity decreased with outbreak size, dropping to 94% for outbreaks of 20 visits per day, and 57% for 10 visits per day, all while maintaining a 97% benchmark specificity. CONCLUSIONS: Time series methods applied to historical ED utilization data are an important tool for syndromic surveillance. Accurate forecasting of emergency department total utilization as well as the rates of particular syndromes is possible. The multiple models in the system account for both long-term and recent trends, and an integrated alarms strategy combining these two perspectives may provide a more complete picture to public health authorities. The systematic methodology described here can be generalized to other healthcare settings to develop automated surveillance systems capable of detecting anomalies in disease patterns and healthcare utilization.
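The metrics reported in this abstract (mean absolute percentage error, sensitivity, specificity) can be computed along the following lines. This is a minimal sketch under stated assumptions: the function names are hypothetical, and sensitivity and specificity are scored per day here, which may differ from the scoring rule the authors actually used.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes every actual value is nonzero.
    """
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

def sensitivity_specificity(alarm_days, outbreak_days, n_days):
    """Day-level sensitivity and specificity from collections of day indices.

    Assumes at least one outbreak day and at least one non-outbreak day.
    """
    alarms, truth = set(alarm_days), set(outbreak_days)
    tp = len(alarms & truth)       # alarms raised on outbreak days
    fp = len(alarms - truth)       # alarms raised on normal days
    fn = len(truth - alarms)       # outbreak days with no alarm
    tn = n_days - tp - fp - fn     # normal days with no alarm
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical example: 10 days, outbreak on days 3-6, alarms on days 3, 4, and 9.
error_pct = mape([100, 200], [110, 180])
sens, spec = sensitivity_specificity([3, 4, 9], [3, 4, 5, 6], 10)
```

Holding specificity at a fixed benchmark, as the abstract does, amounts to tuning the alarm threshold on outbreak-free data until the false alarm rate reaches the target, then reading off sensitivity for each outbreak size.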

    Modeling emergency department visit patterns for infectious disease complaints: results and application to disease surveillance

    BACKGROUND: Concern over bio-terrorism has led to recognition that traditional public health surveillance for specific conditions is unlikely to provide timely indication of some disease outbreaks, either naturally occurring or induced by a bioweapon. In non-traditional surveillance, the use of health care resources is monitored in "near real" time for the first signs of an outbreak, such as increases in emergency department (ED) visits for respiratory, gastrointestinal or neurological chief complaints (CC). METHODS: We collected ED CCs from 2/1/94 – 5/31/02 as a training set. A first-order model was developed for each of seven CC categories by accounting for long-term, day-of-week, and seasonal effects. We assessed predictive performance on subsequent data from 6/1/02 – 5/31/03, compared CC counts to predictions and confidence limits, and identified anomalies (simulated and real). RESULTS: Each CC category exhibited significant day-of-week differences. For most categories, counts peaked on Monday. There were seasonal cycles in both respiratory and undifferentiated infection complaints, and the season-to-season variability in peak date was summarized using a hierarchical model. For example, the average peak date for respiratory complaints was January 22, with a season-to-season standard deviation of 12 days. This season-to-season variation makes it challenging to predict respiratory CCs, so we focused our effort and discussion on prediction performance for this difficult category. Total ED visits increased over the study period by 4%, but respiratory complaints decreased by roughly 20%, illustrating that long-term averages in the data set need not reflect future behavior in data subsets. CONCLUSION: We found that ED CCs provided timely indicators for outbreaks. Our approach led to successful identification of a respiratory outbreak one-to-two weeks in advance of reports from the state-wide sentinel flu surveillance and of a reported increase in positive laboratory test results.
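The idea of comparing counts against predictions and limits that account for day-of-week effects can be illustrated with a much-simplified sketch. It is not the authors' first-order model: it ignores the long-term and seasonal terms entirely, and the function names, the `z` multiplier, and the Monday-is-zero weekday convention are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, stdev

def day_of_week_baseline(train_counts, start_weekday=0):
    """Per-weekday mean and sample standard deviation (0 = Monday, assumed).

    Needs at least two training observations for each weekday.
    """
    by_day = defaultdict(list)
    for i, c in enumerate(train_counts):
        by_day[(start_weekday + i) % 7].append(c)
    return {d: (mean(v), stdev(v)) for d, v in by_day.items()}

def flag_anomalies(test_counts, baseline, start_weekday=0, z=3.0):
    """Indices of test days whose count exceeds mean + z * sd for that weekday."""
    flagged = []
    for i, c in enumerate(test_counts):
        m, s = baseline[(start_weekday + i) % 7]
        if c > m + z * s:
            flagged.append(i)
    return flagged

# Hypothetical data: four training weeks with a Monday peak, then one test week.
train = [30, 25, 24, 23, 22, 20, 21,
         32, 26, 25, 24, 23, 21, 22,
         28, 24, 23, 22, 21, 19, 20,
         30, 25, 24, 23, 22, 20, 21]
test_week = [31, 25, 40, 23, 22, 20, 21]
flagged = flag_anomalies(test_week, day_of_week_baseline(train))
```

A fuller model in the spirit of the abstract would add a long-term trend and a seasonal term whose peak date varies from year to year, which is what the hierarchical component is described as capturing.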

    Anomaly Detection in Time Series: Theoretical and Practical Improvements for Disease Outbreak Detection

    The automatic collection and increasing availability of health data provide a new opportunity for techniques to monitor this information. By monitoring pre-diagnostic data sources, such as over-the-counter cough medicine sales or emergency room chief complaints of cough, there exists the potential to detect disease outbreaks earlier than traditional laboratory disease confirmation results. This research is particularly important for a modern, highly-connected society, where the onset of disease outbreak can be swift and deadly, whether caused by a naturally occurring global pandemic such as swine flu or a targeted act of bioterrorism. In this dissertation, we first describe the problem and current state of research in disease outbreak detection, then provide four main additions to the field. First, we formalize a framework for analyzing health series data and detecting anomalies: using forecasting methods to predict the next day's value, subtracting the forecast to create residuals, and finally using detection algorithms on the residuals. The formalized framework indicates the link between the forecast accuracy of the forecast method and the performance of the detector, and can be used to quantify and analyze the performance of a variety of heuristic methods. Second, we describe improvements for the forecasting of health data series. The application of weather as a predictor, cross-series covariates, and ensemble forecasting each provide improvements to forecasting health data. Third, we describe improvements for detection. This includes the use of multivariate statistics for anomaly detection and additional day-of-week preprocessing to aid detection. Most significantly, we also provide a new method, based on the CuScore, for optimizing detection when the impact of the disease outbreak is known. This method can provide an optimal detector for rapid detection, or for probability of detection within a certain timeframe. Finally, we describe a method for improved comparison of detection methods. We provide tools to evaluate how well a simulated data set captures the characteristics of the authentic series and time-lag heatmaps, a new way of visualizing daily detection rates or displaying the comparison between two methods in a more informative way.