41 research outputs found

    Recursive least squares background prediction of univariate syndromic surveillance data

    Background: Surveillance of univariate syndromic data as a potential indicator of developing public health conditions has been used extensively. This paper aims to improve outbreak detection performance by using a background forecasting algorithm based on the adaptive recursive least squares (RLS) method combined with a novel treatment of the day-of-week effect. Methods: Previous work by the first author has suggested that univariate recursive least squares analysis of syndromic data can be used to characterize the background upon which a prediction and detection component of a biosurveillance system may be built. An adaptive implementation is used to deal with data non-stationarity. In this paper we develop and implement the RLS method for background estimation of univariate data. The distinctly dissimilar distribution of data for different days of the week, however, can affect filter implementations adversely, and so a novel procedure based on linear transformations of the sorted values of the daily counts is introduced. Seven-days-ahead daily predicted counts are used as background estimates. A signal injection procedure is used to examine the integrated algorithm's ability to detect synthetic anomalies in real syndromic time series. We compare the method to a baseline CDC forecasting algorithm known as the W2 method. Results: We present detection results in the form of Receiver Operating Characteristic curve values for four different injected signal-to-noise ratios using 16 sets of syndromic data. We find improvements in the false alarm probabilities when compared to the baseline W2 background forecasts. Conclusion: The current paper introduces a prediction approach for city-level biosurveillance data streams such as time series of outpatient clinic visits and sales of over-the-counter remedies. This approach uses RLS filters modified by a correction for the weekly patterns often seen in these data series, and a threshold detection algorithm applied to the residuals of the RLS forecasts. We compare the detection performance of this algorithm to the W2 method recently implemented at CDC. The modified RLS method gives consistently better sensitivity at multiple background alert rates, and we recommend that it be considered for routine application in biosurveillance systems.
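    As a rough illustration of the adaptive RLS idea described in this abstract, the sketch below fits an autoregressive predictor with an exponential forgetting factor and returns one-step-ahead forecasts. It is a generic textbook RLS filter, not the paper's algorithm: it omits the day-of-week transformation and the seven-days-ahead horizon, and the function name `rls_forecast` and the parameters `order`, `lam` and `delta` are assumptions made for this example.

```python
import math

def rls_forecast(series, order=2, lam=0.98, delta=100.0):
    """One-step-ahead forecasts from an exponentially weighted RLS AR filter.

    Hypothetical helper: `lam` is the forgetting factor, `delta` scales the
    initial inverse-correlation matrix. Returns (forecasts, final weights).
    """
    w = [0.0] * order  # current AR coefficient estimates
    P = [[delta if i == j else 0.0 for j in range(order)] for i in range(order)]
    preds = []
    for t in range(order, len(series)):
        x = series[t - order:t][::-1]  # regressors, most recent value first
        y_hat = sum(wi * xi for wi, xi in zip(w, x))
        preds.append(y_hat)
        err = series[t] - y_hat  # a priori forecast residual
        Px = [sum(P[i][j] * x[j] for j in range(order)) for i in range(order)]
        denom = lam + sum(x[i] * Px[i] for i in range(order))
        k = [v / denom for v in Px]  # gain vector
        w = [wi + ki * err for wi, ki in zip(w, k)]
        # P <- (P - k x^T P) / lam, using the symmetry of P so that x^T P = (P x)^T
        P = [[(P[i][j] - k[i] * Px[j]) / lam for j in range(order)]
             for i in range(order)]
    return preds, w

# Toy input: a pure sinusoid, which obeys y[t] = 2*cos(0.3)*y[t-1] - y[t-2].
series = [math.sin(0.3 * t) for t in range(300)]
preds, w = rls_forecast(series)
residuals = [y - p for y, p in zip(series[2:], preds)]
```

    On real counts, the residuals would be passed to a threshold detector. The forgetting factor trades responsiveness to non-stationarity against noisier coefficient estimates.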

    RANK-BASED TEMPO-SPATIAL CLUSTERING: A FRAMEWORK FOR RAPID OUTBREAK DETECTION USING SINGLE OR MULTIPLE DATA STREAMS

    In recent decades, algorithms for disease outbreak detection have become a main interest of public health practitioners, who seek to identify and localize an outbreak as early as possible so that a public health response can begin before a pandemic develops. Today's increased threat of biological warfare and terrorism provides an even stronger impetus to develop methods for outbreak detection based on symptoms as well as definitive laboratory diagnoses. In this dissertation work, I explore the problems of rapid disease outbreak detection using both spatial and temporal information. I develop a framework of non-parameterized algorithms which search for patterns of disease outbreak in spatial sub-regions of the monitored region within a certain period. Compared with existing spatial or tempo-spatial algorithms, the algorithms in this framework provide a methodology for fast searching of either univariate or multivariate data sets. The framework first measures which study areas are more likely to have an outbreak given the baseline data and the currently observed data. It then applies a greedy search to look for clusters with high posterior probabilities, using the risk measurement for each unit area as a heuristic. I also explore the performance of the proposed algorithms. From the perspective of predictive modeling, I adopt a Gamma-Poisson (GP) model to compute the probability of an outbreak in each cluster when analyzing univariate data. I build a multinomial generalized Dirichlet (MGD) model to identify outbreak clusters from multivariate data, which include the OTC data streams collected by the National Retail Data Monitor (NRDM) and the ED data streams collected by the RODS system. Key contributions of this dissertation are: 1) it introduces a rank-based tempo-spatial clustering algorithm, RSC, which uses greedy searching and a Bayesian GP model for disease outbreak detection, with comparable detection timeliness and cluster positive predictive value (PPV) and improved running time; 2) it proposes a multivariate extension of RSC (MRSC) which applies the MGD model. The evaluation demonstrated that the MGD model can effectively suppress false alarms caused by elevated signals that are not disease-relevant and occur in all the monitored data streams.

    Anomaly Detection in Time Series: Theoretical and Practical Improvements for Disease Outbreak Detection

    The automatic collection and increasing availability of health data provide a new opportunity for techniques to monitor this information. By monitoring pre-diagnostic data sources, such as over-the-counter cough medicine sales or emergency room chief complaints of cough, there exists the potential to detect disease outbreaks earlier than traditional laboratory disease confirmation results. This research is particularly important for a modern, highly-connected society, where the onset of disease outbreak can be swift and deadly, whether caused by a naturally occurring global pandemic such as swine flu or a targeted act of bioterrorism. In this dissertation, we first describe the problem and current state of research in disease outbreak detection, then provide four main additions to the field. First, we formalize a framework for analyzing health series data and detecting anomalies: using forecasting methods to predict the next day's value, subtracting the forecast to create residuals, and finally using detection algorithms on the residuals. The formalized framework indicates the link between the forecast accuracy of the forecast method and the performance of the detector, and can be used to quantify and analyze the performance of a variety of heuristic methods. Second, we describe improvements for the forecasting of health data series. The application of weather as a predictor, cross-series covariates, and ensemble forecasting each provide improvements to forecasting health data. Third, we describe improvements for detection. This includes the use of multivariate statistics for anomaly detection and additional day-of-week preprocessing to aid detection. Most significantly, we also provide a new method, based on the CuScore, for optimizing detection when the impact of the disease outbreak is known. This method can provide an optimal detector for rapid detection, or for probability of detection within a certain timeframe. Finally, we describe a method for improved comparison of detection methods. We provide tools to evaluate how well a simulated data set captures the characteristics of the authentic series, as well as time-lag heatmaps, a new way of visualizing daily detection rates or displaying the comparison between two methods in a more informative way.
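    The three-step framework in this abstract (forecast the next day's value, subtract the forecast to form a residual, run a detector on the residual) can be sketched as follows. This is only a minimal illustration under stated assumptions: the moving-average forecaster, the standardized-residual threshold, the function name `residual_alerts` and the toy counts are all hypothetical stand-ins, not the dissertation's methods (which include the CuScore-based detector).

```python
from statistics import mean, stdev

def residual_alerts(series, window=7, z_thresh=3.0):
    """Return the indices of days whose standardized residual exceeds z_thresh."""
    alerts = []
    for t in range(window, len(series)):
        history = series[t - window:t]
        forecast = mean(history)          # step 1: forecast the next day's value
        residual = series[t] - forecast   # step 2: subtract the forecast
        scale = stdev(history) or 1.0     # guard against a perfectly flat history
        if residual / scale > z_thresh:   # step 3: one-sided threshold detector
            alerts.append(t)
    return alerts

# Hypothetical daily counts with a single injected spike on day 11.
counts = [10, 12, 11, 9, 10, 11, 12, 10, 11, 9, 10, 40, 11, 10]
alerts = residual_alerts(counts)
```

    Because the detector only sees residuals, any improvement in forecast accuracy feeds directly into detection performance, which is the link the framework is meant to expose.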

    Conditional predictive inference for online surveillance of spatial disease incidence

    This paper deals with the development of statistical methodology for timely detection of incident disease clusters in space and time. The increasing availability of data on both the time and the location of events enables the construction of multivariate surveillance techniques, which may enhance the ability to detect localized clusters of disease relative to the surveillance of the overall count of disease cases across the entire study region. We introduce the surveillance conditional predictive ordinate as a general Bayesian model-based surveillance technique that allows us to detect small areas of increased disease incidence when spatial data are available. To address the problem of multiple comparisons, we incorporate into the analysis a common probability that each small area signals an alarm when no change in the risk pattern of disease takes place. We investigate the performance of the proposed surveillance technique within the framework of Bayesian hierarchical Poisson models using a simulation study. Finally, we present a case study of salmonellosis in South Carolina.

    Syndromic surveillance: reports from a national conference, 2003

    Overview of Syndromic Surveillance -- What is Syndromic Surveillance? -- Linking Better Surveillance to Better Outcomes -- Review of the 2003 National Syndromic Surveillance Conference - Lessons Learned and Questions To Be Answered
    System Descriptions -- New York City Syndromic Surveillance Systems -- Syndrome and Outbreak Detection Using Chief-Complaint Data - Experience of the Real-Time Outbreak and Disease Surveillance Project -- Removing a Barrier to Computer-Based Outbreak and Disease Surveillance - The RODS Open Source Project -- National Retail Data Monitor for Public Health Surveillance -- National Bioterrorism Syndromic Surveillance Demonstration Program -- Daily Emergency Department Surveillance System - Bergen County, New Jersey -- Hospital Admissions Syndromic Surveillance - Connecticut, September 2001-November 2003 -- BioSense - A National Initiative for Early Detection and Quantification of Public Health Emergencies -- Syndromic Surveillance at Hospital Emergency Departments - Southeastern Virginia
    Research Methods -- Bivariate Method for Spatio-Temporal Syndromic Surveillance -- Role of Data Aggregation in Biosurveillance Detection Strategies with Applications from ESSENCE -- Scan Statistics for Temporal Surveillance for Biologic Terrorism -- Approaches to Syndromic Surveillance When Data Consist of Small Regional Counts -- Algorithm for Statistical Detection of Peaks - Syndromic Surveillance System for the Athens 2004 Olympic Games -- Taming Variability in Free Text: Application to Health Surveillance -- Comparison of Two Major Emergency Department-Based Free-Text Chief-Complaint Coding Systems -- How Many Illnesses Does One Emergency Department Visit Represent? Using a Population-Based Telephone Survey To Estimate the Syndromic Multiplier -- Comparison of Office Visit and Nurse Advice Hotline Data for Syndromic Surveillance - Baltimore-Washington, D.C., Metropolitan Area, 2002 -- Progress in Understanding and Using Over-the-Counter Pharmaceuticals for Syndromic Surveillance
    Evaluation -- Evaluation Challenges for Syndromic Surveillance - Making Incremental Progress -- Measuring Outbreak-Detection Performance By Using Controlled Feature Set Simulations -- Evaluation of Syndromic Surveillance Systems - Design of an Epidemic Simulation Model -- Benchmark Data and Power Calculations for Evaluating Disease Outbreak Detection Methods -- Bio-ALIRT Biosurveillance Detection Algorithm Evaluation -- ESSENCE II and the Framework for Evaluating Syndromic Surveillance Systems -- Conducting Population Behavioral Health Surveillance by Using Automated Diagnostic and Pharmacy Data Systems -- Evaluation of an Electronic General-Practitioner-Based Syndromic Surveillance System -- National Symptom Surveillance Using Calls to a Telephone Health Advice Service - United Kingdom, December 2001-February 2003 -- Field Investigations of Emergency Department Syndromic Surveillance Signals - New York City -- Should We Be Worried? Investigation of Signals Generated by an Electronic Syndromic Surveillance System - Westchester County, New York
    Public Health Practice -- Public Health Information Network - Improving Early Detection by Using a Standards-Based Approach to Connecting Public Health and Clinical Medicine -- Information System Architectures for Syndromic Surveillance -- Perspective of an Emergency Physician Group as a Data Provider for Syndromic Surveillance -- SARS Surveillance Project - Internet-Enabled Multiregion Surveillance for Rapidly Emerging Disease -- Health Information Privacy and Syndromic Surveillance Systems
    Papers from the second annual National Syndromic Surveillance Conference convened by the New York City Department of Health and Mental Hygiene, the New York Academy of Medicine, and the CDC in New York City during Oct. 23-24, 2003. Published as the September 24, 2004 supplement to vol. 53 of MMWR (Morbidity and Mortality Weekly Report).

    Anticipating special events in emergency department forecasting

    Accurate daily forecasting of Emergency Department (ED) attendance helps roster planners allocate available resources more effectively and potentially influences staffing. Since special events affect human behaviours, they may increase or decrease the demand for ED services. Therefore, it is crucial to model their impact and use them to forecast future attendance to improve roster planning and avoid reactive strategies. In this paper, we propose, for the first time, a forecasting model to generate both point and probabilistic daily forecasts of ED attendance. We model the impact of special events on ED attendance by considering real-life ED data. We benchmark the accuracy of our model against three time-series techniques and a regression model that does not consider special events. We show that the proposed model outperforms its benchmarks across all horizons for both point and probabilistic forecasts. Results also show that our model is more robust with an increasing forecasting horizon. Moreover, we provide evidence on how different types of special events may increase or decrease ED attendance. Our model can easily be adapted for use not only by EDs but also by other health services. It could also be generalised to include more types of special events.

    Metabolomic profiling of psoriasis, atopic dermatitis and atherosclerosis (Psoriaasi, atoopilise dermatiidi ja ateroskleroosi metaboloomne profileerimine)

    The electronic version of the dissertation does not include the publications. Metabolomics is the field of science concerned with the measurement and analysis of low-molecular-weight compounds. These include amino acids, biogenic amines, carbohydrates, fatty acids, nucleic acids and peptides, which may be of either exogenous or endogenous origin. Measuring these substances simultaneously makes it possible to see a direct reflection of the metabolic pathways, a so-called metabolomic fingerprint. Psoriasis is a widespread chronic inflammatory skin disease that occurs in up to 1% of children and 2%-3% of the general population. The onset of the disease is associated with several causes, including genetic predisposition and susceptibility, and environmental factors together with immune system dysfunction and disruption of the skin barrier. Atopic dermatitis is a widespread and complex skin disease that affects up to 15% of children and adults in the general population. Although most children outgrow the disease, in some cases it also affects adults, impairing patients' well-being and causing a range of comorbidities, including allergies, asthma, attention disorders and anaemia. Atherosclerosis is an inflammatory disease involving the arterial walls, where inflammatory cells and lipids accumulate. This leads to narrowing of the arteries, which may culminate in the formation of a thrombus, causing infarction. The most common forms of atherosclerosis are peripheral arterial disease and coronary artery disease, both of which have become major public health problems. The main aim of this doctoral thesis was to analyse the metabolomic profiles of patients with psoriasis, atopic dermatitis and atherosclerosis and to assess the similarities and differences in the metabolites found.
    Metabolomics is concerned with the measurement and analysis of small molecule compounds (< 1 kDa, e.g. amino acids, biogenic amines, carbohydrates, fatty acids, nucleic acids, peptides) of both exogenous and endogenous origins. These are the substrates and products of various chemical reactions within metabolic pathways. Psoriasis (PS) is a widespread chronic inflammatory skin disease affecting 2%-3% of the world population. The disease is considered to be multifactorial, with a number of key contributing factors including genetic predisposition and susceptibility, and environmental influences along with immune dysfunction and disruption of the skin barrier. Atopic dermatitis (AD) is a widespread and complex condition that affects up to 15% of adults and children worldwide. Although children have an increased prevalence of atopic dermatitis, many adults remain affected throughout their life. Atherosclerosis is classified as an inflammatory disease that involves the arterial wall and is characterized by the continuous accumulation of inflammatory cells and lipids within the intima of large arteries. The metabolomic profiles of patients with psoriasis and atopic dermatitis were explored to find possible disease-specific metabolites that could be used to characterise and better understand the underlying mechanisms of the disease pathogenesis. The application of the established methods was expanded to peripheral arterial disease and coronary arterial disease to further search for similarities and differences in the metabolomic profiles of the diseases.
    https://www.ester.ee/record=b522842

    Forecasting: theory and practice

    Forecasting has always been at the forefront of decision making and planning. The uncertainty that surrounds the future is both exciting and challenging, with individuals and organisations seeking to minimise risks and maximise utilities. The lack of a free-lunch theorem implies the need for a diverse set of forecasting methods to tackle an array of applications. This unique article provides a non-systematic review of the theory and the practice of forecasting. We offer a wide range of theoretical, state-of-the-art models, methods, principles, and approaches to prepare, produce, organise, and evaluate forecasts. We then demonstrate how such theoretical concepts are applied in a variety of real-life contexts, including operations, economics, finance, energy, environment, and social good. We do not claim that this review is an exhaustive list of methods and applications. The list was compiled based on the expertise and interests of the authors. However, we hope that our encyclopedic presentation will offer a point of reference for the rich work that has been undertaken over the last decades, with some key insights for the future of forecasting theory and practice.

    Modeling count time series following generalized linear models

    Count time series are found in many different applications, e.g. from medicine, finance or industry, and have received increasing attention in the last two decades. The class of count time series following generalized linear models is very flexible and can describe serial correlation in a parsimonious way. The conditional mean of the observed process is linked to its past values, to past observations and to potential covariate effects. In this thesis we give a comprehensive formulation of this model class. We consider models with the identity and with the logarithmic link function. The conditional distribution can be Poisson or Negative Binomial. An important special case of this class is the so-called INGARCH model and its log-linear extension. A key contribution of this thesis is the R package tscount, which provides likelihood-based estimation methods for analysis and modeling of count time series based on generalized linear models. The package includes methods for model fitting and assessment, prediction and intervention analysis. This thesis summarizes the theoretical background of these methods. It gives details on the implementation of the package and provides simulation results for models which have not been studied theoretically before. The usage of the package is illustrated by two data examples. Additionally, we provide a review of R packages which can be used for count time series analysis. A detailed comparison of tscount to those packages demonstrates that tscount is an important contribution which extends and complements existing software. A thematic focus of this thesis is the treatment of all kinds of unusual effects influencing the ordinary pattern of the data. This includes structural changes and different forms of outliers one is faced with in many time series. Our first study on this topic is concerned with retrospective detection of such changes. We analyze different approaches for modeling such intervention effects in count time series based on INGARCH models. Other authors treated a model where an intervention affects the non-observable underlying mean process at the time point of its occurrence and additionally the whole process thereafter via its dynamics. As an alternative, we consider a model where an intervention directly affects the observation at its occurrence, but not the underlying mean, and then also enters the dynamics of the process. While the former definition describes an internal change of the system, the latter can be understood as an external effect on the observations due to e.g. immigration. For our alternative model we develop conditional likelihood estimation and, based on this, develop tests and detection procedures for intervention effects. Both models are compared analytically and using simulated and real data examples. The procedures for our new model work reliably and we find some robustness against misspecification of the intervention model. The aforementioned methods are applied after the complete time series has been observed. In another study we investigate the prospective detection of structural changes, i.e. in real time. For example, in public health, surveillance of infectious diseases aims at recognizing outbreaks of epidemics with only short time delays in order to take adequate action promptly. We point out that serial dependence is present in many infectious disease time series. Nevertheless, it is still ignored by many procedures used for infectious disease surveillance. Using historical data, we design a prediction-based monitoring procedure for count time series following generalized linear models. We illustrate benefits but also pitfalls of using dependence models for monitoring. Moreover, we briefly review the literature on model selection, robust estimation and robust prediction for count time series. We also present a first study of robust model identification using robust estimators of the (partial) autocorrelation.
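    To make the INGARCH special case mentioned in this abstract concrete, the sketch below simulates a Poisson INGARCH(1,1) process with identity link, where the conditional mean depends on the previous observation and the previous conditional mean. It is an illustrative Python simulation only and does not use or reproduce the tscount R package; the function names, parameter names (`beta0`, `beta1`, `alpha1`) and parameter values are assumptions chosen for this example.

```python
import math
import random

def poisson_draw(lam, rng):
    """Draw a Poisson variate by Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_ingarch(n, beta0=2.0, beta1=0.3, alpha1=0.4, seed=0):
    """Simulate n counts with lam[t] = beta0 + beta1 * y[t-1] + alpha1 * lam[t-1]."""
    rng = random.Random(seed)
    lam = beta0 / (1.0 - beta1 - alpha1)  # start at the stationary mean
    y = poisson_draw(lam, rng)
    ys = []
    for _ in range(n):
        lam = beta0 + beta1 * y + alpha1 * lam  # conditional mean given the past
        y = poisson_draw(lam, rng)              # Poisson conditional distribution
        ys.append(y)
    return ys

ys = simulate_ingarch(5000)
sample_mean = sum(ys) / len(ys)
```

    With `beta1 + alpha1 < 1` the process is stationary with mean `beta0 / (1 - beta1 - alpha1)`, so `sample_mean` should land near that value. The feedback through `lam` is what produces the serial dependence that the abstract says many surveillance procedures ignore.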