55 research outputs found

    Beyond Volume: The Impact of Complex Healthcare Data on the Machine Learning Pipeline

    From medical charts to national censuses, healthcare has traditionally operated under a paper-based paradigm. However, the past decade has marked a long and arduous transformation bringing healthcare into the digital age. Ranging from electronic health records, to digitized imaging and laboratory reports, to public health datasets, healthcare today generates an incredible amount of digital information. Such a wealth of data presents an exciting opportunity for integrated machine learning solutions to address problems across multiple facets of healthcare practice and administration. Unfortunately, deriving accurate and informative insights requires more than the ability to execute machine learning models. Rather, a deeper understanding of the data on which the models are run is imperative for their success. While a significant effort has been undertaken to develop models able to process the volume of data obtained during the analysis of millions of digitized patient records, it is important to remember that volume represents only one aspect of the data. In fact, because it draws on an increasingly diverse set of sources, healthcare data presents an incredibly complex set of attributes that must be accounted for throughout the machine learning pipeline. This chapter focuses on highlighting such challenges, and is broken down into three distinct components, each representing a phase of the pipeline. We begin with attributes of the data accounted for during preprocessing, then move to considerations during model building, and end with challenges to the interpretation of model output. For each component, we present a discussion around data as it relates to the healthcare domain and offer insight into the challenges each may impose on the efficiency of machine learning techniques. Comment: Healthcare Informatics, Machine Learning, Knowledge Discovery: 20 Pages, 1 Figure

    A simulation study comparing aberration detection algorithms for syndromic surveillance

    BACKGROUND: The usefulness of syndromic surveillance for early outbreak detection depends in part on effective statistical aberration detection. However, few published studies have compared different detection algorithms on identical data. In the largest simulation study conducted to date, we compared the performance of six aberration detection algorithms on simulated outbreaks superimposed on authentic syndromic surveillance data. METHODS: We compared three control-chart-based statistics, two exponentially weighted moving averages, and a generalized linear model. We simulated 310 unique outbreak signals, and added these to actual daily counts of four syndromes monitored by Public Health – Seattle and King County's syndromic surveillance system. We compared the sensitivity of the six algorithms at detecting these simulated outbreaks at a fixed alert rate of 0.01. RESULTS: Stratified by baseline or by outbreak distribution, duration, or size, the generalized linear model was more sensitive than the other algorithms and detected 54% (95% CI = 52%–56%) of the simulated epidemics when run at an alert rate of 0.01. However, all of the algorithms had poor sensitivity, particularly for outbreaks that did not begin with a surge of cases. CONCLUSION: When tested on county-level data aggregated across age groups, these algorithms often did not perform well in detecting signals other than large, rapid increases in case counts relative to baseline levels.

    A Hidden Markov Model for Analysis of Frontline Veterinary Data for Emerging Zoonotic Disease Surveillance

    Surveillance systems tracking health patterns in animals have potential for early warning of infectious disease in humans, yet many challenges remain before this can be realized. Specifically, there remains the challenge of detecting early warning signals for diseases that are not known or are not part of routine surveillance for named diseases. This paper reports on the development of a hidden Markov model for analysis of frontline veterinary sentinel surveillance data from Sri Lanka. Field veterinarians collected data on syndromes and diagnoses using mobile phones. A model for submission patterns accounts for both sentinel-related and disease-related variability. Models for commonly reported cattle diagnoses were estimated separately. Region-specific weekly average prevalence was estimated for each diagnosis and partitioned into normal and abnormal periods. Visualization of state probabilities was used to indicate areas and times of unusual disease prevalence. The analysis suggests that hidden Markov modelling is a useful approach for surveillance datasets from novel populations and/or with few historical baselines.

    Transmission patterns of smallpox: systematic review of natural outbreaks in Europe and North America since World War II

    BACKGROUND: Because smallpox (variola major) may be used as a biological weapon, we reviewed outbreaks in post-World War II Europe and North America in order to understand smallpox transmission patterns. METHODS: A systematic review was used to identify papers from the National Library of Medicine, Embase, Biosis, Cochrane Library, Defense Technical Information Center, WorldCat, and reference lists of included publications. Two authors reviewed selected papers for smallpox outbreaks. RESULTS: 51 relevant outbreaks were identified from 1,389 publications. The median for the effective first generation reproduction rate (initial R) was 2 (range 0–38). The majority of outbreaks were small (less than 5 cases) and contained within one generation. Outbreaks with few hospitalized patients had low initial R values (median of 1) and were prolonged if not initially recognized (median of 3 generations); outbreaks with mostly hospitalized patients had higher initial R values (median 12) and were shorter (median of 3 generations). Index cases with an atypical presentation of smallpox were less likely to have been diagnosed with smallpox; outbreaks in which the index case was not correctly diagnosed were larger (median of 27.5 cases) and longer (median of 3 generations) compared to outbreaks in which the index case was correctly diagnosed (median of 3 cases and 1 generation). CONCLUSION: Patterns of spread during smallpox outbreaks varied with circumstances, but early detection and implementation of control measures are a most important influence on the magnitude of outbreaks. The majority of outbreaks studied in Europe and North America were controlled within a few generations if detected early.

    Variability in school closure decisions in response to 2009 H1N1: a qualitative systems improvement analysis

    BACKGROUND: School closure was employed as a non-pharmaceutical intervention against pandemic 2009 H1N1, particularly during the first wave. More than 700 schools in the United States were closed. However, closure decisions reflected significant variation in rationales, decision triggers, and authority for closure. This variability presents the opportunity for improved efficiency and decision-making. METHODS: We identified media reports relating to school closure as a response to 2009 H1N1 by monitoring high-profile sources and searching Lexis-Nexis and Google news alerts, and reviewed reports for key themes. News stories were supplemented by observing conference calls and meetings with health department and school officials, and by discussions with decision-makers and community members. RESULTS: There was significant variation in the stated goal of closure decisions, including limiting community spread of the virus, protecting particularly vulnerable students, and responding to staff shortages or student absenteeism. Because the goal of closure is relevant to its timing, nature, and duration, unclear rationales for closure can challenge its effectiveness. There was also significant variation in the decision-making authority to close schools in different jurisdictions, which, in some instances, was reflected in open disagreement between school and public health officials. Finally, decision-makers did not appear to expect the level of scientific uncertainty encountered early in the pandemic, and they often expressed significant frustration over changing CDC guidance. CONCLUSIONS: The use of school closure as a public health response to epidemic disease can be improved by ensuring that officials clarify the goals of closure and tailor closure decisions to those goals. Additionally, authority to close schools should be clarified in advance, and decision-makers should expect to encounter uncertainty as disease emergencies unfold and plan accordingly.

    Stochastic Population Forecasting Based on Combinations of Expert Evaluations Within the Bayesian Paradigm

    The paper suggests a procedure to derive stochastic population forecasts adopting an expert-based approach. As in a previous work by Billari et al. (2012), experts are required to provide evaluations, in the form of conditional and unconditional scenarios, on summary indicators of the demographic components determining the population evolution, i.e. fertility, mortality and migration. Here two main purposes are pursued. First, the demographic components are allowed to have some kind of dependence. Second, as a result of the existence of a body of shared information, possible correlations among experts are taken into account. In both cases, the dependence structure is not imposed by the researcher but is indirectly derived through the scenarios elicited from the experts. To address these issues, the method is based on a mixture model, within the so-called Supra-Bayesian approach, according to which expert evaluations are treated as data. The derived posterior distribution for the demographic indicators of interest is used as the forecasting distribution, and a Markov Chain Monte Carlo algorithm is designed to approximate this posterior. The paper provides the questionnaire which was designed by the authors to collect expert opinions. Finally, an application to the forecast of the Italian population from 2010 up to 2065 is proposed.

    Web-based infectious disease surveillance systems and public health perspectives: a systematic review

    BACKGROUND: Emerging and re-emerging infectious diseases are a significant public health concern, and early detection and immediate response are crucial for disease control. These challenges have led to the need for new approaches and technologies to reinforce the capacity of traditional surveillance systems for detecting emerging infectious diseases. In the last few years, the availability of novel web-based data sources has contributed substantially to infectious disease surveillance. This study explores the burgeoning field of web-based infectious disease surveillance systems by examining their current status, importance, and potential challenges. METHODS: A systematic review framework was applied to the search, screening, and analysis of web-based infectious disease surveillance systems. We searched PubMed, Web of Science, and Embase databases to extensively review the English literature published between 2000 and 2015. Eleven surveillance systems were chosen for evaluation according to their high frequency of application. Relevant terms, including newly coined terms, development and classification of the surveillance systems, and various characteristics associated with the systems were studied. RESULTS: Based on a detailed and informative review of the 11 web-based infectious disease surveillance systems, it was evident that these systems exhibited clear strengths, as compared to traditional surveillance systems, but with some limitations yet to be overcome. The major strengths of the newly emerging surveillance systems are that they are intuitive, adaptable, low-cost, and operated in real-time, all of which are necessary features of an effective public health tool. The most apparent potential challenges of the web-based systems are those of inaccurate interpretation and prediction of health status, and privacy issues, based on an individual's internet activity. CONCLUSION: Despite being in a nascent stage with further modification needed, web-based surveillance systems have evolved to complement traditional national surveillance systems. This review highlights ways in which the strengths of existing systems can be maintained and weaknesses alleviated to implement optimal web surveillance systems.

    Evaluating Syndromic surveillance systems at institutions of higher education (IHEs): A retrospective analysis of the 2009 H1N1 influenza pandemic at two universities

    BACKGROUND: Syndromic surveillance has been widely adopted as a real-time monitoring tool for timely response to disease outbreaks. During the second wave of the pH1N1 pandemic in Fall 2009, two major universities in Washington, DC collected data that were potentially indicative of influenza-like illness (ILI) cases in students and staff. In this study, our objectives were three-fold. The primary goal of this study was to characterize the impact of pH1N1 on the campuses as clearly as possible given the data available and their likely biases. In addition, we sought to evaluate the strengths and weaknesses of the data series themselves, in order to inform these two universities and other institutions of higher education (IHEs) about real-time surveillance systems that are likely to provide the most utility in future outbreaks (at least to the extent that it is possible to generalize from this analysis). METHODS: We collected a wide variety of data that covered student ILI cases reported to medical and non-medical staff, employee absenteeism, and hygiene supply distribution records (from University A only). Communication data were retrieved from university broadcasts, university preparedness websites, and H1N1-related on-campus media reports. Regional data based on the Centers for Disease Control and Prevention Outpatient Influenza-like Illness Surveillance Network (CDC ILINet), American College Health Association (ACHA) pandemic influenza surveillance data, and local Google Flu Trends were used as external data sets. We employed a "triangulation" approach for data analysis in which multiple contemporary data sources are compared to identify time patterns that are likely to reflect biases as well as those that are more likely to be indicative of actual infection rates. 
RESULTS: Medical personnel observed an early peak at both universities immediately after school began in early September and a second peak in early November; only the second peak corresponded to patterns in the community at large. Illness self-reported to university deans' offices also increased during mid-term exam weeks. The overall volume of pH1N1-related communication messages similarly peaked twice, corresponding to the two peaks of student ILI cases. CONCLUSIONS: During the 2009 H1N1 pandemic, both University A and B experienced a peak number of ILI cases at the beginning of the Fall term. This pattern, seen in surveillance systems at these universities and to a lesser extent in data from other IHEs, most likely resulted from students bringing the virus back to campus from their home states coupled with a sudden increase in population density in dormitories and lecture halls. Through comparison of data from different syndromic surveillance data streams, paying attention to the likely biases in each over time, we have determined, at least in the case of the pH1N1 pandemic, that student health center data more accurately depicted disease transmission on campus at both universities during the Fall 2009 pandemic than other available data sources.