232 research outputs found

    Exploratory analysis of methods for automated classification of laboratory test orders into syndromic groups in veterinary medicine

    Background: Recent focus on earlier detection of pathogen introduction in human and animal populations has led to the development of surveillance systems based on automated monitoring of health data. Real- or near real-time monitoring of pre-diagnostic data requires automated classification of records into syndromes (syndromic surveillance) using algorithms that incorporate medical knowledge in a reliable and efficient way, while remaining comprehensible to end users. Methods: This paper describes the application of two machine learning methods (Naïve Bayes and Decision Trees) and rule-based methods to extract syndromic information from laboratory test requests submitted to a veterinary diagnostic laboratory. Results: High performance (F1-macro = 0.9995) was achieved through the use of a rule-based syndrome classifier, based on rule induction followed by manual modification during the construction phase, which also resulted in clear interpretability of the resulting classification process. An unmodified rule induction algorithm achieved an F1-micro score of 0.979, though this fell to 0.677 when performance for individual classes was averaged in an unweighted manner (F1-macro), because the algorithm failed to learn 3 of the 16 classes from the training set. Decision Trees showed equal interpretability to the rule-based approaches, but achieved an F1-micro score of 0.923 (falling to 0.311 when classes are given equal weight). A Naïve Bayes classifier learned all classes and achieved high performance (F1-micro = 0.994 and F1-macro = 0.955); however, the classification process is not transparent to the domain experts. Conclusion: The use of a manually customised rule set allowed for the development of a system for classification of laboratory tests into syndromic groups with very high performance, and high interpretability by the domain experts. Further research is required to develop internal validation rules in order to establish automated methods to update model rules without user input.
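
The gap between F1-micro and F1-macro reported in this abstract can be illustrated with a small sketch. The function below and its toy data are assumptions made for illustration, not the paper's code or data: micro-averaging pools all decisions, so a classifier that never predicts a rare class still scores well, while macro-averaging weights every class equally and exposes the miss.

```python
from collections import Counter

def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    # Classes are taken from the true labels, so a class the model
    # never predicts still counts towards the macro average.
    labels = sorted(set(y_true))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # wrongly predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro

# Made-up data: 95 records of a common class and 5 of a rare class
# that the classifier never predicts.
y_true = ["common"] * 95 + ["rare"] * 5
y_pred = ["common"] * 100
micro, macro = f1_scores(y_true, y_pred)
print(f"micro={micro:.3f} macro={macro:.3f}")
```

With this toy data the micro score stays high while the macro score roughly halves, which is the same pattern the abstract describes for classifiers that fail to learn some classes.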

    Classifying Emergency Department Data to Improve Syndromic Surveillance: From Mixed Data Types to ICD Codes and Syndromes

    Syndromic surveillance systems are used to monitor public health and enable timely outbreak detection. Emergency department (ED) data can serve as an important data source for syndromic surveillance, but a large number of missing diagnosis codes can make analyses relying on this information impossible. This study aims to enhance an ED dataset from a piloted syndromic surveillance system in Germany to enable the monitoring of an influenza-like illness (ILI) syndrome. Routinely collected data from one ED containing mixed-type variables are analysed, and two different approaches are implemented to deal with the missing data. In the first approach, the missing diagnosis codes are imputed by predicting them from the remaining variables, using a multi-class naive Bayes classifier and a deep learning imputation package. In the second approach, a logistic regression model and a binary naive Bayes classifier are used to predict the ILI syndrome from all variables except the diagnosis code. The resulting ILI cases are evaluated at the time series level with regard to seasonal patterns. The diagnosis codes were predicted from mixed-type input variables with sufficient precision (34.37% F1-measure in the best model). Taking into account the hierarchical structure of the ICD-10 codes improved the performance. Predicting the ILI syndrome independently of the diagnosis code from the remaining variables worked well (39.63% F1-measure in the best model), and the predictions showed medical similarity with the ILI syndrome. The models differed in how readily they included cases (their sensitivity), which can be adjusted by changing the threshold of the classifiers. The resulting ILI cases from all models were positively correlated with the reference cases on a time series basis (r = 0.865 for best model) and were comparable with an external data source, a surveillance of severe acute respiratory infections (SARI) (r = 0.867 for best model). The present study showed that the ED dataset can be enhanced to enable the syndromic surveillance of an ILI syndrome based on the diagnosis codes, even if this variable is missing. Additionally, a flexible case definition for an ILI syndrome was developed that is independent of the diagnosis code, and the underlying generic method can be applied to other syndromes as well.
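
The remark in this abstract that sensitivity can be adjusted through the classifier threshold can be sketched as follows. The probabilities and labels are invented for illustration, and `sensitivity_precision` is a hypothetical helper, not something taken from the study.

```python
def sensitivity_precision(threshold, probs, labels):
    """Sensitivity and precision when cases with prob >= threshold are flagged."""
    flagged = [p >= threshold for p in probs]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y)
    fp = sum(1 for f, y in zip(flagged, labels) if f and not y)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, precision

# Invented predicted ILI probabilities and true labels (1 = ILI case).
probs = [0.9, 0.7, 0.4, 0.35, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0]

strict = sensitivity_precision(0.5, probs, labels)  # fewer cases flagged
loose = sensitivity_precision(0.3, probs, labels)   # more cases flagged
print(strict, loose)
```

Lowering the threshold flags more visits as ILI, which raises sensitivity at the cost of precision in this toy example.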

    Evaluation of preprocessing techniques for chief complaint classification

    OBJECTIVE: To determine whether preprocessing chief complaints before automatically classifying them into syndromic categories improves classification performance. METHODS: We preprocessed chief complaints using two preprocessors (CCP and EMT-P) and evaluated whether classification performance increased for a probabilistic classifier (CoCo) or for a keyword-based classifier (modification of the NYC Department of Health and Mental Hygiene chief complaint coder (KC)). RESULTS: CCP exhibited high accuracy (85%) in preprocessing chief complaints but only slightly improved CoCo's classification performance for a few syndromes. EMT-P, which splits chief complaints into multiple problems, substantially increased CoCo's sensitivity for all syndromes. Preprocessing with CCP or EMT-P only improved KC's sensitivity for the Constitutional syndrome. CONCLUSION: Evaluation of preprocessing systems should not be limited to accuracy of the preprocessor but should include the effect of preprocessing on syndromic classification. Splitting chief complaints into multiple problems before classification is important for CoCo, but other preprocessing steps only slightly improved classification performance for CoCo and a keyword-based classifier
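
Why splitting a chief complaint into multiple problems can raise sensitivity is easy to show with a toy example. The sketch below is not CoCo, KC, CCP or EMT-P: the keyword map, the splitting rule and the one-syndrome-per-input classifier are all hypothetical stand-ins chosen for illustration.

```python
import re

# Hypothetical keyword-to-syndrome map, for illustration only.
KEYWORDS = {
    "cough": "Respiratory",
    "fever": "Constitutional",
    "vomiting": "Gastrointestinal",
    "diarrhea": "Gastrointestinal",
}

def classify_one(text):
    """Assign at most one syndrome: that of the first keyword found."""
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in KEYWORDS:
            return KEYWORDS[token]
    return None

def split_complaint(text):
    """Split on common separators (an assumed rule, not EMT-P's)."""
    parts = re.split(r"[/;,&]|\band\b", text.lower())
    return [p.strip() for p in parts if p.strip()]

def classify_split(text):
    """Classify each problem separately and collect all syndromes."""
    found = {classify_one(part) for part in split_complaint(text)}
    return sorted(s for s in found if s is not None)

complaint = "fever/cough and vomiting"
whole = classify_one(complaint)      # one label for the whole string
split = classify_split(complaint)    # one label per problem
print(whole, split)
```

A classifier limited to one label per input reports only the first syndrome for the unsplit complaint, while the split version recovers all three.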

    An automated, broad-based, near real-time public health surveillance system using presentations to hospital Emergency Departments in New South Wales, Australia

    BACKGROUND: In a climate of concern over bioterrorism threats and emergent diseases, public health authorities are trialling more timely surveillance systems. The 2003 Rugby World Cup (RWC) provided an opportunity to test the viability of a near real-time syndromic surveillance system in metropolitan Sydney, Australia. We describe the development and early results of this largely automated system that used data routinely collected in Emergency Departments (EDs). METHODS: Twelve of 49 EDs in the Sydney metropolitan area automatically transmitted surveillance data from their existing information systems to a central database in near real-time. Information captured for each ED visit included patient demographic details, presenting problem and nursing assessment entered as free-text at triage time, physician-assigned provisional diagnosis codes, and status at departure from the ED. Both diagnoses from the EDs and triage text were used to assign syndrome categories. The text information was automatically classified into one or more of 26 syndrome categories using automated "naïve Bayes" text categorisation techniques. Automated processes were used to analyse both diagnosis and free text-based syndrome data and to produce web-based statistical summaries for daily review. An adjusted cumulative sum (cusum) was used to assess the statistical significance of trends. RESULTS: During the RWC the system did not identify any major public health threats associated with the tournament, mass gatherings or the influx of visitors. This was consistent with evidence from other sources, although two known outbreaks were already in progress before the tournament. Limited baseline in early monitoring prevented the system from automatically identifying these ongoing outbreaks. Data capture was invisible to clinical staff in EDs and did not add to their workload. 
    CONCLUSION: We have demonstrated the feasibility and potential utility of syndromic surveillance using routinely collected data from ED information systems. Key features of our system are its nil impact on clinical staff, and its use of statistical methods to assign syndrome categories based on clinical free text information. The system is ongoing, and has expanded to cover 30 EDs. Results of formal evaluations of both the technical efficiency and the public health impacts of the system will be described subsequently.
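
The abstract mentions an adjusted cumulative sum (cusum) without giving its details. As a rough illustration only, the sketch below implements a textbook one-sided CUSUM on standardised daily counts; the parameters `k` and `h`, the reset rule and the counts are assumptions, and this is not the adjusted variant the authors used.

```python
def cusum_alarms(counts, baseline_mean, baseline_sd, k=0.5, h=4.0):
    """Return indices of days on which a one-sided CUSUM exceeds h.

    k is the allowance (in standard deviations) subtracted each day,
    h is the decision threshold; both values are illustrative defaults.
    """
    s = 0.0
    alarms = []
    for day, count in enumerate(counts):
        z = (count - baseline_mean) / baseline_sd
        s = max(0.0, s + z - k)  # accumulate only upward deviations
        if s > h:
            alarms.append(day)
            s = 0.0  # reset after signalling (a design choice)
    return alarms

# Invented daily syndrome counts with a short run of elevated days.
counts = [10, 11, 9, 10, 16, 17, 18, 10]
alarms = cusum_alarms(counts, baseline_mean=10, baseline_sd=2)
print(alarms)
```

The baseline mean and standard deviation must come from earlier data, which is consistent with the abstract's note that a limited baseline hampered early detection.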

    SYNDROMIC SURVEILLANCE FOR THE EARLY DETECTION OF INFLUENZA OUTBREAKS

    Syndromic surveillance is a new mechanism used to detect naturally occurring and bioterrorism-related outbreaks. Its public health significance lies in its potential to alert public health authorities to outbreaks earlier and allow a timelier response. It involves monitoring data that can be collected in near real-time to find anomalies. Data sources include school and work absenteeism, over-the-counter drug sales, and hospital admissions, to name a few. This study assesses an extension of syndromic surveillance, as an improvement on the traditional method, to the detection of more routine public health problems, specifically influenza outbreaks. The assessment involves the prediction of outbreaks in four areas during the period October 15, 2003 to March 31, 2004. The four areas studied were Allegheny County, Pennsylvania; Jefferson County, Kentucky; Los Angeles County, California; and Salt Lake County, Utah. Two aspects of community activity were used for syndromic surveillance: over-the-counter pharmaceutical sales and hospital chief complaints. The over-the-counter sales encompassed a panel of six items: anti-diarrheal medication, anti-fever adult medication, anti-fever pediatric medication, cough and cold products, electrolytes, and thermometers. Additionally, two of the seven hospital chief complaints used in the RODS open source paradigm were monitored: constitutional and respiratory chief complaints. Application of standard statistical algorithms showed that the system was able to identify unusual activity several weeks before the local health departments were able to identify an outbreak using the standard methods. The largest improvement in detection using syndromic surveillance occurred in Los Angeles, where the outbreak was detected 52 days before the Centers for Disease Control had declared widespread activity for the state. In each county, over-the-counter sales detected the outbreak sooner than hospital chief complaints, but the hospital chief complaints detected the outbreaks consistently across the various algorithms. More conclusive evidence regarding the possible improvement in outbreak detection with syndromic surveillance can be obtained once a longer time frame has passed to allow more historical data to accumulate. Additional studies of influenza outbreaks in other jurisdictions would also be useful.

    Syndromic surveillance: reports from a national conference, 2003

    Overview of Syndromic Surveillance -- What is Syndromic Surveillance? -- Linking Better Surveillance to Better Outcomes -- Review of the 2003 National Syndromic Surveillance Conference - Lessons Learned and Questions To Be Answered
    System Descriptions -- New York City Syndromic Surveillance Systems -- Syndrome and Outbreak Detection Using Chief-Complaint Data - Experience of the Real-Time Outbreak and Disease Surveillance Project -- Removing a Barrier to Computer-Based Outbreak and Disease Surveillance - The RODS Open Source Project -- National Retail Data Monitor for Public Health Surveillance -- National Bioterrorism Syndromic Surveillance Demonstration Program -- Daily Emergency Department Surveillance System - Bergen County, New Jersey -- Hospital Admissions Syndromic Surveillance - Connecticut, September 2001-November 2003 -- BioSense - A National Initiative for Early Detection and Quantification of Public Health Emergencies -- Syndromic Surveillance at Hospital Emergency Departments - Southeastern Virginia
    Research Methods -- Bivariate Method for Spatio-Temporal Syndromic Surveillance -- Role of Data Aggregation in Biosurveillance Detection Strategies with Applications from ESSENCE -- Scan Statistics for Temporal Surveillance for Biologic Terrorism -- Approaches to Syndromic Surveillance When Data Consist of Small Regional Counts -- Algorithm for Statistical Detection of Peaks - Syndromic Surveillance System for the Athens 2004 Olympic Games -- Taming Variability in Free Text: Application to Health Surveillance -- Comparison of Two Major Emergency Department-Based Free-Text Chief-Complaint Coding Systems -- How Many Illnesses Does One Emergency Department Visit Represent? Using a Population-Based Telephone Survey To Estimate the Syndromic Multiplier -- Comparison of Office Visit and Nurse Advice Hotline Data for Syndromic Surveillance - Baltimore-Washington, D.C., Metropolitan Area, 2002 -- Progress in Understanding and Using Over-the-Counter Pharmaceuticals for Syndromic Surveillance
    Evaluation -- Evaluation Challenges for Syndromic Surveillance - Making Incremental Progress -- Measuring Outbreak-Detection Performance By Using Controlled Feature Set Simulations -- Evaluation of Syndromic Surveillance Systems - Design of an Epidemic Simulation Model -- Benchmark Data and Power Calculations for Evaluating Disease Outbreak Detection Methods -- Bio-ALIRT Biosurveillance Detection Algorithm Evaluation -- ESSENCE II and the Framework for Evaluating Syndromic Surveillance Systems -- Conducting Population Behavioral Health Surveillance by Using Automated Diagnostic and Pharmacy Data Systems -- Evaluation of an Electronic General-Practitioner-Based Syndromic Surveillance System -- National Symptom Surveillance Using Calls to a Telephone Health Advice Service - United Kingdom, December 2001-February 2003 -- Field Investigations of Emergency Department Syndromic Surveillance Signals - New York City -- Should We Be Worried? Investigation of Signals Generated by an Electronic Syndromic Surveillance System - Westchester County, New York
    Public Health Practice -- Public Health Information Network - Improving Early Detection by Using a Standards-Based Approach to Connecting Public Health and Clinical Medicine -- Information System Architectures for Syndromic Surveillance -- Perspective of an Emergency Physician Group as a Data Provider for Syndromic Surveillance -- SARS Surveillance Project - Internet-Enabled Multiregion Surveillance for Rapidly Emerging Disease -- Health Information Privacy and Syndromic Surveillance Systems
    Papers from the second annual National Syndromic Surveillance Conference convened by the New York City Department of Health and Mental Hygiene, the New York Academy of Medicine, and the CDC in New York City during Oct. 23-24, 2003. Published as the September 24, 2004 supplement to vol. 53 of MMWR (Morbidity and Mortality Weekly Report).

    Enhancing Drug Overdose Mortality Surveillance through Natural Language Processing and Machine Learning

    Epidemiological surveillance is key to monitoring and assessing the health of populations. Drug overdose surveillance has become an increasingly important part of public health practice as overdose morbidity and mortality have increased, due in large part to the opioid crisis. Monitoring drug overdose mortality relies on death certificate data, which have several limitations, including timeliness and the coding structure used to identify specific substances that caused death. These limitations stem from the need to analyze the free-text cause-of-death sections of the death certificate that are completed by the medical certifier during death investigation. Other fields, including clinical sciences, have used natural language processing (NLP) methods to gain insight from free-text data, but thus far adoption of NLP methods in epidemiological surveillance has been limited. Through a narrative review of NLP methods currently used in public health surveillance and the integration of two NLP tasks, classification and named entity recognition, this dissertation enhances the capabilities of public health practitioners and researchers to perform drug overdose mortality surveillance. It advances both surveillance science and public health practice by integrating methods from bioinformatics into the surveillance pipeline, providing more timely and higher-quality overdose mortality surveillance, which is essential to guiding an effective public health response to the continuing drug overdose epidemic.