103 research outputs found

    Evaluation of preprocessing techniques for chief complaint classification

    OBJECTIVE: To determine whether preprocessing chief complaints before automatically classifying them into syndromic categories improves classification performance. METHODS: We preprocessed chief complaints using two preprocessors (CCP and EMT-P) and evaluated whether classification performance increased for a probabilistic classifier (CoCo) or for a keyword-based classifier (modification of the NYC Department of Health and Mental Hygiene chief complaint coder (KC)). RESULTS: CCP exhibited high accuracy (85%) in preprocessing chief complaints but only slightly improved CoCo's classification performance for a few syndromes. EMT-P, which splits chief complaints into multiple problems, substantially increased CoCo's sensitivity for all syndromes. Preprocessing with CCP or EMT-P only improved KC's sensitivity for the Constitutional syndrome. CONCLUSION: Evaluation of preprocessing systems should not be limited to accuracy of the preprocessor but should include the effect of preprocessing on syndromic classification. Splitting chief complaints into multiple problems before classification is important for CoCo, but other preprocessing steps only slightly improved classification performance for CoCo and a keyword-based classifier

    An automated, broad-based, near real-time public health surveillance system using presentations to hospital Emergency Departments in New South Wales, Australia

    BACKGROUND: In a climate of concern over bioterrorism threats and emergent diseases, public health authorities are trialling more timely surveillance systems. The 2003 Rugby World Cup (RWC) provided an opportunity to test the viability of a near real-time syndromic surveillance system in metropolitan Sydney, Australia. We describe the development and early results of this largely automated system that used data routinely collected in Emergency Departments (EDs). METHODS: Twelve of 49 EDs in the Sydney metropolitan area automatically transmitted surveillance data from their existing information systems to a central database in near real-time. Information captured for each ED visit included patient demographic details, presenting problem and nursing assessment entered as free-text at triage time, physician-assigned provisional diagnosis codes, and status at departure from the ED. Both diagnoses from the EDs and triage text were used to assign syndrome categories. The text information was automatically classified into one or more of 26 syndrome categories using automated "naïve Bayes" text categorisation techniques. Automated processes were used to analyse both diagnosis and free text-based syndrome data and to produce web-based statistical summaries for daily review. An adjusted cumulative sum (cusum) was used to assess the statistical significance of trends. RESULTS: During the RWC the system did not identify any major public health threats associated with the tournament, mass gatherings or the influx of visitors. This was consistent with evidence from other sources, although two known outbreaks were already in progress before the tournament. Limited baseline in early monitoring prevented the system from automatically identifying these ongoing outbreaks. Data capture was invisible to clinical staff in EDs and did not add to their workload. 
    CONCLUSION: We have demonstrated the feasibility and potential utility of syndromic surveillance using routinely collected data from ED information systems. Key features of our system are its nil impact on clinical staff and its use of statistical methods to assign syndrome categories based on clinical free text information. The system is ongoing, and has expanded to cover 30 EDs. Results of formal evaluations of both the technical efficiency and the public health impacts of the system will be described subsequently.
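
    The abstract above mentions an adjusted cumulative sum (cusum) for flagging trends in daily syndrome counts but gives no details. As a rough illustration only, the following is a minimal sketch of a plain one-sided cusum on standardized counts, not the adjusted variant the authors used. The function name `cusum_alerts`, the parameters `baseline_days`, `k` and `h`, and the sample counts are all assumptions made for this example.

```python
from statistics import mean, stdev


def cusum_alerts(counts, baseline_days=7, k=0.5, h=3.0):
    """Return indices of days where a one-sided CUSUM exceeds h.

    Hypothetical sketch: the first `baseline_days` counts define the
    expected level; k is the allowance and h the alert threshold,
    both in standard-deviation units.
    """
    baseline = counts[:baseline_days]
    mu = mean(baseline)
    sigma = stdev(baseline) or 1.0  # fall back to 1.0 if the baseline is flat
    s = 0.0
    alerts = []
    for day, count in enumerate(counts[baseline_days:], start=baseline_days):
        z = (count - mu) / sigma
        s = max(0.0, s + z - k)  # accumulate only upward deviations
        if s > h:
            alerts.append(day)
            s = 0.0  # reset after signalling
    return alerts


# Made-up daily counts for one syndrome: a quiet week, then a three-day rise.
daily_counts = [10, 12, 11, 9, 10, 11, 10, 10, 11, 18, 20, 22, 10]
print(cusum_alerts(daily_counts))
```

    In a sketch like this, an outbreak that is already under way during the baseline window raises the expected level and can mask itself, which is in line with the limitation the abstract reports for the two outbreaks already in progress.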

    Reliability and validity of EMS dispatch code-based categorization of emergency patients for syndromic surveillance.

    A retrospective study involving the secondary analysis of public health surveillance records was undertaken to characterize the reliability and validity of an EMS dispatch data-based scheme for assigning emergency patients to surveillance syndromes in relation to two other schemes, one based on hospital ED clinicians' manual categorization according to patients' chief complaint and clinical presentation, and one based on ICD-9 coded hospital ED diagnoses. Comparisons of a sample of individual emergency patients' syndrome assignments according to the EMS versus each of the two hospital categorization schemes were made by matching EMS run records to their corresponding emergency department patient encounter records. This new, linked dataset was analyzed to assess the level of agreement beyond chance between the three possible pairs of syndrome categorization schemes in assigning patients to a respiratory or non-respiratory syndrome and to a gastrointestinal or non-gastrointestinal syndrome. Cohen's kappa statistics were used to measure chance-adjusted agreement between categorization schemes (raters). Z-tests and a chi-square-like test based on the variance of the kappa statistic were used to test the equivalence of kappa coefficients across syndromes, population subgroups and pairs of syndrome assignment schemes. The sensitivity, specificity, predictive value positive and predictive value negative of EMS dispatch and chief complaint-based categorization schemes were also calculated, using the ICD-9-coded ED diagnosis-based categorization scheme as the criterion standard. All performance characteristics (i.e., sensitivity, specificity, predictive value positive and predictive value negative) were compared across categorization schemes and surveillance syndromes to determine whether they were significantly different.
    The use of EMS dispatch codes for assigning emergency patients to surveillance syndromes was found to have limited but statistically significant reliability in relation to more commonly used syndrome grouping methods based on chief complaints or ICD-9 coded ED diagnoses. The reliability of EMS-based syndrome assignment varied significantly by syndrome, age group and comparison rater. When ICD-9 coded ED diagnosis-based grouping is taken as the criterion standard of syndrome definition, the validity of EMS-based syndrome assignment was limited but comparable to chief complaint-based assignment. The validity of EMS-based syndrome assignment varied significantly by syndrome.
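
    To make the agreement and validity measures named in this abstract concrete, here is a minimal sketch of Cohen's kappa and the four performance characteristics for a binary syndrome flag (for example, respiratory versus non-respiratory). The helper names and the ten made-up patient labels are assumptions for illustration; they are not taken from the study.

```python
def confusion_counts(candidate, reference):
    """Count the 2x2 cells for two equal-length lists of 0/1 labels."""
    pairs = list(zip(candidate, reference))
    tp = sum(1 for c, r in pairs if c and r)
    fp = sum(1 for c, r in pairs if c and not r)
    fn = sum(1 for c, r in pairs if not c and r)
    tn = sum(1 for c, r in pairs if not c and not r)
    return tp, fp, fn, tn


def cohens_kappa(candidate, reference):
    """Chance-adjusted agreement between two binary raters.

    Undefined (division by zero) when expected agreement equals 1.
    """
    tp, fp, fn, tn = confusion_counts(candidate, reference)
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - expected) / (1 - expected)


def validity(candidate, reference):
    """Sensitivity, specificity and predictive values against a criterion standard."""
    tp, fp, fn, tn = confusion_counts(candidate, reference)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),  # predictive value positive
        "npv": tn / (tn + fn),  # predictive value negative
    }


# Hypothetical labels: 1 = assigned to the respiratory syndrome.
ems = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]  # EMS dispatch-based assignment
icd = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0]  # ICD-9 diagnosis-based assignment (criterion)

print(round(cohens_kappa(ems, icd), 3))
print(validity(ems, icd))
```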

    Exploratory analysis of methods for automated classification of laboratory test orders into syndromic groups in veterinary medicine

    Background: Recent focus on earlier detection of pathogen introduction in human and animal populations has led to the development of surveillance systems based on automated monitoring of health data. Real- or near real-time monitoring of pre-diagnostic data requires automated classification of records into syndromes (syndromic surveillance) using algorithms that incorporate medical knowledge in a reliable and efficient way, while remaining comprehensible to end users. Methods: This paper describes the application of two machine learning methods (Naïve Bayes and Decision Trees) and rule-based methods to extract syndromic information from laboratory test requests submitted to a veterinary diagnostic laboratory. Results: High performance (F1-macro = 0.9995) was achieved through the use of a rule-based syndrome classifier, based on rule induction followed by manual modification during the construction phase, which also resulted in clear interpretability of the resulting classification process. An unmodified rule induction algorithm achieved an F1-micro score of 0.979, though this fell to 0.677 when performance for individual classes was averaged in an unweighted manner (F1-macro), because the algorithm failed to learn 3 of the 16 classes from the training set. Decision Trees showed equal interpretability to the rule-based approaches, but achieved an F1-micro score of 0.923 (falling to 0.311 when classes are given equal weight). A Naïve Bayes classifier learned all classes and achieved high performance (F1-micro = 0.994 and F1-macro = 0.955); however, the classification process is not transparent to the domain experts. Conclusion: The use of a manually customised rule set allowed for the development of a system for classification of laboratory tests into syndromic groups with very high performance, and high interpretability by the domain experts.
    Further research is required to develop internal validation rules in order to establish automated methods to update model rules without user input.
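
    The gap between F1-micro and F1-macro reported in this abstract arises whenever a classifier misses rare classes entirely. The sketch below, using made-up labels and a hypothetical `f1_scores` helper, shows how pooling counts (micro) hides unlearned classes while unweighted per-class averaging (macro) exposes them. It illustrates the two averages only and does not reproduce the paper's data.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: average the per-class F1 scores with equal weight.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


# Hypothetical test set: one common class and two rare classes never predicted.
y_true = ["resp"] * 8 + ["gi", "repro"]
y_pred = ["resp"] * 10

micro, macro = f1_scores(y_true, y_pred)
print(round(micro, 3), round(macro, 3))
```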

    Syndrome classification through a retrospective analysis of porcine submissions to a regional animal health laboratory

    No full text
    In response to the global threats of emerging infectious diseases and bioterrorism events, public health surveillance developed analytical methods to cluster early health indicators from multiple data sources into “syndromes” for rapid and efficient disease detection. Syndromic surveillance has become well established in public health, using many different health indicators from multiple sources. In animal health, the timeliness and efficiency of disease detection in early warning surveillance systems has been enhanced by including syndromic surveillance methods. Animal health syndromic surveillance improves disease detection through the analysis of pre-diagnostic data collected for other purposes, from sources such as laboratories, veterinary clinics, abattoirs, farms and pharmacies. However, the data are inherently non-disease specific compared to traditional surveillance and require analyses to ensure that syndromes represent significant diseases as accurately as possible. Syndrome classification is an analytical process that identifies, collates and validates pre-diagnostic indicators within a data source into accurate and viable syndromes. The goals of this thesis were as follows: a) Review surveillance systems and methods to understand the scale, complexity and validity of different syndromic surveillance approaches. b) Describe and evaluate six years of swine laboratory submission data to Veterinary Diagnostic Services (VDS) in the province of Manitoba, Canada, for the purpose of syndromic surveillance. c) Finally, identify and validate the most appropriate syndromes from pre-diagnostic data within the submitted swine cases. An initial systematic review of public health syndromic surveillance was conducted with 81 studies meeting the criteria. The variety and frequency of populations under surveillance, information sources, pre-diagnostic indicators, syndromes and reported values were recorded. 
    The predominant methods for syndrome classification, temporal and spatial analysis and aberration detection were also described. 21,665 swine laboratory submissions from January 2003 to March 2009, including 4726 pathology cases, were evaluated. The frequency and distributions of the predominant pre-diagnostic indicators, test requests and specimen types, were described. The most common pathology diagnoses and organ system involvement were reported for the pathology submissions. For syndrome validation, a Multiple Correspondence Analysis was conducted to cluster multiple pathology diagnoses per case into four diagnostic groups based on organ systems: Respiratory, Multisystemic, Gastrointestinal and “Other”. Syndrome classification was completed, first using agglomerative hierarchical clustering to classify syndromes from 30 test requests and 34 specimen types. For validation, the syndromes were used as predictive variables in a multinomial logistic regression model applied to training and test data sets. The overall model sensitivity, specificity and predictive values for each organ system outcome were estimated. The individual syndromes were compared using relative risk ratios and marginal effects. Five syndromes were identified as having a significantly higher predictive association with one organ system group (compared to the other three): Respiratory, GI, Reproductive, Joint and PCV (specific to porcine circovirus associated disease). The methods in this thesis identified a simplified analytical approach for syndrome classification of laboratory test requests and specimen types within swine submissions. Alternative algorithms for syndrome grouping, establishment of temporal baselines and exploration of automated aberration detection were identified as areas for future research.

    Toward unsupervised outbreak detection through visual perception of new patterns

    BACKGROUND: Statistical algorithms are routinely used to detect outbreaks of well-defined syndromes, such as influenza-like illness. These methods cannot be applied to the detection of emerging diseases for which no preexisting information is available. This paper presents a method aimed at facilitating the detection of outbreaks when there is no a priori knowledge of the clinical presentation of cases. METHODS: The method uses a visual representation of the symptoms and diseases coded during a patient consultation according to the International Classification of Primary Care, 2nd version (ICPC-2). The surveillance data are transformed into color-coded cells, ranging from white to red, reflecting the increasing frequency of observed signs. They are placed in a graphic reference frame mimicking body anatomy. Simple visual observation of color-change patterns over time, concerning a single code or a combination of codes, enables detection in the setting of interest. RESULTS: The method is demonstrated through retrospective analyses of two data sets: description of the patients referred to the hospital by their general practitioners (GPs) participating in the French Sentinel Network and description of patients directly consulting at a hospital emergency department (HED). Informative image color-change alert patterns emerged in both cases: the health consequences of the August 2003 heat wave were visualized with GPs' data (but passed unnoticed with conventional surveillance systems), and the flu epidemics, which are routinely detected by standard statistical techniques, were recognized visually with HED data. CONCLUSION: Using human visual pattern-recognition capacities to detect the onset of unexpected health events implies a convenient image representation of epidemiological surveillance and well-trained "epidemiology watchers". Once these two conditions are met, one could imagine that the epidemiology watchers could signal epidemiological alerts, based on "image walls" presenting the local, regional and/or national surveillance patterns, with specialized field epidemiologists assigned to validate the signals detected.

    Syndromic surveillance: reports from a national conference, 2003

    Overview of Syndromic Surveillance -- What is Syndromic Surveillance? -- Linking Better Surveillance to Better Outcomes -- Review of the 2003 National Syndromic Surveillance Conference - Lessons Learned and Questions To Be Answered
    System Descriptions -- New York City Syndromic Surveillance Systems -- Syndrome and Outbreak Detection Using Chief-Complaint Data - Experience of the Real-Time Outbreak and Disease Surveillance Project -- Removing a Barrier to Computer-Based Outbreak and Disease Surveillance - The RODS Open Source Project -- National Retail Data Monitor for Public Health Surveillance -- National Bioterrorism Syndromic Surveillance Demonstration Program -- Daily Emergency Department Surveillance System - Bergen County, New Jersey -- Hospital Admissions Syndromic Surveillance - Connecticut, September 2001-November 2003 -- BioSense - A National Initiative for Early Detection and Quantification of Public Health Emergencies -- Syndromic Surveillance at Hospital Emergency Departments - Southeastern Virginia
    Research Methods -- Bivariate Method for Spatio-Temporal Syndromic Surveillance -- Role of Data Aggregation in Biosurveillance Detection Strategies with Applications from ESSENCE -- Scan Statistics for Temporal Surveillance for Biologic Terrorism -- Approaches to Syndromic Surveillance When Data Consist of Small Regional Counts -- Algorithm for Statistical Detection of Peaks - Syndromic Surveillance System for the Athens 2004 Olympic Games -- Taming Variability in Free Text: Application to Health Surveillance -- Comparison of Two Major Emergency Department-Based Free-Text Chief-Complaint Coding Systems -- How Many Illnesses Does One Emergency Department Visit Represent? Using a Population-Based Telephone Survey To Estimate the Syndromic Multiplier -- Comparison of Office Visit and Nurse Advice Hotline Data for Syndromic Surveillance - Baltimore-Washington, D.C., Metropolitan Area, 2002 -- Progress in Understanding and Using Over-the-Counter Pharmaceuticals for Syndromic Surveillance
    Evaluation -- Evaluation Challenges for Syndromic Surveillance - Making Incremental Progress -- Measuring Outbreak-Detection Performance By Using Controlled Feature Set Simulations -- Evaluation of Syndromic Surveillance Systems - Design of an Epidemic Simulation Model -- Benchmark Data and Power Calculations for Evaluating Disease Outbreak Detection Methods -- Bio-ALIRT Biosurveillance Detection Algorithm Evaluation -- ESSENCE II and the Framework for Evaluating Syndromic Surveillance Systems -- Conducting Population Behavioral Health Surveillance by Using Automated Diagnostic and Pharmacy Data Systems -- Evaluation of an Electronic General-Practitioner-Based Syndromic Surveillance System -- National Symptom Surveillance Using Calls to a Telephone Health Advice Service - United Kingdom, December 2001-February 2003 -- Field Investigations of Emergency Department Syndromic Surveillance Signals - New York City -- Should We Be Worried? Investigation of Signals Generated by an Electronic Syndromic Surveillance System - Westchester County, New York
    Public Health Practice -- Public Health Information Network - Improving Early Detection by Using a Standards-Based Approach to Connecting Public Health and Clinical Medicine -- Information System Architectures for Syndromic Surveillance -- Perspective of an Emergency Physician Group as a Data Provider for Syndromic Surveillance -- SARS Surveillance Project - Internet-Enabled Multiregion Surveillance for Rapidly Emerging Disease -- Health Information Privacy and Syndromic Surveillance Systems
    Papers from the second annual National Syndromic Surveillance Conference convened by the New York City Department of Health and Mental Hygiene, the New York Academy of Medicine, and the CDC in New York City during Oct. 23-24, 2003. Published as the September 24, 2004 supplement to vol. 53 of MMWR (Morbidity and Mortality Weekly Report).

    Real-time classifiers from free-text for continuous surveillance of small animal disease

    A wealth of information of epidemiological importance is held within unstructured narrative clinical records. Text mining provides computational techniques for extracting usable information from the language used to communicate between humans, including the spoken and written word. The aim of this work was to develop text-mining methodologies capable of rendering the large volume of information within veterinary clinical narratives accessible for research and surveillance purposes. The free-text records collated within the dataset of the Small Animal Veterinary Surveillance Network formed the development material and target of this work. The efficacy of pre-existent clinician-assigned coding applied to the dataset was evaluated and the nature of notation and vocabulary used in documenting consultations was explored and described. Consultation records were pre-processed to improve human and software readability, and software was developed to redact incidental identifiers present within the free-text. An automated system able to classify records for the presence of clinical signs, utilising only information present within the free-text record, was developed with the aim that it would facilitate timely detection of spatio-temporal trends in clinical signs. Clinician-assigned main reason for visit coding provided a poor summary of the large quantity of information exchanged during a veterinary consultation, and the nature of the coding and questionnaire triggering further obfuscated information. Delineation of the previously undocumented veterinary clinical sublanguage identified common themes and their manner of documentation; this was key to the development of programmatic methods. A rule-based classifier using logically-chosen dictionaries, sequential processing and data-masking redacted identifiers while maintaining research usability of records.
    Highly sensitive and specific free-text classification was achieved by applying classifiers for individual clinical signs within a context-sensitive scaffold, which permitted or prohibited matching depending on the clinical context in which a clinical sign was documented. The mean sensitivity achieved within an unseen test dataset was 98.17% (74.47, 99.9) and mean specificity 99.94% (77.1, 100.0). When used in combination to identify animals with any of a combination of gastrointestinal clinical signs, the sensitivity achieved was 99.44% (95% CI: 98.57, 99.78) and specificity 99.74% (95% CI: 99.62, 99.83). This work illustrates the importance, utility and promise of free-text classification of clinical records and provides a framework within which this is possible whilst respecting the confidentiality of client and clinician.