100 research outputs found

    Evaluation of preprocessing techniques for chief complaint classification

    OBJECTIVE: To determine whether preprocessing chief complaints before automatically classifying them into syndromic categories improves classification performance. METHODS: We preprocessed chief complaints using two preprocessors (CCP and EMT-P) and evaluated whether classification performance increased for a probabilistic classifier (CoCo) or for a keyword-based classifier (modification of the NYC Department of Health and Mental Hygiene chief complaint coder (KC)). RESULTS: CCP exhibited high accuracy (85%) in preprocessing chief complaints but only slightly improved CoCo's classification performance for a few syndromes. EMT-P, which splits chief complaints into multiple problems, substantially increased CoCo's sensitivity for all syndromes. Preprocessing with CCP or EMT-P only improved KC's sensitivity for the Constitutional syndrome. CONCLUSION: Evaluation of preprocessing systems should not be limited to accuracy of the preprocessor but should include the effect of preprocessing on syndromic classification. Splitting chief complaints into multiple problems before classification is important for CoCo, but other preprocessing steps only slightly improved classification performance for CoCo and a keyword-based classifier
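
The effect of splitting described in this abstract can be illustrated with a minimal sketch. It is not CoCo, EMT-P, CCP, or KC: the keyword table, the delimiter pattern, and the function names `split_problems`, `classify_single`, and `classify_complaint` are all hypothetical. The stand-in classifier assigns at most one syndrome per string, which is the situation where splitting a complaint into separate problems can raise sensitivity.

```python
import re
from typing import List, Optional, Set

# Hypothetical keyword table; real coders rely on much larger curated lists.
SYNDROME_KEYWORDS = {
    "Respiratory": {"cough", "sob", "wheezing"},
    "Gastrointestinal": {"vomiting", "diarrhea", "nausea"},
    "Constitutional": {"fever", "chills", "weakness"},
}

# Assumed delimiters that separate problems within one chief complaint.
SPLIT_PATTERN = re.compile(r"\s*(?:/|,|;|&|\band\b|\bwith\b)\s*")


def split_problems(chief_complaint: str) -> List[str]:
    """Split one free-text chief complaint into separate problems."""
    parts = SPLIT_PATTERN.split(chief_complaint.lower())
    return [part.strip() for part in parts if part.strip()]


def classify_single(text: str) -> Optional[str]:
    """Return the single syndrome with the most keyword hits, or None."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    hits = {s: len(tokens & kws) for s, kws in SYNDROME_KEYWORDS.items()}
    best = max(hits, key=hits.get)  # ties resolve to the first table entry
    return best if hits[best] > 0 else None


def classify_complaint(chief_complaint: str) -> Set[str]:
    """Classify each split problem separately and pool the syndromes."""
    labels = (classify_single(p) for p in split_problems(chief_complaint))
    return {label for label in labels if label is not None}


if __name__ == "__main__":
    complaint = "cough and fever/vomiting"
    print(classify_single(complaint))     # one label for the whole string
    print(classify_complaint(complaint))  # one label per split problem
```

Without splitting, the single-label classifier reports only one syndrome for a complaint that mentions three; with splitting, each problem is classified on its own.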

    J Biomed Inform

    Syndromic surveillance detects and monitors individual and population health indicators through sources such as emergency department records. Automated classification of these records can improve outbreak detection speed and diagnosis accuracy. Current syndromic systems rely on hand-coded keyword-based methods to parse written fields and may benefit from the use of modern supervised-learning classifier models. In this paper, we implement two recurrent neural network models based on long short-term memory (LSTM) and gated recurrent unit (GRU) cells and compare them to two traditional bag-of-words classifiers: multinomial naïve Bayes (MNB) and a support vector machine (SVM). The MNB classifier is one of only two machine learning algorithms currently being used for syndromic surveillance. All four models are trained to predict diagnostic code groups as defined by Clinical Classification Software, first from discharge diagnosis fields and then from chief complaint fields. The classifiers are trained on 3.6 million de-identified emergency department records from a single United States jurisdiction. We compare performance of these models primarily using the F1 score, and we measure absolute model performance to determine which conditions are the most amenable to surveillance based on chief complaint alone. Using discharge diagnoses, the LSTM classifier performs best, though all models exhibit an F1 score above 96.00. Using chief complaints, the GRU performs best (F1 = 47.38), and MNB with bigrams performs worst (F1 = 39.40). We also note that certain syndrome types are easier to detect than others. For example, the GRU model applied to chief complaints predicts alcohol-related disorders well (F1 = 78.91) but predicts influenza poorly (F1 = 14.80). In all instances, the RNN models outperformed the bag-of-words classifiers, suggesting that deep learning models could substantially improve the automatic classification of unstructured text for syndromic surveillance.
    CC999999/ImCDC/Intramural CDC HHS/United States

    Toward unsupervised outbreak detection through visual perception of new patterns

    Background: Statistical algorithms are routinely used to detect outbreaks of well-defined syndromes, such as influenza-like illness. These methods cannot be applied to the detection of emerging diseases for which no preexisting information is available. This paper presents a method aimed at facilitating the detection of outbreaks when there is no a priori knowledge of the clinical presentation of cases. Methods: The method uses a visual representation of the symptoms and diseases coded during a patient consultation according to the International Classification of Primary Care, 2nd version (ICPC-2). The surveillance data are transformed into color-coded cells, ranging from white to red, reflecting the increasing frequency of observed signs. They are placed in a graphic reference frame mimicking body anatomy. Simple visual observation of color-change patterns over time, concerning a single code or a combination of codes, enables detection in the setting of interest. Results: The method is demonstrated through retrospective analyses of two data sets: description of the patients referred to the hospital by their general practitioners (GPs) participating in the French Sentinel Network and description of patients directly consulting at a hospital emergency department (HED). Informative image color-change alert patterns emerged in both cases: the health consequences of the August 2003 heat wave were visualized with GPs' data (but passed unnoticed with conventional surveillance systems), and the flu epidemics, which are routinely detected by standard statistical techniques, were recognized visually with HED data. Conclusion: Using human visual pattern-recognition capacities to detect the onset of unexpected health events implies a convenient image representation of epidemiological surveillance and well-trained "epidemiology watchers". Once these two conditions are met, one could imagine that the epidemiology watchers could signal epidemiological alerts, based on "image walls" presenting the local, regional and/or national surveillance patterns, with specialized field epidemiologists assigned to validate the signals detected.
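
The white-to-red color coding described in this abstract can be sketched as a simple linear mapping. The abstract does not give the actual scale, so the linear interpolation, the normalization by a maximum frequency, and the function name `frequency_to_rgb` are assumptions made for illustration only.

```python
from typing import Tuple


def frequency_to_rgb(frequency: float, max_frequency: float) -> Tuple[int, int, int]:
    """Map an observed frequency to a color between white and red.

    Assumption: a linear scale, where 0 maps to white (255, 255, 255)
    and max_frequency (or anything above it) maps to red (255, 0, 0).
    """
    if max_frequency <= 0:
        return (255, 255, 255)  # nothing observed anywhere: keep the cell white
    intensity = min(max(frequency / max_frequency, 0.0), 1.0)
    fade = int(255 * (1.0 - intensity))  # green and blue fade out together
    return (255, fade, fade)


if __name__ == "__main__":
    # Hypothetical weekly counts for one ICPC-2 code.
    weekly_counts = [0, 2, 5, 10]
    peak = max(weekly_counts)
    for week, count in enumerate(weekly_counts, start=1):
        print(week, frequency_to_rgb(count, peak))
```

Rendering one such cell per code and per time step would give the kind of color-change pattern an observer could scan visually.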

    Exploratory analysis of methods for automated classification of laboratory test orders into syndromic groups in veterinary medicine

    Background: Recent focus on earlier detection of pathogen introduction in human and animal populations has led to the development of surveillance systems based on automated monitoring of health data. Real- or near real-time monitoring of pre-diagnostic data requires automated classification of records into syndromes (syndromic surveillance) using algorithms that incorporate medical knowledge in a reliable and efficient way, while remaining comprehensible to end users. Methods: This paper describes the application of two machine learning algorithms (Naïve Bayes and Decision Trees) and rule-based methods to extract syndromic information from laboratory test requests submitted to a veterinary diagnostic laboratory. Results: High performance (F1-macro = 0.9995) was achieved through the use of a rule-based syndrome classifier, based on rule induction followed by manual modification during the construction phase, which also resulted in clear interpretability of the resulting classification process. An unmodified rule induction algorithm achieved an F1-micro score of 0.979, though this fell to 0.677 when performance for individual classes was averaged in an unweighted manner (F1-macro), because the algorithm failed to learn 3 of the 16 classes from the training set. Decision Trees showed equal interpretability to the rule-based approaches, but achieved an F1-micro score of 0.923 (falling to 0.311 when classes are given equal weight). A Naïve Bayes classifier learned all classes and achieved high performance (F1-micro = 0.994 and F1-macro = 0.955); however, the classification process is not transparent to the domain experts. Conclusion: The use of a manually customised rule set allowed for the development of a system for classification of laboratory tests into syndromic groups with very high performance and high interpretability by the domain experts. Further research is required to develop internal validation rules in order to establish automated methods to update model rules without user input.
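
The gap between F1-micro and F1-macro reported in this abstract arises when a classifier misses rare classes entirely. The sketch below computes both averages from scratch on made-up labels; the class names and counts are hypothetical and are not the study's data.

```python
from collections import Counter
from typing import Sequence, Tuple


def f1_micro_macro(y_true: Sequence[str], y_pred: Sequence[str]) -> Tuple[float, float]:
    """Return (micro-averaged F1, macro-averaged F1) for single-label data."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def f1(tp_: int, fp_: int, fn_: int) -> float:
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


if __name__ == "__main__":
    # Hypothetical data: one dominant class and two rare classes that the
    # classifier never predicts.
    y_true = ["A"] * 96 + ["B"] * 2 + ["C"] * 2
    y_pred = ["A"] * 100
    micro, macro = f1_micro_macro(y_true, y_pred)
    print(f"F1-micro = {micro:.3f}, F1-macro = {macro:.3f}")
```

Because the two unlearned classes each contribute an F1 of zero to the unweighted mean, the macro average drops sharply even though almost every record is classified correctly.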

    A UMLS-based spell checker for natural language processing in vaccine safety

    BACKGROUND: The Institute of Medicine has identified patient safety as a key goal for health care in the United States. Detecting vaccine adverse events is an important public health activity that contributes to patient safety. Reports about adverse events following immunization (AEFI) from surveillance systems contain free-text components that can be analyzed using natural language processing. To extract Unified Medical Language System (UMLS) concepts from free text and classify AEFI reports based on concepts they contain, we first needed to clean the text by expanding abbreviations and shortcuts and correcting spelling errors. Our objective in this paper was to create a UMLS-based spelling error correction tool as a first step in the natural language processing (NLP) pipeline for AEFI reports. METHODS: We developed spell checking algorithms using open source tools. We used de-identified AEFI surveillance reports to create free-text data sets for analysis. After expansion of abbreviated clinical terms and shortcuts, we performed spelling correction in four steps: (1) error detection, (2) word list generation, (3) word list disambiguation and (4) error correction. We then measured the performance of the resulting spell checker by comparing it to manual correction. RESULTS: We used 12,056 words to train the spell checker and tested its performance on 8,131 words. During testing, sensitivity, specificity, and positive predictive value (PPV) for the spell checker were 74% (95% CI: 74–75), 100% (95% CI: 100–100), and 47% (95% CI: 46%–48%), respectively. CONCLUSION: We created a prototype spell checker that can be used to process AEFI reports. We used the UMLS Specialist Lexicon as the primary source of dictionary terms and the WordNet lexicon as a secondary source. We used the UMLS as a domain-specific source of dictionary terms to compare potentially misspelled words in the corpus. 
The prototype sensitivity was comparable to currently available tools, but the specificity was much superior. The slow processing speed may be improved by trimming it down to the most useful component algorithms. Other investigators may find the methods we developed useful for cleaning text using lexicons specific to their area of interest
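
The four correction steps listed in this abstract (error detection, word list generation, word list disambiguation, error correction) can be sketched with the Python standard library. This is not the authors' tool: the tiny lexicon with made-up frequencies stands in for the UMLS Specialist Lexicon, `difflib.get_close_matches` stands in for their candidate generation, and picking the most frequent candidate is an assumed disambiguation rule.

```python
import difflib
import re
from typing import Dict

# Hypothetical lexicon mapping each known word to a made-up corpus frequency.
LEXICON: Dict[str, int] = {
    "fever": 120, "swelling": 80, "injection": 200, "site": 150,
    "rash": 60, "headache": 40, "and": 500, "at": 400,
}


def correct_text(text: str, lexicon: Dict[str, int] = LEXICON, cutoff: float = 0.8) -> str:
    """Correct misspelled words in free text against a lexicon."""
    corrected = []
    for word in re.findall(r"[a-z]+", text.lower()):
        # (1) Error detection: a word is suspect if it is not in the lexicon.
        if word in lexicon:
            corrected.append(word)
            continue
        # (2) Word list generation: collect similar lexicon entries.
        candidates = difflib.get_close_matches(word, list(lexicon), n=5, cutoff=cutoff)
        # (3) Disambiguation: prefer the most frequent candidate (assumed rule).
        # (4) Correction: substitute it, or keep the word if nothing is close.
        corrected.append(max(candidates, key=lexicon.get) if candidates else word)
    return " ".join(corrected)


if __name__ == "__main__":
    print(correct_text("Feever and sweling at injection site"))
```

Words with no sufficiently similar lexicon entry are left unchanged, which favors specificity over sensitivity; lowering `cutoff` would trade in the other direction.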

    SCM: a practical tool to implement hospital-based syndromic surveillance

    Background: Syndromic surveillance has been widely used for the early warning of infectious disease outbreaks, especially at mass gatherings, but collecting electronic symptom data in hospitals is one of the fundamental challenges in operating a syndromic surveillance system. The objective of our study is to describe and evaluate the implementation of a symptom-clicking module (SCM) as part of the enhanced hospital-based syndromic surveillance during the 41st World Exposition in Shanghai, China, 2010. Methods: The SCM, covering 25 targeted symptoms, was embedded in the sentinel hospitals' Hospital Information Systems (HIS). Clinicians used the SCM to record this information for all visiting patients, and data were collated and transmitted automatically in daily batches. The symptoms were categorized into seven targeted syndromes using pre-defined criteria, and statistical algorithms were applied to detect temporal aberrations in the data series. Results: The SCM was deployed successfully in each sentinel hospital and operated throughout the 184-day surveillance period. A total of 1,730,797 patient encounters were recorded by the SCM, and 6.1% (105,352 visits) met the criteria of the seven targeted syndromes. Acute respiratory and gastrointestinal syndromes were reported most frequently, accounting for 92.1% of all syndrome reports, and the aggregated time series showed an obvious day-of-week variation over the study period. In total, 191 aberration signals were triggered, and none was identified as an outbreak after verification and field investigation. Conclusions: The SCM served as a practical tool for recording symptoms in the enhanced hospital-based syndromic surveillance system during the 41st World Exposition in Shanghai, where the sentinel hospitals' HIS had no preexisting electronic tool for collecting syndromic data.
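
The abstract does not name the statistical algorithms used to detect temporal aberrations. As one plausible illustration, the sketch below applies a simple moving-baseline control-chart rule: a day is flagged when its count exceeds the mean of the preceding days by a set number of standard deviations. The window length, the threshold, the `min_sd` floor, and the daily counts are all assumptions.

```python
from statistics import mean, stdev
from typing import List, Sequence


def detect_aberrations(
    counts: Sequence[float],
    baseline_days: int = 7,
    threshold_sd: float = 3.0,
    min_sd: float = 0.5,
) -> List[int]:
    """Return indices of days whose count exceeds the moving-baseline limit."""
    signals = []
    for day in range(baseline_days, len(counts)):
        baseline = counts[day - baseline_days:day]  # the preceding days only
        mu = mean(baseline)
        sd = max(stdev(baseline), min_sd)  # floor avoids a zero-width limit
        if counts[day] > mu + threshold_sd * sd:
            signals.append(day)
    return signals


if __name__ == "__main__":
    # Hypothetical daily visit counts for one syndrome.
    daily_counts = [10, 12, 11, 9, 10, 11, 10, 11, 30, 12]
    print(detect_aberrations(daily_counts))
```

A rule this simple does not adjust for the day-of-week variation noted in the results, so a production system would likely need a stratified or regression-based baseline; every signal would still require verification, as in the study.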