
    A UMLS-based spell checker for natural language processing in vaccine safety

    BACKGROUND: The Institute of Medicine has identified patient safety as a key goal for health care in the United States. Detecting vaccine adverse events is an important public health activity that contributes to patient safety. Reports of adverse events following immunization (AEFI) from surveillance systems contain free-text components that can be analyzed using natural language processing (NLP). To extract Unified Medical Language System (UMLS) concepts from free text and classify AEFI reports by the concepts they contain, we first needed to clean the text by expanding abbreviations and shortcuts and correcting spelling errors. Our objective in this paper was to create a UMLS-based spelling error correction tool as a first step in the NLP pipeline for AEFI reports. METHODS: We developed spell checking algorithms using open source tools. We used de-identified AEFI surveillance reports to create free-text data sets for analysis. After expanding abbreviated clinical terms and shortcuts, we performed spelling correction in four steps: (1) error detection, (2) word list generation, (3) word list disambiguation, and (4) error correction. We then measured the performance of the resulting spell checker by comparing it with manual correction. RESULTS: We used 12,056 words to train the spell checker and tested its performance on 8,131 words. During testing, the sensitivity, specificity, and positive predictive value (PPV) of the spell checker were 74% (95% CI: 74%–75%), 100% (95% CI: 100%–100%), and 47% (95% CI: 46%–48%), respectively. CONCLUSION: We created a prototype spell checker that can be used to process AEFI reports. We used the UMLS Specialist Lexicon as the primary source of dictionary terms and the WordNet lexicon as a secondary source, with the UMLS serving as the domain-specific source of dictionary terms against which potentially misspelled words in the corpus were compared. The prototype's sensitivity was comparable to that of currently available tools, but its specificity was considerably higher. Its slow processing speed might be improved by trimming it down to the most useful component algorithms. Other investigators may find the methods we developed useful for cleaning text using lexicons specific to their own area of interest.
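The four correction steps listed under METHODS can be sketched as a small dictionary-based checker. This is a minimal sketch under stated assumptions, not the authors' implementation: a plain Python set stands in for the UMLS Specialist Lexicon and WordNet, candidate words are limited to a single edit (deletion, transposition, substitution, or insertion), disambiguation simply picks the candidate most frequent in training text, and the `ABBREVIATIONS` table, `edits1`, and `correct_text` are hypothetical names.

```python
import re
from collections import Counter

# Hypothetical abbreviation table: the paper expands abbreviations and shortcuts
# before spelling correction, but its real expansion list is not reproduced here.
ABBREVIATIONS = {"pt": "patient", "hx": "history", "vacc": "vaccine"}

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word: str) -> set[str]:
    """Every string one deletion, transposition, substitution, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in LETTERS}
    inserts = {a + c + b for a, b in splits for c in LETTERS}
    return deletes | transposes | replaces | inserts


def correct_text(text: str, lexicon: set[str], freq: Counter) -> str:
    """Lowercase, tokenize, expand abbreviations, then run the four correction steps."""
    corrected = []
    for token in re.findall(r"[a-z]+", text.lower()):
        token = ABBREVIATIONS.get(token, token)  # expansion precedes correction
        # Step 1, error detection: a token is suspect if the lexicon lacks it.
        if token in lexicon:
            corrected.append(token)
            continue
        # Step 2, word list generation: lexicon entries within edit distance 1.
        candidates = sorted(w for w in edits1(token) if w in lexicon)
        if not candidates:
            corrected.append(token)  # no plausible correction: leave as-is
            continue
        # Step 3, disambiguation: pick the candidate seen most often in training
        # text (ties resolved alphabetically because max keeps the first maximum).
        best = max(candidates, key=lambda w: freq[w])
        # Step 4, error correction: substitute the chosen candidate.
        corrected.append(best)
    return " ".join(corrected)


if __name__ == "__main__":
    lexicon = {"patient", "developed", "fever", "after", "vaccine", "rash", "mask"}
    freq = Counter({"fever": 10, "vaccine": 8, "patient": 5, "rash": 5, "mask": 1})
    print(correct_text("Pt devloped fevr after vacine", lexicon, freq))
```

With the demo lexicon, the report text "Pt devloped fevr after vacine" comes back as "patient developed fever after vaccine". The paper's actual disambiguation step may use richer context; frequency ranking is swapped in here only because it is the simplest rule that makes step 3 concrete.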
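The reported sensitivity, specificity, and PPV follow the standard confusion-matrix definitions. The sketch below assumes that manual correction is the reference standard, that a "positive" word is one the reviewer judged misspelled, and that the 95% CIs are Wald (normal-approximation) intervals; the abstract does not say which interval method was used, and `Confusion`, `proportion_ci`, and `spell_checker_metrics` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    """Word-level counts against manual correction (the reference standard).

    Assumed convention: a "positive" word is one the manual reviewer judged
    misspelled; "flagged" means the spell checker changed it.
    """
    tp: int  # flagged and truly misspelled
    fp: int  # flagged but actually correct
    tn: int  # not flagged and correct
    fn: int  # not flagged but misspelled


def proportion_ci(successes: int, total: int, z: float = 1.96) -> tuple[float, float, float]:
    """Point estimate with a Wald (normal-approximation) 95% CI clipped to [0, 1]."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def spell_checker_metrics(c: Confusion) -> dict[str, tuple[float, float, float]]:
    return {
        "sensitivity": proportion_ci(c.tp, c.tp + c.fn),  # TP / (TP + FN)
        "specificity": proportion_ci(c.tn, c.tn + c.fp),  # TN / (TN + FP)
        "ppv": proportion_ci(c.tp, c.tp + c.fp),          # TP / (TP + FP)
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not the paper's data.
    counts = Confusion(tp=30, fp=30, tn=9900, fn=10)
    for name, (p, lo, hi) in spell_checker_metrics(counts).items():
        print(f"{name}: {p:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

The hypothetical counts (40 misspelled words among roughly 10,000) show how PPV can fall far below specificity: when misspellings are rare, a false positive rate of about 0.3% produces as many false alarms as true detections, so PPV is 50% while specificity stays near 100%.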

    Web-based infectious disease surveillance systems and public health perspectives: a systematic review

    This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

    Background: Emerging and re-emerging infectious diseases are a significant public health concern, and early detection and immediate response are crucial for disease control. These challenges have created a need for new approaches and technologies that reinforce the capacity of traditional surveillance systems to detect emerging infectious diseases. In the last few years, novel web-based data sources have contributed substantially to infectious disease surveillance. This study explores the burgeoning field of web-based infectious disease surveillance systems by examining their current status, importance, and potential challenges. Methods: A systematic review framework was applied to the search, screening, and analysis of web-based infectious disease surveillance systems. We searched the PubMed, Web of Science, and Embase databases to review extensively the English-language literature published between 2000 and 2015. Eleven surveillance systems were selected for evaluation on the basis of their high frequency of application. We studied relevant terms, including newly coined ones, the development and classification of the surveillance systems, and various characteristics associated with them. Results: A detailed review of the 11 web-based infectious disease surveillance systems showed that, compared with traditional surveillance systems, they have clear strengths but also some limitations yet to be overcome. The major strengths of these newly emerging systems are that they are intuitive, adaptable, and low-cost and operate in real time, all of which are necessary features of an effective public health tool. The most apparent potential challenges of web-based systems are inaccurate interpretation and prediction of health status, and privacy issues arising from an individual's internet activity. Conclusion: Although still at a nascent stage and in need of further modification, web-based surveillance systems have evolved to complement traditional national surveillance systems. This review highlights ways in which the strengths of existing systems can be maintained and their weaknesses alleviated to implement optimal web surveillance systems.

    Molecular Detection and Genotyping of Noroviruses

    Noroviruses (NoVs) are a major cause of gastroenteritis worldwide in humans and animals and are highly infectious viral agents. They are spread through feces and vomit via several transmission routes, including person-to-person contact, food, and water. Investigating these transmission routes requires sensitive methods for detecting NoVs. Because NoVs cannot yet be cultivated, their detection relies on molecular methods such as (real-time) reverse transcriptase polymerase chain reaction (RT-PCR). Regardless of the matrix, NoV detection generally requires three successive steps: virus extraction, RNA purification, and molecular detection of the purified RNA, occasionally followed by molecular genotyping. This review focuses mainly on the molecular detection and genotyping of NoVs. The most conserved region in the genome of human-infective NoVs is the ORF1/ORF2 junction, which has been used as a preferred target region for molecular detection by methods such as (real-time) RT-PCR, NASBA, and LAMP. For animal NoVs, broad-range molecular assays have most frequently been applied for detection. For genotyping, five regions in the polymerase and capsid genes have been used for conventional RT-PCR amplification and sequencing. Because the expected levels of NoVs on food and in water are very low and molecular methods can be inhibited in these matrices, quality control, including adequate positive and negative controls, is an essential part of NoV detection. Although the development of molecular methods for NoV detection has certainly aided the understanding of NoV transmission, it has also raised new problems, such as whether the low levels of human NoV detected on fresh produce and shellfish could pose a threat to public health. © 2012 Springer Science+Business Media New York