
    Enhanced Integrated Scoring for Cleaning Dirty Texts

    An increasing number of approaches to ontology engineering from text are gearing towards the use of online sources such as company intranets and the World Wide Web. Despite this rise, little work has addressed the preprocessing and cleaning of dirty texts from online sources. This paper presents an enhancement of an Integrated Scoring for Spelling error correction, Abbreviation expansion and Case restoration (ISSAC). ISSAC is implemented as part of the text preprocessing phase in an ontology engineering system. New evaluations performed on the enhanced ISSAC using 700 chat records reveal an improved accuracy of 98%, compared to 96.5% for basic ISSAC alone and 71% for Aspell alone. Comment: More information is available at http://explorer.csse.uwa.edu.au/reference

    A UMLS-based spell checker for natural language processing in vaccine safety

    BACKGROUND: The Institute of Medicine has identified patient safety as a key goal for health care in the United States. Detecting vaccine adverse events is an important public health activity that contributes to patient safety. Reports about adverse events following immunization (AEFI) from surveillance systems contain free-text components that can be analyzed using natural language processing. To extract Unified Medical Language System (UMLS) concepts from free text and classify AEFI reports based on the concepts they contain, we first needed to clean the text by expanding abbreviations and shortcuts and correcting spelling errors. Our objective in this paper was to create a UMLS-based spelling error correction tool as a first step in the natural language processing (NLP) pipeline for AEFI reports. METHODS: We developed spell checking algorithms using open source tools. We used de-identified AEFI surveillance reports to create free-text data sets for analysis. After expansion of abbreviated clinical terms and shortcuts, we performed spelling correction in four steps: (1) error detection, (2) word list generation, (3) word list disambiguation and (4) error correction. We then measured the performance of the resulting spell checker by comparing it to manual correction. RESULTS: We used 12,056 words to train the spell checker and tested its performance on 8,131 words. During testing, sensitivity, specificity, and positive predictive value (PPV) for the spell checker were 74% (95% CI: 74–75), 100% (95% CI: 100–100), and 47% (95% CI: 46–48), respectively. CONCLUSION: We created a prototype spell checker that can be used to process AEFI reports. We used the UMLS Specialist Lexicon as the primary source of dictionary terms and the WordNet lexicon as a secondary source. We used the UMLS as a domain-specific source of dictionary terms against which to compare potentially misspelled words in the corpus. The prototype's sensitivity was comparable to that of currently available tools, but its specificity was far superior. The slow processing speed may be improved by trimming the tool down to its most useful component algorithms. Other investigators may find the methods we developed useful for cleaning text using lexicons specific to their area of interest.
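The four correction steps named in this abstract (error detection, word list generation, word list disambiguation, error correction) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the tiny `LEXICON` set is a hypothetical stand-in for the UMLS Specialist Lexicon and WordNet, candidate generation uses the standard library's `difflib` similarity matching as an assumed technique, and the frequency-based disambiguation rule is likewise an assumption.

```python
import difflib
import re
from collections import Counter

# Hypothetical toy lexicon; the abstract's tool draws on the UMLS
# Specialist Lexicon (primary) and WordNet (secondary) instead.
LEXICON = {"fever", "rash", "swelling", "injection", "site",
           "patient", "developed", "at", "and", "the"}


def detect_errors(tokens, lexicon):
    """Step 1: flag tokens that do not appear in the lexicon."""
    return [t for t in tokens if t not in lexicon]


def generate_candidates(word, lexicon, n=5, cutoff=0.7):
    """Step 2: list lexicon entries similar to the flagged word.

    difflib returns matches ordered from most to least similar.
    """
    return difflib.get_close_matches(word, sorted(lexicon), n=n, cutoff=cutoff)


def disambiguate(candidates, frequencies):
    """Step 3: choose one candidate (assumed rule: highest corpus frequency).

    max() keeps the first maximum, so ties fall back to similarity order.
    """
    if not candidates:
        return None
    return max(candidates, key=lambda c: frequencies.get(c, 0))


def correct(text, lexicon, frequencies):
    """Step 4: replace each flagged token with its chosen candidate."""
    tokens = re.findall(r"[a-z]+", text.lower())
    flagged = set(detect_errors(tokens, lexicon))
    fixes = {w: disambiguate(generate_candidates(w, lexicon), frequencies)
             for w in flagged}
    # Tokens with no fix (correct words, or errors without candidates) stay as-is.
    return " ".join(fixes.get(t) or t for t in tokens)


if __name__ == "__main__":
    # An empty Counter means disambiguation relies on similarity order alone.
    print(correct("Patient developed fevr and sweling at injection site",
                  LEXICON, Counter()))
```

In this sketch, swapping in a domain lexicon only requires replacing `LEXICON`, which mirrors the abstract's suggestion that the approach can be reused with lexicons from other areas of interest.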

    An application of hybrid life cycle assessment as a decision support framework for green supply chains

    In an effort to achieve sustainable operations, green supply chain management has become an important area for firms to concentrate on because it is inherently involved in all the processes that underpin a successful business. Modelling methodologies for the environmental assessment of product supply chains are usually guided by the principles of life cycle assessment (LCA). However, a review of the extant literature suggests that LCA techniques suffer from a wide range of limitations that prevent wider application in real-world contexts; hence, they need to be incorporated within decision support frameworks to aid environmental sustainability strategies. This paper therefore contributes to understanding and overcoming the dichotomy between LCA model development and its emerging practical implementation to inform carbon emissions mitigation strategies within supply chains. It provides both theoretical insights and a practical application to inform the process of adopting a decision support framework based on an LCA methodology in a real-world scenario. The supply chain of a product from the steel industry is considered in order to evaluate its environmental impact and carbon ‘hotspots’. The study helps in understanding how operational strategies geared towards environmental sustainability can be informed by knowledge and information generated from supply chain environmental assessments, and highlights inherent challenges in this process.