9 research outputs found

    Evaluation of Negation and Uncertainty Detection and its Impact on Precision and Recall in Search

    Radiology reports contain information that can be mined using a search engine for teaching, research, and quality assurance purposes. Current search engines look for exact matches to the search term, but they do not distinguish reports in which the search term appears in a positive context (i.e., the finding is present) from those in which it appears in the context of negation or uncertainty. We describe RadReportMiner, a context-aware search engine, and compare its retrieval performance with that of a generic search engine, Google Desktop. We created a corpus of 464 radiology reports, each of which described at least one of five findings (appendicitis, hydronephrosis, fracture, optic neuritis, and pneumonia). Each report was classified by a radiologist as positive (finding described as present) or negative (finding described as absent or uncertain). The same reports were then classified by RadReportMiner and Google Desktop. RadReportMiner achieved a higher precision (81%) than Google Desktop (27%; p < 0.0001) but a lower recall (72%) than Google Desktop (87%; p = 0.006). We conclude that adding negation and uncertainty identification to a word-based radiology report search engine improves the precision of search results over a search engine that does not take this information into account. Our approach may be useful to adopt into current report retrieval systems to help radiologists search for radiology reports more accurately.
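The difference between keyword matching and context-aware matching can be illustrated with a minimal sketch. This is not RadReportMiner's algorithm, which the abstract does not specify; the cue lists, the function name `classify_mention`, and the five-word window are all assumptions chosen for illustration.

```python
import re

# Hypothetical cue lists; a real system would need a much larger, validated lexicon.
NEGATION_CUES = ("no", "without", "negative for", "no evidence of")
UNCERTAINTY_CUES = ("possible", "cannot exclude", "suspicious for", "may represent")


def classify_mention(report: str, term: str, window: int = 5) -> str:
    """Return 'positive', 'negative', or 'absent' for `term` in `report`.

    A mention is treated as negated or uncertain when a cue occurs within
    `window` words before the term in the same sentence. The report is
    'positive' if at least one mention has no such cue.
    """
    term_l = term.lower()
    cues = NEGATION_CUES + UNCERTAINTY_CUES
    found = False
    for sentence in re.split(r"[.;\n]+", report.lower()):
        idx = sentence.find(term_l)
        if idx == -1:
            continue
        found = True
        preceding = " ".join(sentence[:idx].split()[-window:])
        if not any(re.search(rf"\b{re.escape(c)}\b", preceding) for c in cues):
            return "positive"
    return "negative" if found else "absent"
```

A plain keyword search would return every report containing the term, including "There is no evidence of pneumonia"; the sketch above labels that report negative, which is the kind of filtering that raises precision at some cost in recall.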

    Annotating the human genome with Disease Ontology

    Background: The human genome has been extensively annotated with Gene Ontology for biological functions, but minimally computationally annotated for diseases. Results: We used the Unified Medical Language System (UMLS) MetaMap Transfer tool (MMTx) to discover gene-disease relationships from the GeneRIF database. We utilized a comprehensive subset of UMLS, which is disease-focused and structured as a directed acyclic graph (the Disease Ontology), to filter and interpret results from MMTx. The results were validated against the Homayouni gene collection using recall and precision measurements. We compared our results with the widely used Online Mendelian Inheritance in Man (OMIM) annotations. Conclusion: The validation data set suggests a 91% recall rate and 97% precision rate of disease annotation using GeneRIF, in contrast with a 22% recall and 98% precision using OMIM. Our thesaurus-based approach allows comparisons to be made between disease-containing databases and allows for increased accuracy in disease identification through synonym matching. The much higher recall rate of our approach demonstrates that annotating the human genome with Disease Ontology and GeneRIF for diseases dramatically increases the coverage of disease annotation of the human genome.
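For readers unfamiliar with the two measures, precision and recall over a set of annotations can be computed as below. This is a generic sketch; the function name `precision_recall` and the toy gene-disease pairs are made up for illustration and are not data from the study.

```python
def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    """Compute precision and recall of `predicted` against a `gold` reference set."""
    true_positives = len(predicted & gold)
    # Precision: share of predicted annotations that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of reference annotations that were found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy (gene, disease) pairs, purely illustrative.
predicted = {("G1", "D1"), ("G2", "D2"), ("G3", "D3")}
gold = {("G1", "D1"), ("G2", "D2"), ("G4", "D4"), ("G5", "D5")}
p, r = precision_recall(predicted, gold)
```

In this toy case two of three predictions are correct and two of four reference pairs are found, so an annotation source can have high precision and still low recall, which is the pattern the abstract reports for OMIM.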

    Information Technology and Computer Science

    Abstract: The healthcare system is a knowledge-driven industry that generates vast and growing volumes of narrative information from discharge summaries and reports, physicians' case notes, and pathologists' and radiologists' reports. This information is usually stored in unstructured, non-standardized formats in electronic healthcare systems, which makes it difficult for those systems to interpret its content. Access to valuable and meaningful healthcare information for decision making is therefore a challenge. Natural Language Processing (NLP) techniques have been used to structure narrative information in healthcare: they can capture unstructured healthcare information, analyze its grammatical structure, determine its meaning, and translate it into a form that electronic healthcare systems can readily process. Consequently, NLP techniques reduce cost and improve the quality of healthcare. Against this background, this paper reviews the NLP techniques used in healthcare, their applications, and their limitations.

    Unsupervised Biomedical Named Entity Recognition

    Named entity recognition (NER) from text is an important task for several applications, including in the biomedical domain. Supervised machine learning based systems have been the most successful on the NER task; however, they require correct annotations in large quantities for training. Annotating text manually is very labor intensive and also needs domain expertise. The purpose of this research is to reduce human annotation effort and to decrease the cost of annotation for building NER systems in the biomedical domain. The method developed in this work leverages resources such as UMLS (Unified Medical Language System), which contains a list of biomedical entities, together with a large unannotated corpus to build an unsupervised NER system that does not require any manual annotations. The method has two phases. In the first phase, a biomedical corpus is automatically annotated with some named entities using UMLS through unambiguous exact matching, producing what we call weakly-labeled data. In this data, positive examples are the entities in the text that exactly match in UMLS and have only one semantic type, which belongs to the desired entity class to be extracted (for example, diseases and disorders). Negative examples are the entities in the text that exactly match in UMLS but are of semantic types other than those that belong to the desired entity class. These examples are then used to train a machine learning classifier using features that represent the contexts in which they appeared in the text. The trained classifier is applied back to the text to gather more examples iteratively through the process of self-training. The trained classifier is then capable of classifying mentions in unseen text as belonging to the desired entity class or not, based on the contexts in which they appear.
Although the trained named entity detector is good at detecting the presence of entities of the desired class in text, it cannot determine their correct boundaries. In the second phase of our method, called “Boundary Expansion”, the correct boundaries of the entities are determined. This phase is based on a novel idea that utilizes machine learning and UMLS. Training examples for boundary expansion are gathered directly from UMLS and do not require any manual annotations. We also developed a new WordNet-based approach for boundary expansion. The method was evaluated on three datasets: the SemEval 2014 Task 7 dataset, which has diseases and disorders as the desired entity class; the GENIA dataset, which has proteins, DNAs, RNAs, cell types, and cell lines as the desired entity classes; and the i2b2 dataset, which has problems, tests, and treatments as the desired entity classes. Our method performed well and obtained performance close to supervised methods on the SemEval dataset. On the other datasets, it outperformed an existing unsupervised method on most entity classes. The availability of a list of entity names with their semantic types and a large unannotated corpus are the only requirements for our method to work well. Given these, our method generalizes across different types of entities and different types of biomedical text. Being unsupervised, the method can be easily applied to new NER tasks without needing costly annotations.
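The first phase described above (weak labeling by unambiguous exact matching) can be sketched as follows. This is an illustrative simplification, not the authors' implementation: `LEXICON` is a tiny hand-written stand-in for UMLS, its type assignments are illustrative, only single-word terms are matched, and the names `weak_label` and `TARGET_TYPES` are assumptions.

```python
import re

# Toy stand-in for a UMLS-style lexicon: term -> set of semantic types.
# The entries and type assignments are illustrative only.
LEXICON = {
    "pneumonia": {"Disease or Syndrome"},
    "aspirin": {"Pharmacologic Substance"},
    "cold": {"Disease or Syndrome", "Natural Phenomenon or Process"},  # ambiguous
}
TARGET_TYPES = {"Disease or Syndrome"}  # the desired entity class


def weak_label(text: str) -> list[tuple[str, str]]:
    """Return (token, label) pairs for tokens that exactly match the lexicon.

    Positive: exactly one semantic type, and it belongs to the target class.
    Negative: none of the term's semantic types belong to the target class.
    Ambiguous terms (several types, some in the target class) are skipped.
    """
    examples = []
    for token in re.findall(r"[a-z]+", text.lower()):
        types = LEXICON.get(token)
        if types is None:
            continue
        if len(types) == 1 and types <= TARGET_TYPES:
            examples.append((token, "positive"))
        elif not (types & TARGET_TYPES):
            examples.append((token, "negative"))
    return examples


examples = weak_label("Patient with pneumonia took aspirin for a cold.")
```

Here "pneumonia" becomes a positive example, "aspirin" a negative one, and "cold" is left unlabeled because its types are ambiguous. In the described method, such examples would then feed a context-based classifier and the self-training loop, neither of which is shown here.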

    Sähköisen potilaskertomuksen rakenteistaminen : Menetelmät, arviointikäytännöt ja vaikutukset [Structuring the electronic patient record: methods, evaluation practices, and impacts]

    Patient data can be used for many different purposes when it has been produced in a structured, uniform format. This report describes how structuring affects nursing, clinical patient care, and the secondary use of patient data, and the different ways in which data can be structured. The report is based on an extensive systematic literature review. The use of standardized terminology supports nursing processes and continuity of care. In clinical patient care, the effects of structuring have been studied relatively little. Individual articles had assessed the effects of structuring on quality of care from the perspectives of adherence to care guidelines, reduction of medication errors, and monitoring of harmful drug interactions or adverse events. From the perspective of secondary use of patient data, the articles examined effects on documentation efficiency and on data quality, such as completeness and correctness, or evaluated the quality of text-mining systems that make use of the recorded data. The report was produced at a time when national information system services are being introduced and the use of data is expanding. Knowledge of the effects and potential uses of different structuring methods provides a foundation for the further development of national data repositories and their use.

    Automated code compliance checking in the construction domain using semantic natural language processing and logic-based reasoning

    Construction projects must comply with various regulations. The manual process of checking the compliance with regulations is costly, time consuming, and error prone. With the advancement in computing technology, there have been many research efforts in automating the compliance checking process, and many software development efforts led by industry bodies/associations, software companies, and/or government organizations to develop automated compliance checking (ACC) systems. However, two main gaps in the existing ACC efforts are: (1) manual effort is needed for extracting requirements from regulatory documents and encoding these requirements in a computer-processable rule format; and (2) there is a lack of a semantic representation for supporting automated compliance reasoning that is non-proprietary, non-hidden, and user-understandable and testable. To address these gaps, this thesis proposes a new ACC method that: (1) utilizes semantic natural language processing (NLP) techniques to automatically extract regulatory information from building codes and design information from building information models (BIMs); and (2) utilizes a semantic logic-based representation to represent and reason about the extracted regulatory information and design information for compliance checking. 
The proposed method is composed of four main methods/algorithms that are combined in one computational framework: (1) a semantic, rule-based method and algorithm that leverage NLP techniques to automatically extract regulatory information from building codes and represent the extracted information as semantic tuples, (2) a semantic, rule-based method and algorithm that leverage NLP techniques to automatically transform the extracted regulatory information into logic rules to prepare for automated reasoning, (3) a semantic, rule-based information extraction and information transformation method and algorithm to automatically extract design information from BIMs and transform the extracted information into logic facts to prepare for automated reasoning, and (4) a logic-based information representation and compliance reasoning schema to represent regulatory and design information for enabling the automated compliance reasoning process. To test the proposed method, a building information model test case was developed based on the Duplex Apartment Project from buildingSMARTalliance of the National Institute of Building Sciences. The test case was checked for compliance with a randomly selected chapter, Chapter 19, of the International Building Code 2009. Compared with a manually developed gold standard, 87.6% precision and 98.7% recall in noncompliance detection were achieved on the testing data.