Importance of Measuring Sentential Semantic Knowledge Base of a "Free Text" Medical Corpus

Author: Chatterjee Lopamudra
Publication date: 04/11/2009

At present, the healthcare industry uses codified data mainly for billing purposes. Codified data could also be used to improve patient care through decision support and analytical systems. However, to reduce medical errors, these systems need access to a wide range of medical data. Unfortunately, a great deal of data is available only in narrative or free-text form, and codifying it requires natural language processing (NLP) techniques. Structuring narrative data from a medical domain and analyzing its underlying meaning requires extensive knowledge acquired by studying the domain empirically. Existing NLP systems such as MedLEE have a limited ability to analyze free-text medical observations and codify data against Unified Medical Language System (UMLS) codes. MedLEE was successful in extracting meaning from relatively simple sentences in radiological reports, but could not analyze the more complicated sentences that appear frequently in medical reports. An important problem in medical NLP is understanding how many codes or symbols are necessary to codify a medical domain completely. Another problem is determining whether existing medical lexicons such as SNOMED-CT and ICD-9 are suitable for representing the knowledge in medical reports unambiguously. This thesis investigates the problems behind current NLP systems and lexicons, and attempts to estimate the number of symbols or codes required to represent a large corpus of radiology reports. This knowledge will provide a greater understanding of how many symbols may be needed for the complete representation of concepts in other medical domains.

IUPUIScholarWorks