    An AI Approach to Transform the E-Patient Records into Clinical Cases for Medical Diagnostic Systems

    A major challenge for researchers in medical informatics is obtaining a 'useful' Electronic Patient Record (EPR) and using it to provide important information about a patient's diseases and treatment. This paper discusses the importance of medical informatics and its role in the healthcare sector. It also proposes an AI approach to transform E-Patient records into clinical cases for medical diagnostic systems. The approach uses medical documents to generate E-Patient records, then maps these records into medical cases that form a case repository for use in a case-based reasoning (CBR) medical diagnostic system.
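
    As a rough illustration of the record-to-case mapping the abstract describes (not the authors' actual pipeline), the sketch below maps a dictionary-shaped EPR into a problem/solution pair, the classic structure of a CBR case; all field names are hypothetical.

        from dataclasses import dataclass

        @dataclass
        class ClinicalCase:
            # A classic CBR case: a problem description (symptoms, history)
            # paired with its solution (diagnosis, treatment).
            problem: dict
            solution: dict

        def epr_to_case(epr: dict) -> ClinicalCase:
            """Map an E-Patient record into a CBR case (field names are hypothetical)."""
            return ClinicalCase(
                problem={"symptoms": epr.get("symptoms", []),
                         "history": epr.get("history", "")},
                solution={"diagnosis": epr.get("diagnosis", ""),
                          "treatment": epr.get("treatment", "")},
            )

        # The case repository is then simply the collection of mapped cases.
        repository = [epr_to_case(r) for r in [
            {"symptoms": ["fever", "cough"], "history": "smoker",
             "diagnosis": "bronchitis", "treatment": "antibiotics"},
        ]]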

    Information Extraction from Electronic Medical Records using Natural Language Processing Techniques

    Patients share key information about their health with medical practitioners during clinic consultations. This information may include past medications and allergies, current conditions, and expectations. Healthcare professionals store this information in an Electronic Medical Record (EMR). EMRs have empowered research in healthcare; if the information hidden in them is harnessed properly through Natural Language Processing (NLP), it can be used for disease registries, drug safety, epidemic surveillance, disease prediction, and treatment. This work illustrates the application of NLP techniques to design and implement a Key Information Retrieval System (KIRS framework) using the Latent Dirichlet Allocation algorithm. The cross-industry standard process for data mining methodology was applied in an experiment with an EMR dataset from PubMed to demonstrate the framework. The new system extracted the common problems (ailments) and prescriptions across the five countries represented in the dataset. The system promises to assist health organizations in making informed decisions from the flood of key information available in their domain. Keywords: Electronic Medical Record, BioNLP, Latent Dirichlet Allocation.
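
    The abstract does not say which toolkit implements the LDA step, so the sketch below uses scikit-learn as a stand-in; the notes and topic count are illustrative, not from the paper's dataset.

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.decomposition import LatentDirichletAllocation

        notes = [
            "patient reports chest pain and shortness of breath",
            "prescribed amoxicillin for recurrent ear infection",
            "follow-up for hypertension, continue lisinopril",
        ]

        # Bag-of-words counts feed the LDA topic model.
        vectorizer = CountVectorizer(stop_words="english")
        counts = vectorizer.fit_transform(notes)

        lda = LatentDirichletAllocation(n_components=2, random_state=0)
        lda.fit(counts)

        # Print the top words per topic: candidate "key information" themes.
        terms = vectorizer.get_feature_names_out()
        for i, topic in enumerate(lda.components_):
            top = [terms[j] for j in topic.argsort()[-5:][::-1]]
            print(f"topic {i}: {top}")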

    A Context-Free Grammar for Key Noun-Phrase Extraction from Text

    Topic extraction is a major field in text mining. Key noun-phrases play a very important role in identifying a document's topics because the primary information of a document is described in noun-phrases. In this paper, we propose a new topic extraction scheme that identifies key noun-phrases by constructing a context-free grammar (CFG) from input documents. In our method, documents are reconstructed as a set of CFG rules using an existing algorithm called Sequitur. The Sequitur algorithm infers context-free grammatical rules, which can be viewed as a hierarchical structure, from a sequence of discrete symbols. The resulting hierarchy exposes the underlying structure of the input sequence, which helps capture meaningful regularities. Based on this hierarchical structure of the input document, we designed a new algorithm to identify noun-phrases and extract key noun-phrases.
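
    For intuition about the grammar-inference idea behind Sequitur, the sketch below greedily replaces the most frequent repeated digram with a new grammar rule; this is a simplified offline variant for illustration, not the incremental algorithm the paper uses.

        from collections import Counter

        def infer_grammar(tokens):
            """Greedy digram replacement: a simplified, offline take on Sequitur."""
            rules = {}
            seq = list(tokens)
            next_id = 0
            while True:
                digrams = Counter(zip(seq, seq[1:]))
                digram, count = digrams.most_common(1)[0] if digrams else (None, 0)
                if count < 2:
                    break
                name = f"R{next_id}"
                next_id += 1
                rules[name] = digram
                # Rewrite the sequence, replacing each occurrence of the digram.
                out, i = [], 0
                while i < len(seq):
                    if i + 1 < len(seq) and (seq[i], seq[i + 1]) == digram:
                        out.append(name)
                        i += 2
                    else:
                        out.append(seq[i])
                        i += 1
                seq = out
            return seq, rules

        top, rules = infer_grammar("the cat sat on the cat mat".split())
        print(top, rules)  # the repeated phrase "the cat" becomes a rule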

    Term Extraction and Disambiguation for Semantic Knowledge Enrichment: A Case Study on Initial Public Offering (IPO)

    Domain knowledge bases are the basis for advanced knowledge-based systems; however, manually creating a formal knowledge base for a given domain is both resource-consuming and non-trivial. In this paper, we propose an approach that supports extracting, selecting, and disambiguating terms embedded in domain-specific documents. The extracted terms are then used to enrich existing ontologies/taxonomies and to bridge a domain-specific knowledge base with a generic knowledge base such as WordNet. The proposed approach addresses two major issues in term extraction, namely quality and efficiency. It also adopts a feature-based method that assists in topic extraction and integration with existing ontologies in the given domain. The approach is realized in a research prototype, and a case study is conducted to illustrate its feasibility and efficiency in the finance domain. A preliminary empirical validation by domain experts is also conducted to determine the accuracy of the proposed approach. The results from the case study indicate the advantages and potential of the proposed approach.
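
    One way to realize the WordNet-bridging and disambiguation step (the abstract does not specify the authors' method, so this is an assumption) is NLTK's Lesk implementation; the ambiguous term "bank" and its IPO-flavored context are illustrative.

        from nltk.corpus import wordnet as wn
        from nltk.wsd import lesk

        # nltk.download("wordnet") may be needed on first run.
        context = "the bank set the IPO price after consulting institutional investors".split()

        # Lesk picks the WordNet sense whose gloss best overlaps the context words.
        sense = lesk(context, "bank")
        print(sense, "-", sense.definition() if sense else "no sense found")

        # The candidate senses the disambiguator chose among:
        for s in wn.synsets("bank"):
            print(s.name(), "-", s.definition())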

    Using Noun Phrases for Navigating Biomedical Literature on Pubmed: How Many Updates Are We Losing Track of?

    Author-supplied citations are only a fraction of the literature related to a paper. The “related citations” list on PubMed is typically dozens or hundreds of results long and does not offer hints as to why these results are related. Using noun phrases derived from the sentences of the paper, we show it is possible to navigate more transparently to PubMed updates through search terms that can associate a paper with its citations. The algorithm that generates these search terms automatically extracts noun phrases from the paper using natural language processing tools and ranks them by the number of occurrences in the paper relative to the number of occurrences on the web. We define search queries having at least one instance of overlap between the author-supplied citations of the paper and the top 20 search results as citation validated (CV). When the overlapping citations were written by the same authors as the paper itself, we define the query as CV-S; when written by different authors, as CV-D. For a systematic sample of 883 papers on PubMed Central, at least one of the search terms for 86% of the papers is CV-D, versus 65% for the top 20 PubMed “related citations.” We hypothesize that these quantities, computed for the 20 million papers on PubMed, would differ within 5% of these percentages. Averaged across all 883 papers, 5 search terms are CV-D, 10 search terms are CV-S, and 6 unique citations validate these searches. Potentially related literature uncovered by citation-validated searches (either CV-S or CV-D) is on the order of ten papers per paper, and many more if the remaining searches that are not citation-validated are taken into account. The significance and relationship of each search result to the paper can only be vetted and explained by a researcher with knowledge of, or interest in, that paper.
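
    A minimal sketch of the ranking idea, assuming spaCy for noun-phrase extraction and a stubbed table of web hit counts in place of the live web queries the paper performed; the text and counts are illustrative.

        import spacy

        # Requires: python -m spacy download en_core_web_sm
        nlp = spacy.load("en_core_web_sm")

        paper_text = ("Noun phrases derived from the sentences of the paper can serve "
                      "as search terms. Noun phrases that are frequent in the paper "
                      "but rare on the web make distinctive search terms.")

        # Stand-in for web hit counts; the paper obtained these from web searches.
        web_counts = {"noun phrases": 1_200_000, "search terms": 5_000_000,
                      "the paper": 90_000_000, "the web": 500_000_000}

        # Count each noun chunk's occurrences in the paper.
        freq = {}
        for chunk in nlp(paper_text).noun_chunks:
            key = chunk.text.lower()
            freq[key] = freq.get(key, 0) + 1

        # Rank by in-paper frequency relative to (stubbed) web frequency.
        ranked = sorted(freq, key=lambda t: freq[t] / web_counts.get(t, 10_000_000),
                        reverse=True)
        print(ranked[:5])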

    Finding Important Terms for Patients in Their Electronic Health Records: A Learning-to-Rank Approach Using Expert Annotations

    BACKGROUND: Many health organizations allow patients to access their own electronic health record (EHR) notes through online patient portals as a way to enhance patient-centered care. However, EHR notes are typically long and contain abundant medical jargon that can be difficult for patients to understand. In addition, many medical terms in patients' notes are not directly related to their health care needs. One way to help patients better comprehend their own notes is to reduce information overload and help them focus on medical terms that matter most to them. Interventions can then be developed by giving them targeted education to improve their EHR comprehension and the quality of care. OBJECTIVE: We aimed to develop a supervised natural language processing (NLP) system called Finding impOrtant medical Concepts most Useful to patientS (FOCUS) that automatically identifies and ranks medical terms in EHR notes based on their importance to the patients. METHODS: First, we built an expert-annotated corpus. For each EHR note, 2 physicians independently identified medical terms important to the patient. Using the physicians' agreement as the gold standard, we developed and evaluated FOCUS. FOCUS first identifies candidate terms from each EHR note using MetaMap and then ranks the terms using a support vector machine-based learn-to-rank algorithm. We explored rich learning features, including distributed word representation, Unified Medical Language System semantic type, topic features, and features derived from consumer health vocabulary. We compared FOCUS with 2 strong baseline NLP systems. RESULTS: Physicians annotated 90 EHR notes and identified a mean of 9 (SD 5) important terms per note. The Cohen's kappa annotation agreement was .51. The 10-fold cross-validation results show that FOCUS achieved an area under the receiver operating characteristic curve (AUC-ROC) of 0.940 for ranking candidate terms from EHR notes to identify important terms. When including term identification, the performance of FOCUS for identifying important terms from EHR notes was 0.866 AUC-ROC. Both performance scores significantly exceeded the corresponding baseline system scores (P < .001). Rich learning features contributed to FOCUS's performance substantially. CONCLUSIONS: FOCUS can automatically rank terms from EHR notes based on their importance to patients. It may help develop future interventions that improve quality of care.
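
    FOCUS's exact learn-to-rank setup is not reproduced here; the sketch below shows the underlying pairwise RankSVM idea with a linear SVM from scikit-learn, on toy term features and importance labels.

        import numpy as np
        from itertools import combinations
        from sklearn.svm import LinearSVC

        # Toy feature vectors for candidate terms in one note, with gold
        # importance labels (1 = important to the patient, 0 = not).
        X = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.1]])
        y = np.array([1, 0, 1, 0])

        # Pairwise transform: learn w so that w @ (x_i - x_j) > 0 iff y_i > y_j.
        diffs, signs = [], []
        for i, j in combinations(range(len(X)), 2):
            if y[i] == y[j]:
                continue
            diffs.append(X[i] - X[j])
            signs.append(1 if y[i] > y[j] else -1)

        ranker = LinearSVC().fit(np.array(diffs), np.array(signs))

        # Score and rank terms by the learned weight vector.
        scores = X @ ranker.coef_.ravel()
        print(np.argsort(-scores))  # term indices, most important first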

    Cluster Analysis in Biotechnology


    User-Oriented Summaries Using a PSO Based Scoring Optimization Method

    Automatic text summarization tools have a great impact on many fields, such as medicine, law, and scientific research in general. As information overload increases, automatic summaries make it possible to handle the growing volume of documents, usually by assigning weights to extracted phrases based on their significance in the expected summary. Obtaining the main contents of any given document in less time than it would take to do so manually is still an issue of interest. In this article, a new method is presented that automatically generates extractive summaries from documents by adequately weighting sentence-scoring features using Particle Swarm Optimization. The key feature of the proposed method is the identification of the features that are closest to the criteria used by individuals when summarizing. The proposed method combines a binary representation and a continuous one, using an original variation of a technique developed by the authors of this paper. Our paper shows that using user-labeled information in the training set helps find better metrics and weights. The empirical results yield improved accuracy compared to previous methods used in this field.
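
    A bare-bones sketch of the core idea, using PSO to learn sentence-feature weights against a reference summary; the synthetic features, threshold fitness, and PSO constants are illustrative, not the paper's actual formulation.

        import numpy as np

        rng = np.random.default_rng(0)

        # Toy data: each row holds a sentence's scoring features (e.g. position,
        # title overlap, length); the target marks reference-summary sentences.
        features = rng.random((30, 3))
        in_summary = (features @ np.array([0.6, 0.3, 0.1]) > 0.5).astype(float)

        def fitness(w):
            """Agreement between weighted sentence scores and the reference summary."""
            picked = (features @ w > 0.5).astype(float)
            return (picked == in_summary).mean()

        # Bare-bones PSO over the three feature weights.
        n, dim = 20, 3
        pos = rng.random((n, dim))
        vel = np.zeros((n, dim))
        pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
        gbest = pbest[pbest_fit.argmax()].copy()

        for _ in range(50):
            r1, r2 = rng.random((n, dim)), rng.random((n, dim))
            # Standard velocity update: inertia + cognitive + social terms.
            vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
            pos = pos + vel
            fit = np.array([fitness(p) for p in pos])
            improved = fit > pbest_fit
            pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
            gbest = pbest[pbest_fit.argmax()].copy()

        print("learned weights:", gbest, "accuracy:", fitness(gbest))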