14 research outputs found

    Information Extraction from Electronic Medical Records using Natural Language Processing Techniques

    Patients share key information about their health with medical practitioners during clinic consultations. This key information may include their past medications and allergies, current situations/issues, and expectations. The healthcare professionals store this information in an Electronic Medical Record (EMR). EMRs have empowered research in healthcare; the information hidden in them, if harnessed properly through Natural Language Processing (NLP), can be used for disease registries, drug safety, epidemic surveillance, disease prediction, and treatment. This work illustrates the application of NLP techniques to design and implement a Key Information Retrieval System (KIRS framework) using the Latent Dirichlet Allocation algorithm. The cross-industry standard process for data mining methodology was applied in an experiment with an EMR dataset from PubMed to demonstrate the framework. The new system extracted the common problems (ailments) and prescriptions across the five (5) countries presented in the dataset. The system promises to assist health organizations in making informed decisions with the flood of key information data available in their domain. Keywords: Electronic Medical Record, BioNLP, Latent Dirichlet Allocatio

    OvidSP Medline-to-PubMed search filter translation: a methodology for extending search filter range to include PubMed's unique content

    Background: PubMed translations of OvidSP Medline search filters offer searchers improved ease of access. They may also facilitate access to PubMed’s unique content, including citations for the most recently published biomedical evidence. Retrieving this content requires a search strategy comprising natural language terms (‘textwords’), rather than Medical Subject Headings (MeSH). We describe a reproducible methodology that uses a validated PubMed search filter translation to create a textword-only strategy to extend retrieval to PubMed’s unique heart failure literature. Methods: We translated an OvidSP Medline heart failure search filter for PubMed and established version equivalence in terms of indexed literature retrieval. The PubMed version was then run within PubMed to identify citations retrieved by the filter’s MeSH terms (Heart failure, Left ventricular dysfunction, and Cardiomyopathy). It was then rerun with the same MeSH terms restricted to searching on title and abstract fields (i.e. as ‘textwords’). Citations retrieved by the MeSH search but not the textword search were isolated. Frequency analysis of their titles/abstracts identified natural language alternatives for those MeSH terms that performed less effectively as textwords. These terms were tested in combination to determine the best performing search string for reclaiming this ‘lost set’. This string, restricted to searching on PubMed’s unique content, was then combined with the validated PubMed translation to extend the filter’s performance in this database. Results: The PubMed heart failure filter retrieved 6829 citations. Of these, 834 (12%) failed to be retrieved when MeSH terms were converted to textwords. Frequency analysis of the 834 citations identified five high frequency natural language alternatives that could improve retrieval of this set (cardiac failure, cardiac resynchronization, left ventricular systolic dysfunction, left ventricular diastolic dysfunction, and LV dysfunction). Together these terms reclaimed 157/834 (18.8%) of lost citations. Conclusions: MeSH terms facilitate precise searching in PubMed’s indexed subset. They may, however, work less effectively as search terms prior to subject indexing. A validated PubMed search filter can be used to develop a supplementary textword-only search strategy to extend retrieval to PubMed’s unique content. A PubMed heart failure search filter is available on the CareSearch website (www.caresearch.com.au) providing access to both indexed and non-indexed heart failure evidence. This study was conducted as part of the work of the CareSearch Project. CareSearch is funded by the Australian Government Department of Health and Ageing
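
    The 'lost set' step in the abstract above can be sketched as a set difference. This is only an illustration: the function names and the toy citation IDs are hypothetical and do not come from the study, which worked with real PubMed result sets.

```python
def lost_set(mesh_hits, textword_hits):
    """Citations retrieved by the MeSH search but missed by the textword search."""
    return set(mesh_hits) - set(textword_hits)


def reclaimed_fraction(lost, supplementary_hits):
    """Share of the lost set recovered by a supplementary textword string."""
    if not lost:
        return 0.0
    return len(lost & set(supplementary_hits)) / len(lost)


# Toy citation IDs, invented for illustration only.
mesh_hits = {101, 102, 103, 104, 105}
textword_hits = {101, 102, 103}
supplementary_hits = {104, 999}

lost = lost_set(mesh_hits, textword_hits)
fraction = reclaimed_fraction(lost, supplementary_hits)

# The reported figure follows the same arithmetic: 157 of 834 lost citations.
reported_percent = round(157 / 834 * 100, 1)
```

    With the counts reported in the abstract, the same ratio gives the stated 18.8%.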

    Hybrid Query Expansion on Ontology Graph in Biomedical Information Retrieval

    Nowadays, biomedical researchers publish thousands of papers and journals every day. Searching through biomedical literature to keep up with the state of the art is a task of increasing difficulty for many individual researchers. The continuously increasing amount of biomedical text data has resulted in high demands for an efficient and effective biomedical information retrieval (BIR) system. Though many existing information retrieval techniques can be directly applied in BIR, BIR distinguishes itself in the extensive use of biomedical terms and abbreviations which present high ambiguity. First of all, we studied a fundamental yet simpler problem of word semantic similarity. We proposed a novel semantic word similarity algorithm and related tools called Weighted Edge Similarity Tools (WEST). WEST was motivated by our discovery that humans are more sensitive to the semantic difference due to the categorization than that due to the generalization/specification. Unlike most existing methods, which model the semantic similarity of words based on either the depth of their Lowest Common Ancestor (LCA) or the traversal distance between the word pair in WordNet, WEST also considers the joint contribution of the weighted distance between two words and the weighted depth of their LCA in WordNet. Experiments show that the weighted edge based word similarity method achieves 83.5% accuracy against human judgments. The query expansion problem can be viewed as selecting the top k words which have the maximum accumulated similarity to a given word set. It has proven to be an effective method in BIR and has been studied for over two decades. However, most previous research focuses on only one controlled vocabulary: MeSH. In addition, early studies find that applying an ontology won't necessarily improve searching performance. In this dissertation, we propose a novel graph based query expansion approach which is able to take advantage of the global information from multiple controlled vocabularies via building a biomedical ontology graph from selected vocabularies in Metathesaurus. We apply the Personalized PageRank algorithm on the ontology graph to rank and identify top terms which are highly relevant to the original user query, yet not present in that query. Those new terms are reordered by a weighted scheme to prioritize specialized concepts. We multiply those final selected terms by a scaling factor to prevent query drifting and append them to the original query in the search. Experiments show that our approach achieves a 17.7% improvement in 11 points average precision and recall value against Lucene's default indexing and searching strategy and a 24.8% improvement against all the other strategies on average. Furthermore, we observe that expanding with specialized concepts rather than generalized concepts can substantially improve the recall-precision performance. Furthermore, we have successfully applied WEST from the underlying WordNet graph to the biomedical ontology graph constructed from multiple controlled vocabularies in Metathesaurus. Experiments indicate that WEST further improves the recall-precision performance. Finally, we have developed a Graph-based Biomedical Search Engine (G-Bean) for retrieving and visualizing information from literature using our proposed query expansion algorithm. G-Bean accepts any medical related user query and processes it with an expanded medical query to search the MEDLINE database
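
    As a rough illustration of the graph-based expansion idea in the abstract above, the sketch below runs Personalized PageRank by power iteration over a toy term graph and returns the top-ranked terms not already in the query. The graph, the function names, the damping value and the `scale` factor are all assumptions made for illustration; the dissertation's ontology graph, its reordering scheme for specialized concepts and its actual parameters are not reproduced here.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power-iteration Personalized PageRank over an adjacency-list graph.

    Assumes every seed and every neighbour appears as a key of `graph`.
    """
    nodes = list(graph)
    # Restart distribution: uniform over the query (seed) terms.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            neighbours = graph[n]
            if not neighbours:
                # Dangling node: return its mass to the seed terms.
                for s in seeds:
                    nxt[s] += damping * rank[n] / len(seeds)
                continue
            share = damping * rank[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        rank = nxt
    return rank


def expand_query(graph, query_terms, k=2, scale=0.5):
    """Return the top-k non-query terms with down-weighted scores.

    `scale` is a hypothetical stand-in for a factor that limits query drift.
    """
    seeds = set(query_terms)
    rank = personalized_pagerank(graph, seeds)
    candidates = [(t, s) for t, s in rank.items() if t not in seeds and s > 0]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [(term, scale * score) for term, score in candidates[:k]]


# Toy undirected term graph, invented for illustration only.
toy_graph = {
    "heart failure": ["cardiomyopathy", "cardiac failure"],
    "cardiac failure": ["heart failure"],
    "cardiomyopathy": ["heart failure", "myocarditis"],
    "myocarditis": ["cardiomyopathy"],
    "asthma": ["bronchitis"],
    "bronchitis": ["asthma"],
}

ranks = personalized_pagerank(toy_graph, {"heart failure"})
expansion = expand_query(toy_graph, ["heart failure"], k=2)
```

    In this toy graph the terms directly linked to the query term rank highest, while the unconnected respiratory terms receive no score and are never proposed as expansions.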

    CREATING A BIOMEDICAL ONTOLOGY INDEXED SEARCH ENGINE TO IMPROVE THE SEMANTIC RELEVANCE OF RETREIVED MEDICAL TEXT

    Medical Subject Headings (MeSH) is a controlled vocabulary used by the National Library of Medicine to index medical articles, abstracts, and journals contained within the MEDLINE database. Although MeSH imposes uniformity and consistency in the indexing process, it has been shown that using MeSH indices results in only a small increase in precision over free-text indexing. Moreover, studies have shown that the use of controlled vocabularies in the indexing process is not an effective method to increase semantic relevance in information retrieval. To address the need for semantic relevance, we present an ontology-based information retrieval system for the MEDLINE collection that results in a 37.5% increase in precision when compared to free-text indexing systems. The presented system uses the ontology to provide an alternative to text-representation for medical articles, to find relationships among co-occurring terms in abstracts, and to index terms that appear in text as well as discovered relationships. The presented system is then compared to existing MeSH and Free-Text information retrieval systems. This dissertation provides a proof-of-concept for an online retrieval system capable of providing increased semantic relevance when searching through medical abstracts in MEDLINE

    Evidence based testing and outcomes in transplantation

    The use of diagnostic tests is central to the practice of modern medicine, but knowing which test to use, and when, can be problematic. To make evidence-based diagnoses, clinicians need efficient ways of accessing diagnostic studies, interpreting the results of several studies, and checking the applicability of studies to their own setting. The aim of this thesis was to explore solutions to these problems by addressing a specific clinical question: what is the best screening test for latent tuberculosis in patients undergoing transplantation? In a study of diagnostic filter performance in MEDLINE, we found that the current ‘specific’ clinical queries limit for diagnosis (used in PubMed and Ovid SP) missed up to 80% of studies in nephrology journals. Other filters (Deville 2000 Broad, Deville 2000 Balanced, Haynes 2004 Balanced, and Vincent 2003 Narrow) had similar specificity to the ‘specific’ clinical queries limit, but identified a greater proportion of the total evidence. Using systematic review methodology, we found that currently available data were inadequate to determine whether interferon gamma release assays performed better, worse or the same as the tuberculin skin test for diagnosing tuberculosis in candidates for solid organ transplant. Current international guidelines recommend using either the tuberculin skin test or an interferon gamma release assay, or both in combination. Our findings support these guidelines. We conducted a cross-sectional descriptive study of candidates for kidney transplantation and found that despite a high prevalence of risk factors among the group, less than a quarter of candidates were screened for latent tuberculosis before transplant, and only 36% of the 101 patients with risk factors for tuberculosis were tested. This study demonstrates that candidates for kidney transplant are at increased risk of tuberculosis and highlights the need for a nation-wide tuberculosis screening protocol in work-up for transplant

    Concept Based Knowledge Discovery from Biomedical Literature

    Philosophiae Doctor - PhD. This thesis describes and introduces novel methods for knowledge discovery and presents a software system that is able to extract information from biomedical literature, review interesting connections between various biomedical concepts and, in so doing, generate new hypotheses. The experimental results obtained by using the methods described in this thesis are compared to currently published results obtained by other methods, and a number of case studies are described. This thesis shows how the technology presented can be integrated with the researcher's own knowledge, experimentation and observations for optimal progression of scientific research. South Afric