38 research outputs found

    Veterans Engineering Resource Center: the DREAM project

    Due to technological advances, data collected from direct healthcare delivery is growing by the day. This constantly growing data, gathered from sources including patient visits, images, laboratory results, and physician notes, is important, yet it rarely yields value beyond satisfying reporting and documentation requirements or informing specific clinical situations, mainly because of its voluminous and heterogeneous nature. With this tremendous amount of data, manual extraction of information is expensive, time consuming, and subject to human error. Fortunately, the same information technologies that enable the generation and collection of this data also enable the efficient extraction of useful information from it. There is now a broad spectrum of secondary uses of clinical data, including clinical and translational research, public health and policy analysis, and quality measurement and improvement. The following case study examines a pilot project undertaken by the Veterans Engineering Resource Center (VERC) to design a data mining software utility called the Data Resource Engine & Analytical Model (DREAM). This software should be operable within the VA IT infrastructure and will allow providers to view aggregate patient data rapidly and accurately using electronic health records.
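
    As an illustration of the kind of patient-level aggregation such a utility exposes to providers, the minimal sketch below rolls encounter-level records up to one summary row per patient using pandas. The table layout, column names (patient_id, visit_date, a1c), and values are hypothetical and are not drawn from DREAM itself.

```python
# Illustrative sketch of patient-level aggregation over encounter records;
# the columns and values are assumptions, not DREAM's actual data model.
import pandas as pd

visits = pd.DataFrame({
    "patient_id": [101, 101, 102, 102, 102],
    "visit_date": pd.to_datetime(
        ["2012-01-05", "2012-03-10", "2012-02-01", "2012-02-20", "2012-04-02"]),
    "a1c": [7.9, 7.2, 6.1, 6.4, 6.0],
})

# Roll individual encounters up to one row per patient: visit count, date of
# the most recent visit, and the mean lab value across the period.
summary = visits.groupby("patient_id").agg(
    visit_count=("visit_date", "count"),
    last_visit=("visit_date", "max"),
    mean_a1c=("a1c", "mean"),
)
print(summary)
```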

    The Use of Electronic Medical Records Based on a Physician Diagnosis of Asthma for County-Wide Asthma Surveillance

    Allegheny County (AC) has limited information on asthma morbidity. To improve the sensitivity of asthma surveillance, a cross-sectional study covering January 1, 2002 through December 31, 2005 was conducted to determine whether emergency room visit data from a large regional medical center might be a good predictor for quantifying asthma cases for surveillance. An electronic medical record (EMR) abstract using the Council of State and Territorial Epidemiologists (CSTE) asthma surveillance case definition of an ICD-9-coded physician diagnosis for primary and secondary asthma (n = 18,284), and for primary asthma alone (n = 5,100), was used to define asthma. The analysis used data from a subset of six hospitals of a large regional medical center that report electronically, covering approximately 60% of adult ED visits in AC. A secondary analysis of the physician-diagnosed primary asthma cases (n = 180) was applied against the CSTE clinical and laboratory case definition. Statistical software was used to validate the data abstracted from the EMR. Once these data were validated for accuracy, a fourth dataset of primary asthma emergency room visits (n = 10,183) was used to test the relationship between asthma morbidity and exposure to ozone. Recent studies have linked asthma hospitalizations in several cities to ozone action days; however, the effect of ozone on asthma emergency room (ER) visits has not been well studied. Electronic medical records from the six hospitals representing the large metropolitan medical center in Allegheny County, PA were obtained for individuals with asthma based on an ICD-9 discharge diagnosis of 493.0-493.9 for the respective time period. Data on ozone, PM2.5, and temperature were obtained for the same period. A case-crossover design using conditional logistic regression as the statistical estimator was applied to assess the relationship between levels of ozone and PM2.5 and increases in asthma ER visits. A time-stratified sampling strategy was employed, assuming a 3:1 case-control ratio. A total of 6,979 individuals were included in the study, with a mean age of 39.25 ± 21.0 years. The mean ozone exposure for this period was 40.6 ppb (range: 0-126). The effect estimate for year-round data was greatest at a 2-day lag adjusted for temperature (OR = 1.02; 95% CI, 1.01-1.04; p < .05): for each 10-ppb increase in 24-hour maximum ozone, a 2% increase was noted in asthma ER visits. These results indicate that asthma ED visits may be an additional source of information for use in environmental public health tracking.
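
    The case-crossover analysis described above can be sketched in code. The example below, using statsmodels' ConditionalLogit, pairs each ER visit (case day) with same-weekday control days from the same calendar month (time-stratified referent selection) and fits a conditional logistic model of case status on ozone per 10 ppb, adjusted for temperature. The DataFrame columns (visit_date, ozone_ppb, temp_f) and the referent-selection details are illustrative assumptions, not the study's actual implementation.

```python
# Sketch of a time-stratified case-crossover analysis. Assumes a pandas
# DataFrame of asthma ER visits with a datetime column visit_date, and a
# daily exposure table with columns date, ozone_ppb (24-h max), and temp_f.
import numpy as np
import pandas as pd
from statsmodels.discrete.conditional_models import ConditionalLogit

def build_case_crossover(visits: pd.DataFrame) -> pd.DataFrame:
    """For each ER visit (case day), select control days from the same month
    and year that fall on the same day of week (time-stratified referents,
    roughly 3-4 controls per case)."""
    rows = []
    for visit_id, case_day in enumerate(visits["visit_date"]):
        stratum = pd.date_range(case_day.replace(day=1),
                                case_day + pd.offsets.MonthEnd(0), freq="D")
        referents = [d for d in stratum if d.dayofweek == case_day.dayofweek]
        for day in referents:
            rows.append({"set_id": visit_id,
                         "case": int(day == case_day),
                         "date": day})
    return pd.DataFrame(rows)

def fit_model(cc: pd.DataFrame, exposures: pd.DataFrame):
    """Fit a conditional logistic model of case status on ozone (per 10 ppb)
    adjusted for temperature, with matched sets as strata."""
    data = cc.merge(exposures, on="date", how="left").dropna()
    data["ozone_10ppb"] = data["ozone_ppb"] / 10.0
    model = ConditionalLogit(data["case"],
                             data[["ozone_10ppb", "temp_f"]],
                             groups=data["set_id"])
    result = model.fit()
    # Exponentiated coefficients give odds ratios, e.g. per 10-ppb ozone.
    print(np.exp(result.params))
    return result
```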

    Knowledge Author: Facilitating user-driven, domain content development to support clinical information extraction

    Background: Clinical natural language processing (NLP) systems require a semantic schema comprising domain-specific concepts, their lexical variants, and associated modifiers to accurately extract information from clinical texts. An NLP system leverages this schema to structure concepts and extract meaning from free text. In the clinical domain, creating a semantic schema typically requires input from both a domain expert, such as a clinician, and an NLP expert who represents the clinical concepts derived from the clinician's domain expertise in a computable format usable by an NLP system. The goal of this work is to develop a web-based tool, Knowledge Author, that bridges the gap between the clinical domain expert and NLP system development by facilitating the creation of domain content, represented as a semantic schema, for extracting information from clinical free text. Results: Knowledge Author is a web-based recommendation system that supports users in developing the domain content necessary for clinical NLP applications. Knowledge Author's schematic model leverages a set of semantic types derived from the Secondary Use Clinical Element Models and the Common Type System to allow the user to quickly create and modify domain-related concepts. Features such as collaborative development and domain content suggestions, generated by mapping concepts to the Unified Medical Language System Metathesaurus, further support the content creation process. Two proof-of-concept studies were performed to evaluate the system. The first evaluated Knowledge Author's flexibility to create a broad range of concepts: of a dataset of 115 concepts, 87 (76%) could be created using Knowledge Author. The second evaluated the effectiveness of Knowledge Author's output in an NLP system by extracting concepts and associated modifiers representing a clinical element, carotid stenosis, from 34 clinical free-text radiology reports using Knowledge Author and the NLP system pyConText. Knowledge Author's domain content produced high recall for concepts (targeted findings: 86%) and varied recall for modifiers (certainty: 91%, sidedness: 80%, neurovascular anatomy: 46%). Conclusion: Knowledge Author can support clinical domain content development for information extraction by enabling domain experts to create semantic schemas.
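
    To make the idea of a semantic schema concrete, the sketch below shows one way a concept such as carotid stenosis, its lexical variants, and its modifiers (sidedness, certainty) might be represented and applied to free text. The data structures, cue lists, and sample report are hypothetical, and the snippet does not reproduce the pyConText API.

```python
# Minimal sketch of a semantic-schema entry and rule-based extraction in the
# spirit of Knowledge Author's output; schema layout and report text are
# illustrative assumptions only.
import re
from dataclasses import dataclass, field

@dataclass
class Concept:
    name: str                                        # targeted finding
    lexical_variants: list = field(default_factory=list)
    modifiers: dict = field(default_factory=dict)    # modifier type -> cue words

schema = Concept(
    name="carotid_stenosis",
    lexical_variants=["carotid stenosis", "stenosis of the carotid",
                      "narrowing of the carotid artery"],
    modifiers={
        "sidedness": ["left", "right", "bilateral"],
        "certainty": ["possible", "probable", "no evidence of"],
    },
)

def extract(report: str, concept: Concept) -> list:
    """Return (variant, modifiers-found) pairs detected in a clinical report,
    checking modifier cues in a small character window around each mention."""
    hits = []
    for variant in concept.lexical_variants:
        for match in re.finditer(re.escape(variant), report, re.IGNORECASE):
            window = report[max(0, match.start() - 40):match.end() + 40]
            found = {mtype: [cue for cue in cues
                             if re.search(r"\b" + re.escape(cue) + r"\b",
                                          window, re.IGNORECASE)]
                     for mtype, cues in concept.modifiers.items()}
            hits.append((variant, found))
    return hits

print(extract("Findings: moderate left carotid stenosis.", schema))
```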

    Doctor of Philosophy

    Public health surveillance systems are crucial for the timely detection of and response to public health threats. Since the terrorist attacks of September 11, 2001, and the release of anthrax in the following month, there has been heightened interest in public health surveillance. The years immediately following these attacks were met with increased awareness and funding from the federal government, which has significantly strengthened the United States' surveillance capabilities; however, despite these improvements, today's public health surveillance systems face substantial challenges. Problems with current surveillance systems include: (a) failure to leverage unstructured public health data for surveillance purposes; and (b) lack of information integration and of the ability to leverage resources, applications, or other surveillance efforts, because systems are built on a centralized model. This research addresses these problems by focusing on the development and evaluation of new informatics methods to improve public health surveillance. To address the problems above, we first identified a public health surveillance workflow that is affected by the problems described and that offers an opportunity for enhancement through current informatics techniques. The 122 Cities Mortality Surveillance for Pneumonia and Influenza was chosen as the primary use case for this dissertation work. The second step involved demonstrating the feasibility of using unstructured public health data, in this case death certificates. For this we created and evaluated a pipeline, composed of a detection rule and a natural language processor, for the coding of death certificates and the identification of pneumonia and influenza cases. The second problem was addressed by presenting the rationale for creating a federated model that leverages grid technology concepts and tools for the sharing and epidemiological analysis of public health data. As a case study of this approach, a secured virtual organization was created in which users are able to access two grid data services, using death certificates from the Utah Department of Health, and two analytical grid services, MetaMap and R. A scientific workflow was created using the published services to replicate the mortality surveillance workflow. To validate these approaches and provide proofs of concept, a series of real-world scenarios was conducted.
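
    A minimal sketch of the kind of detection rule used in the first step might look as follows: a keyword rule over death-certificate cause-of-death text with a crude negation check. The term list, negation cues, and example records are assumptions for illustration; the dissertation's pipeline additionally used MetaMap for concept coding, which is not reproduced here.

```python
# Sketch of a pneumonia-and-influenza (P&I) detection rule over free-text
# cause-of-death fields. Keyword list, negation cues, and records are
# illustrative assumptions, not the dissertation's actual rule.
import re

PI_TERMS = [r"\bpneumonia\b", r"\binfluenza\b", r"\bflu\b"]
NEGATION_CUES = [r"\bno\b", r"\bwithout\b", r"\bdenies\b"]

def is_pi_case(cause_of_death_text: str) -> bool:
    """Return True if the text mentions a P&I term that is not negated within
    a short preceding window."""
    text = cause_of_death_text.lower()
    for term in PI_TERMS:
        for match in re.finditer(term, text):
            window = text[max(0, match.start() - 30):match.start()]
            if not any(re.search(cue, window) for cue in NEGATION_CUES):
                return True
    return False

records = [
    "Immediate cause: acute respiratory failure due to influenza A",
    "Immediate cause: cardiac arrest; no pneumonia documented",
]
weekly_pi_count = sum(is_pi_case(r) for r in records)
print(weekly_pi_count)  # counts only the non-negated P&I record
```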

    Prev Chronic Dis

    Introduction: The advent of universal health care coverage in the United States and the use of electronic health records can make the medical record a disease surveillance tool. The objective of our study was to identify criteria that accurately categorize acute coronary and heart failure events using electronic health record data exclusively, so that the medical record can be used for surveillance without manual record review. Methods: We serially compared 3 computer algorithms to manual record review. The first 2 algorithms relied on ICD-9-CM (International Classification of Diseases, 9th Revision, Clinical Modification) codes, troponin levels, electrocardiogram (ECG) data, and echocardiograph data. The third algorithm relied on a detailed coding system, Intelligent Medical Objects, Inc. (IMO) interface terminology, troponin levels, and echocardiograph data. Results: Cohen's κ for the initial algorithm was 0.47 (95% confidence interval [CI], 0.41-0.54). Cohen's κ was 0.61 (95% CI, 0.55-0.68) for the second algorithm. Cohen's κ for the third algorithm was 0.99 (95% CI, 0.98-1.00). Conclusion: Electronic medical record data are sufficient to categorize coronary heart disease and heart failure events without manual record review. However, only moderate agreement with medical record review can be achieved when the classification is based on 4-digit ICD-9-CM codes, because ICD-9-CM 410.9 includes both myocardial infarction with ST-segment elevation on ECG (STEMI) and myocardial infarction without ST-segment elevation (nSTEMI). Nearly perfect agreement can be achieved using IMO interface terminology, a more detailed coding system that maps to ICD-9, ICD-10 (International Classification of Diseases, Tenth Revision, Clinical Modification), and SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms).
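
    The evaluation approach can be illustrated with a short sketch: a simplified rule that combines a discharge code with a troponin level (or an ejection fraction), scored against manual review with Cohen's kappa via scikit-learn. The code lists, thresholds, and column names are illustrative assumptions, not the study's actual algorithms.

```python
# Sketch of one event-classification rule and its agreement with manual chart
# review. Columns, thresholds, and code lists are illustrative assumptions.
import pandas as pd
from sklearn.metrics import cohen_kappa_score

def classify_event(row) -> str:
    """Very simplified rule in the spirit of the first algorithm: combine a
    discharge code with the peak troponin level or ejection fraction."""
    acute_mi_codes = {"410.0", "410.9"}          # 4-digit ICD-9-CM MI codes
    if row["icd9"] in acute_mi_codes and row["peak_troponin"] >= 0.04:
        return "acute_coronary_event"
    if row["icd9"].startswith("428") and row["ef_percent"] < 40:
        return "heart_failure_event"
    return "no_event"

encounters = pd.DataFrame({
    "icd9": ["410.9", "428.0", "401.9"],
    "peak_troponin": [1.2, 0.01, 0.02],
    "ef_percent": [55, 30, 60],
    "manual_review": ["acute_coronary_event", "heart_failure_event", "no_event"],
})
encounters["algorithm"] = encounters.apply(classify_event, axis=1)

# Cohen's kappa quantifies chance-corrected agreement between the algorithm
# and manual record review, mirroring the study's evaluation metric.
print(cohen_kappa_score(encounters["manual_review"], encounters["algorithm"]))
```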

    Mining the Medical and Patent Literature to Support Healthcare and Pharmacovigilance

    Recent advances in healthcare practices and the increasing use of information technology in the medical domain have led to the rapid generation of free-text data in the form of scientific articles, e-health records, patents, and document inventories. This has spurred the development of sophisticated information retrieval and information extraction technologies. A fundamental requirement for the automatic processing of biomedical text is the identification of information-carrying units such as concepts or named entities. In this context, this work focuses on the identification of medical disorders (such as diseases and adverse effects), which denote an important category of concepts in medical text. Two methodologies were investigated in this regard: dictionary-based and machine learning-based approaches. Furthermore, the capabilities of the concept recognition techniques were systematically exploited to build a semantic search platform for the retrieval of e-health records and patents. The system facilitates conventional text search as well as semantic and ontological searches. Performance of the adapted retrieval platform for e-health records and patents was evaluated within open assessment challenges (TRECMED and TRECCHEM, respectively), wherein the system was rated best in comparison to several other competing information retrieval platforms. Finally, from the medico-pharma perspective, a strategy for the identification of adverse drug events from medical case reports was developed. Qualitative evaluation as well as expert validation of the developed system's performance showed robust results. In conclusion, this thesis presents approaches for efficient information retrieval and information extraction from various biomedical literature sources in support of healthcare and pharmacovigilance. The applied strategies have the potential to enhance the literature searches performed by biomedical, healthcare, and patent professionals. This can promote literature-based knowledge discovery, improve the safety and effectiveness of medical practices, and drive research and development in the medical and healthcare arena.
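
    A minimal sketch of the dictionary-based approach is shown below: a greedy longest-match lookup of disorder mentions against a small lexicon. The lexicon and example sentence are illustrative assumptions; a real system would draw its dictionary from terminology resources and add tokenization and normalization.

```python
# Sketch of a dictionary-based recognizer for medical disorder mentions, using
# greedy longest-match over a small illustrative lexicon.
DISORDER_LEXICON = {
    "myocardial infarction", "heart failure", "stevens-johnson syndrome",
    "nausea", "rash",
}
MAX_SPAN = 4  # longest dictionary entry, in tokens

def find_disorders(text: str):
    """Return (mention, start_token, end_token) triples found in the text."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    mentions, i = [], 0
    while i < len(tokens):
        match = None
        # Try the longest candidate span first (greedy longest match).
        for span in range(min(MAX_SPAN, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + span])
            if candidate in DISORDER_LEXICON:
                match = (candidate, i, i + span)
                break
        if match:
            mentions.append(match)
            i = match[2]
        else:
            i += 1
    return mentions

print(find_disorders("Patient developed a rash and nausea after the dose, "
                     "ruling out Stevens-Johnson syndrome."))
```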

    The Revival of the Notes Field: Leveraging the Unstructured Content in Electronic Health Records

    Problem: Clinical practice requires the production of a great amount of notes, which is time- and resource-consuming. These notes contain relevant information, but their secondary use is almost impossible due to their unstructured nature. Researchers are trying to address this problem with traditional and promising novel techniques, but application in real hospital settings does not yet seem possible, both because of relatively small and noisy datasets and because of the lack of language-specific pre-trained models. Aim: Our aim is to demonstrate the potential of the above techniques, but also to raise awareness of the still-open challenges that the scientific communities of IT and medical practitioners must jointly address to realize the full potential of the unstructured content that is produced and digitized daily in hospital settings, both to improve its data quality and to leverage the insights from data-driven predictive models. Methods: To this end, we present a narrative literature review of the most recent and relevant contributions applying natural language processing techniques to the free-text content of electronic patient records. In particular, we focus on four selected application domains: data quality, information extraction, sentiment analysis and predictive models, and automated patient cohort selection. We then present a few empirical studies that we undertook at a major teaching hospital specializing in musculoskeletal diseases. Results: We provide the reader with some simple and affordable pipelines, which demonstrate the feasibility of reaching literature performance levels with a single-institution, non-English dataset. In this way, we bridge the literature and real-world needs, taking a step further toward the revival of the notes field.
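
    As an example of the "simple and affordable" pipelines referred to above, the sketch below trains a TF-IDF plus logistic regression classifier to flag notes for an automated cohort-selection task. The toy notes, labels, and cohort definition are hypothetical and serve only to illustrate the pattern.

```python
# Illustrative cohort-selection pipeline: TF-IDF features and a linear
# classifier over free-text notes. Notes and labels are toy assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

notes = [
    "Follow-up after total hip arthroplasty, wound healing well.",
    "Chronic low back pain, conservative management continued.",
    "Post-operative check, right knee replacement, no complications.",
    "Hypertension review, medication adjusted.",
]
in_cohort = [1, 0, 1, 0]  # 1 = joint-replacement cohort

pipeline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),
    LogisticRegression(max_iter=1000),
)
pipeline.fit(notes, in_cohort)

# Score an unseen note; in practice the model would be trained on a large set
# of institution-specific notes (non-English, in the review's setting) and
# validated clinically before use.
print(pipeline.predict(["Scheduled for left hip arthroplasty next month."]))
```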

    Machine Learning and Clinical Text. Supporting Health Information Flow

    Fluent health information flow is critical for clinical decision-making. However, a considerable part of this information is free-form text, and the inability to utilize it creates risks to patient safety and cost-effective hospital administration. Methods for the automated processing of clinical text are emerging. The aim of this doctoral dissertation is to study machine learning and clinical text in order to support health information flow. First, by analyzing the content of authentic patient records, the aim is to specify clinical needs in order to guide the development of machine learning applications. The contributions are a model of the ideal information flow, a model of the problems and challenges in reality, and a road map for technology development. Second, by developing applications for practical cases, the aim is to concretize ways to support health information flow. Altogether, five machine learning applications for three practical cases are described: the first two applications are binary classification and regression for the practical case of topic labeling and relevance ranking; the third and fourth applications are supervised and unsupervised multi-class classification for the practical case of topic segmentation and labeling; these four applications are tested with Finnish intensive care patient records. The fifth application is multi-label classification for the practical task of diagnosis coding and is tested with English radiology reports. The performance of all these applications is promising. Third, the aim is to study how the quality of machine learning applications can be reliably evaluated. The associations between performance evaluation measures and methods are addressed, and a new hold-out method is introduced. This method contributes not only to processing time but also to evaluation diversity and quality. The main conclusion is that developing machine learning applications for text requires interdisciplinary, international collaboration. Practical cases are very different, and hence development must begin from genuine user needs and domain expertise. The technological expertise must cover linguistics, machine learning, and information systems. Finally, the methods must be evaluated both statistically and through authentic user feedback.
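
    The fifth application, multi-label diagnosis coding, can be sketched with a one-vs-rest classifier over TF-IDF features, assuming each report may carry several codes. The example reports and ICD-style labels below are illustrative assumptions and do not reflect the dissertation's datasets or models.

```python
# Sketch of multi-label diagnosis coding for radiology reports: one-vs-rest
# logistic regression over TF-IDF features. Reports and labels are toy data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

reports = [
    "Right lower lobe consolidation consistent with pneumonia.",
    "Cardiomegaly with pulmonary vascular congestion.",
    "Consolidation and cardiomegaly, likely pneumonia with heart failure.",
    "Clear lungs, normal cardiac silhouette.",
]
labels = [["486"], ["428.0"], ["486", "428.0"], []]  # each report may carry several codes

binarizer = MultiLabelBinarizer()
y = binarizer.fit_transform(labels)

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(reports, y)

# Predict the code set for an unseen report and map it back to label names.
predicted = model.predict(["Findings suggest pneumonia in the left lower lobe."])
print(binarizer.inverse_transform(predicted))
```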