    Linking social media, medical literature, and clinical notes using deep learning.

    Researchers analyze data, information, and knowledge from many sources, in many formats, and with many methods. The dominant data formats are text and images. In the healthcare industry, professionals generate a large quantity of unstructured data, and the complexity of this data, combined with limited computational power, delays its analysis. However, with emerging deep learning algorithms and access to hardware accelerators such as graphics processing units (GPUs) and tensor processing units (TPUs), processing text and images is becoming more accessible. Deep learning algorithms achieve remarkable results in natural language processing (NLP) and computer vision. In this study, we focus on NLP in the healthcare industry and collect data not only from electronic medical records (EMRs) but also from medical literature and social media. We propose a framework for linking social media, medical literature, and EMR clinical notes using deep learning algorithms. Connecting these data sources requires defining a link between them, and our key is finding concepts in medical text. The National Library of Medicine (NLM) maintains the Unified Medical Language System (UMLS), and we use this system as the foundation of our own. We recognize social media's dynamic nature and apply supervised and semi-supervised methodologies to generate concepts. Named entity recognition (NER) allows efficient extraction of information, or entities, from medical literature, and we extend the model to process EMR clinical notes via transfer learning. The result is an integrated, end-to-end, web-based system that unifies social media, literature, and clinical notes and improves access to medical knowledge for both the public and experts.
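
    The abstract gives no implementation details; as a rough, hedged sketch of the NER-with-transfer-learning step it describes, the snippet below fine-tunes a pretrained token-classification transformer on clinical-note examples. The checkpoint name, label set, and training examples are placeholders, not the system reported in the abstract.

```python
# Hedged sketch: fine-tune a pretrained NER model on clinical notes
# (transfer learning). The checkpoint, label set, and `clinical_examples`
# below are hypothetical placeholders, not the abstract's actual system.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL_NAME = "dslim/bert-base-NER"  # assumption: any token-classification checkpoint
LABELS = ["O", "B-PROBLEM", "I-PROBLEM", "B-DRUG", "I-DRUG"]  # illustrative tag set

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS), ignore_mismatched_sizes=True
)

# Hypothetical clinical-note examples: (word tokens, per-word label ids).
clinical_examples = [
    (["Patient", "denies", "chest", "pain", "after", "aspirin"],
     [0, 0, 1, 2, 0, 3]),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
model.train()
for tokens, labels in clinical_examples:
    enc = tokenizer(tokens, is_split_into_words=True, return_tensors="pt")
    # Align word-level labels to sub-word tokens (-100 is ignored by the loss).
    word_ids = enc.word_ids(batch_index=0)
    aligned = [-100 if w is None else labels[w] for w in word_ids]
    out = model(**enc, labels=torch.tensor([aligned]))
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```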

    Adverse Drug Event Detection, Causality Inference, Patient Communication and Translational Research

    Adverse drug events (ADEs) are injuries resulting from a medical intervention related to a drug. ADEs are responsible for nearly 20% of all adverse events that occur in hospitalized patients and have been shown to increase the cost of health care and the length of hospital stays. Therefore, detecting and preventing ADEs for pharmacovigilance is an important task that can improve the quality of health care and reduce costs in a hospital setting. In this dissertation, we focus on the development of ADEtector, a system that identifies ADEs and medication information from electronic medical records and FDA Adverse Event Reporting System reports. ADEtector employs novel natural language processing approaches for ADE detection and provides a user interface to display ADE information. It uses machine learning techniques to automatically process narrative text and identify the adverse event (AE) and medication entities that appear in it. The system then analyzes the recognized entities to infer the causal relation between AEs and medications by automating the elements of the Naranjo score using knowledge- and rule-based approaches. The Naranjo Adverse Drug Reaction Probability Scale is a validated tool for assessing the causality of a drug-induced adverse event; it calculates the likelihood that an adverse event is related to a drug based on a list of weighted questions. ADEtector also presents the user with evidence for ADEs by extracting figures that contain ADE-related information from the biomedical literature and generating a brief summary of each extracted figure to help users comprehend it. ADEtector further helps patients understand narrative text by recognizing complex medical jargon and abbreviations and providing definitions and explanations for them from external knowledge resources. This system could help clinicians and researchers discover novel ADEs and drug relations and hypothesize new research questions within the ADE domain.
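
    For context on the weighted questionnaire mentioned above, here is a minimal sketch of how Naranjo scoring could be computed once each question has been answered from the record. The per-question weights and categories follow the published Naranjo scale; the knowledge- and rule-based extraction of the answers themselves (the part ADEtector automates) is not modeled.

```python
# Minimal sketch of Naranjo scale scoring. Weights follow the published
# questionnaire (answers: "yes" / "no" / "unknown"); how each answer is
# derived from a clinical record is not modeled here.
NARANJO_WEIGHTS = [
    # (question, yes, no, unknown)
    ("Previous conclusive reports on this reaction?", 1, 0, 0),
    ("Did the AE appear after the suspected drug was given?", 2, -1, 0),
    ("Did the AE improve when the drug was discontinued?", 1, 0, 0),
    ("Did the AE reappear when the drug was readministered?", 2, -1, 0),
    ("Are there alternative causes for the AE?", -1, 2, 0),
    ("Did the reaction reappear with a placebo?", -1, 1, 0),
    ("Was the drug detected in blood at toxic concentrations?", 1, 0, 0),
    ("Was the reaction dose-dependent?", 1, 0, 0),
    ("Did the patient have a similar reaction to the drug previously?", 1, 0, 0),
    ("Was the AE confirmed by objective evidence?", 1, 0, 0),
]

def naranjo_score(answers):
    """answers: list of 10 strings in {"yes", "no", "unknown"}."""
    column = {"yes": 1, "no": 2, "unknown": 3}
    return sum(NARANJO_WEIGHTS[i][column[a]] for i, a in enumerate(answers))

def naranjo_category(score):
    if score >= 9:
        return "definite"
    if score >= 5:
        return "probable"
    if score >= 1:
        return "possible"
    return "doubtful"

# Example: temporal relationship, improvement on dechallenge, no alternative cause.
answers = ["unknown", "yes", "yes", "unknown", "no",
           "unknown", "unknown", "unknown", "unknown", "unknown"]
score = naranjo_score(answers)           # 2 + 1 + 2 = 5
print(score, naranjo_category(score))    # 5 probable
```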

    Identifying Locations of Violent Injuries in Las Vegas to Implement the Cardiff Violence Prevention Model

    Public violence in the United States is a major health concern, and incidents involving violent crime are often not reported to law enforcement. The Cardiff Model is a violence prevention program developed in the UK that identifies violent injury locations and enables data sharing between emergency rooms (ERs) and local law enforcement to help identify areas for community improvement. The model is now in use in several major US cities to reduce violence. Las Vegas has seen an increase in public violence in recent years; as a result, researchers from the Southern Nevada Health District (SNHD) and the University of Nevada, Las Vegas (UNLV) believe the Cardiff Model is a viable solution to this public health crisis. This research explores natural language processing and machine learning models to extract violent injury location information from ER records in preparation for introducing the Cardiff Violence Prevention Model in Clark County, Nevada.
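
    The abstract does not specify which models were used; one plausible baseline for surfacing location mentions from ER narratives is a general-purpose NER pass that keeps geographic and facility entities, as in the hedged sketch below. The spaCy model and the example note are assumptions, not the study's actual pipeline.

```python
# Hedged baseline: run a general-purpose spaCy pipeline over an ER triage
# note and keep candidate injury-location mentions (GPE/LOC/FAC entities).
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

# Fabricated example note, not real patient data.
note = ("Pt assaulted outside a bar on Fremont Street in downtown Las Vegas, "
        "presented to the ER with facial lacerations.")

LOCATION_LABELS = {"GPE", "LOC", "FAC"}

doc = nlp(note)
locations = [ent.text for ent in doc.ents if ent.label_ in LOCATION_LABELS]
print(locations)  # candidate locations to review before sharing aggregate data
```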

    The Revival of the Notes Field: Leveraging the Unstructured Content in Electronic Health Records

    Problem: Clinical practice requires the production of a great amount of notes, which is time- and resource-consuming. These notes contain relevant information, but their secondary use is nearly impossible due to their unstructured nature. Researchers are trying to address this problem with both traditional and promising novel techniques, yet application in real hospital settings does not seem possible yet, both because datasets are relatively small and dirty and because language-specific pre-trained models are lacking. Aim: Our aim is to demonstrate the potential of these techniques, but also to raise awareness of the still-open challenges that the IT and medical communities must jointly address to realize the full potential of the unstructured content produced and digitized daily in hospital settings, both to improve its data quality and to leverage the insights of data-driven predictive models. Methods: To this end, we present a narrative literature review of the most recent and relevant contributions on applying natural language processing techniques to the free-text content of electronic patient records. In particular, we focus on four application domains: data quality, information extraction, sentiment analysis and predictive models, and automated patient cohort selection. We then present a few empirical studies that we undertook at a major teaching hospital specializing in musculoskeletal diseases. Results: We provide the reader with some simple and affordable pipelines that demonstrate the feasibility of reaching literature-level performance with a single-institution, non-English dataset. In this way, we bridge the literature and real-world needs, taking a step further toward the revival of the notes field.
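
    The abstract mentions "simple and affordable pipelines" without detail; a hedged example of such a pipeline, assuming a small labelled set of free-text notes, is a TF-IDF plus linear-classifier baseline like the one below. The notes, labels, and task framing are placeholders, not the hospital's actual data.

```python
# Hedged sketch of a "simple and affordable" free-text pipeline:
# TF-IDF features + logistic regression for a note-level label
# (e.g. cohort inclusion). Notes and labels below are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

notes = [  # placeholder non-English (Italian) notes
    "paziente con dolore lombare cronico, candidato a intervento",
    "controllo post-operatorio, nessuna complicanza riportata",
    "frattura del femore, ricovero urgente",
    "visita di routine, paziente asintomatico",
]
labels = [1, 0, 1, 0]  # placeholder: 1 = meets cohort criteria

pipeline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),  # word n-grams, language-agnostic
    LogisticRegression(max_iter=1000),
)
scores = cross_val_score(pipeline, notes, labels, cv=2)
print(scores.mean())
```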

    Automating Electronic Health Record Data Quality Assessment

    Information systems such as Electronic Health Record (EHR) systems are susceptible to data quality (DQ) issues. Given the growing importance of EHR data, there is an increasing demand for strategies and tools to help ensure that available data are fit for use. However, developing reliable data quality assessment (DQA) tools necessary for guiding and evaluating improvement efforts has remained a fundamental challenge. This review examines the state of research on operationalising EHR DQA, mainly automated tooling, and highlights necessary considerations for future implementations. We reviewed 1841 articles from PubMed, Web of Science, and Scopus published between 2011 and 2021, and identified 23 DQA programs: 14 deployed in real-world settings to assess EHR data quality and 9 experimental prototypes. Many of these programs investigate the completeness (n = 15) and value conformance (n = 12) quality dimensions and are backed by knowledge items gathered from domain experts (n = 9) or from literature reviews and existing DQ measurements (n = 3). A few DQA programs also explore the feasibility of using data-driven techniques to assess EHR data quality automatically. Overall, the automation of EHR DQA is gaining traction, but current efforts are fragmented and not backed by relevant theory. Existing programs also vary in scope, type of data supported, and how measurements are sourced. There is a need to standardise programs for assessing EHR data quality, as current evidence suggests their quality may be unknown.
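
    As a concrete, hedged illustration of the two dimensions most commonly checked by the programs above, the snippet below computes completeness and value conformance over a small fabricated EHR extract with pandas; real DQA programs wrap such checks in configurable, expert-backed rule sets.

```python
# Hedged illustration of two EHR data-quality checks: completeness
# (share of non-missing values) and value conformance (values within an
# allowed set or range). The records and rules below are fabricated.
import pandas as pd

ehr = pd.DataFrame({
    "patient_id": [1, 2, 3, 4],
    "sex":        ["F", "M", None, "X"],
    "heart_rate": [72, 260, None, 88],   # beats per minute
})

# Completeness: fraction of non-null values per column.
completeness = ehr.notna().mean()

# Value conformance: fraction of non-null values that satisfy a rule.
def conformance(series, rule):
    observed = series.dropna()
    return rule(observed).mean() if len(observed) else float("nan")

report = {
    "sex_conformance": conformance(ehr["sex"], lambda s: s.isin(["F", "M"])),
    "heart_rate_conformance": conformance(ehr["heart_rate"], lambda s: s.between(20, 250)),
}

print(completeness.to_dict())
print(report)  # flags the "X" sex code and the 260 bpm outlier
```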

    Preface


    Health systems data interoperability and implementation

    Objective: The objective of this study was to use machine learning and health standards to address the problem of clinical data interoperability across healthcare institutions. Addressing this problem has the potential to make clinical data comparable, searchable, and exchangeable between healthcare providers. Data sources: Structured and unstructured data were used to conduct the experiments in this study. The data were collected from two disparate sources, MIMIC-III and NHANES. The MIMIC-III database stores data from two electronic health record systems, CareVue and MetaVision. The data stored in these systems were not recorded with the same standards and were therefore not comparable: some values conflicted, one system stored an abbreviation of a clinical concept while the other stored the full concept name, and some attributes contained missing information. These issues make this data a good candidate for this study. From the identified data sources, laboratory, physical examination, vital signs, and behavioural data were used. Methods: This research employed the CRISP-DM framework as a guideline for all stages of data mining. Two sets of classification experiments were conducted, one for structured data and the other for unstructured data. In the first experiment, edit distance, TF-IDF, and Jaro-Winkler were used to calculate similarity weights between two datasets, one coded with the LOINC terminology standard and one not coded. Similar sets of data were classified as matches while dissimilar sets were classified as non-matches, and the Soundex indexing method was used to reduce the number of potential comparisons. Thereafter, three classification algorithms were trained and tested, and the performance of each was evaluated through the ROC curve. The second experiment aimed to extract patients' smoking status from a clinical corpus. A sequence-oriented classification algorithm, conditional random fields (CRF), was used to learn related concepts from the corpus, with word embedding, random indexing, and word shape features used to capture meaning. Results: After optimizing the model parameters through v-fold cross-validation on a sampled training set of structured data, only 8 of the 24 features were selected for the classification task. RapidMiner was used to train and test all the classification algorithms. In the final run of the classification process, the last contenders were SVM and the decision tree classifier; with tuned parameters, SVM yielded an accuracy of 92.5%. These results were obtained after more relevant features were identified, having observed that the classifiers were biased on the initial data. The unstructured data were annotated via the UIMA Ruta scripting language and then trained with CRFSuite, which comes with the CLAMP toolkit. The CRF classifier obtained an F-measure of 94.8% for the "nonsmoker" class, 83.0% for "currentsmoker", and 65.7% for "pastsmoker"; as more relevant data were added, the performance of the classifier improved. The results point to the use of FHIR resources for exchanging clinical data between healthcare institutions: FHIR is free and uses profiles to extend coding standards, a RESTful API to exchange messages, and JSON, XML, and Turtle to represent messages.
    Data could be stored in JSON format in a NoSQL database such as CouchDB, which makes it available for further post-extraction exploration. Conclusion: This study provides a method for a computer algorithm to learn a clinical coding standard and then apply that standard to unstandardized data, so that the data become easily exchangeable, comparable, and searchable, ultimately achieving data interoperability. Although this study was applied on a limited scale, future work will explore the standardization of patients' long-lived data from multiple sources using the SHARPn open-source tools and data scaling platforms.
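
    As a hedged sketch of the structured-data matching step described above (similarity weights plus Soundex blocking), the snippet below uses the jellyfish library for Soundex and Jaro-Winkler; the LOINC-coded and uncoded term lists and the threshold are placeholders, and the edit-distance and TF-IDF weights used in the study would be combined in the same loop.

```python
# Hedged sketch of record linkage between uncoded terms and LOINC-coded terms:
# Soundex blocking to limit comparisons, then a Jaro-Winkler similarity weight,
# with a threshold deciding match vs. non-match. Terms are placeholders.
# Requires: pip install jellyfish (recent versions expose jaro_winkler_similarity)
from collections import defaultdict
import jellyfish

loinc_coded = ["Hemoglobin", "Glucose", "Sodium", "Potassium"]
uncoded     = ["Haemoglobin", "Glucoze", "Na", "Potasium"]

# Blocking: group candidates by Soundex code so only similar-sounding pairs compare.
blocks = defaultdict(list)
for term in loinc_coded:
    blocks[jellyfish.soundex(term)].append(term)

THRESHOLD = 0.85  # assumption: tuned on a labelled sample in the actual study

matches = []
for raw in uncoded:
    for candidate in blocks.get(jellyfish.soundex(raw), []):
        weight = jellyfish.jaro_winkler_similarity(raw.lower(), candidate.lower())
        if weight >= THRESHOLD:
            matches.append((raw, candidate, round(weight, 3)))

# Spelling variants survive blocking and scoring; abbreviations like "Na"
# need extra features (e.g. synonym lists), which is why several weights
# and a trained classifier are combined in the study.
print(matches)
```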