14 research outputs found

    An automated technique for identifying associations between medications, laboratory results and problems

    Background: The patient problem list is an important component of clinical medicine. The problem list enables decision support and quality measurement, and evidence suggests that patients with accurate and complete problem lists may have better outcomes. However, the problem list is often incomplete. Objective: To determine whether association rule mining, a data mining technique, has utility for identifying associations between medications, laboratory results and problems. Such associations may be useful for identifying probable gaps in the problem list. Design: Association rule mining was performed on structured electronic health record data for a sample of 100,000 patients receiving care at the Brigham and Women’s Hospital, Boston, MA. The dataset included 272,749 coded problems, 442,658 medications and 11,801,068 laboratory results. Measurements: Candidate medication-problem and laboratory-problem associations were generated using support, confidence, chi square, interest, and conviction statistics. High-scoring candidate pairs were compared to a gold standard: the Lexi-Comp drug reference database for medications and Mosby’s Diagnostic and Laboratory Test Reference for laboratory results. Results: We were able to successfully identify a large number of clinically accurate associations. A high proportion of high-scoring associations were adjudged clinically accurate when evaluated against the gold standard (89.2% for medications with the best-performing statistic, chi square, and 55.6% for laboratory results using interest). Conclusion: Association rule mining appears to be a useful tool for identifying clinically accurate associations between medications, laboratory results and problems and has several important advantages over alternative knowledge-based approaches.
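
The five statistics named in this abstract can be illustrated with a minimal Python sketch. It uses common textbook definitions computed from co-occurrence counts; the abstract does not give the study's exact formulas, so the definitions, the function name `rule_stats` and its parameters are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class RuleStats:
    support: float
    confidence: float
    interest: float
    conviction: float
    chi_square: float


def rule_stats(n_both: int, n_a: int, n_b: int, n_total: int) -> RuleStats:
    """Score a candidate rule A -> B (e.g. medication -> problem).

    n_both:  patients with both A and B
    n_a:     patients with A
    n_b:     patients with B
    n_total: all patients
    Textbook definitions; not necessarily those used in the study.
    """
    if not (0 < n_a < n_total and 0 < n_b < n_total):
        raise ValueError("A and B must each occur in some but not all patients")
    p_a, p_b, p_ab = n_a / n_total, n_b / n_total, n_both / n_total

    support = p_ab                      # P(A and B)
    confidence = n_both / n_a           # P(B | A)
    interest = p_ab / (p_a * p_b)       # also called lift
    # Conviction is unbounded when the rule never fails.
    conviction = float("inf") if n_both == n_a else (1 - p_b) / (1 - confidence)

    # Pearson chi-square on the 2x2 table of A versus B.
    observed = [
        [n_both, n_a - n_both],
        [n_b - n_both, n_total - n_a - n_b + n_both],
    ]
    rows = [n_a, n_total - n_a]
    cols = [n_b, n_total - n_b]
    chi_square = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n_total
            chi_square += (observed[i][j] - expected) ** 2 / expected

    return RuleStats(support, confidence, interest, conviction, chi_square)
```

With hypothetical counts of 1,000 patients, 100 on a medication, 200 with a problem and 80 with both, the sketch gives a support of 0.08, a confidence of 0.8, an interest of about 4, a conviction of about 4 and a chi-square of 250.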

    Improving completeness of electronic problem lists through clinical decision support: a randomized, controlled trial

    Background: Accurate clinical problem lists are critical for patient care, clinical decision support, population reporting, quality improvement, and research. However, problem lists are often incomplete or out of date. Objective: To determine whether a clinical alerting system, which uses inference rules to notify providers of undocumented problems, improves problem list documentation. Study Design and Methods: Inference rules for 17 conditions were constructed and an electronic health record-based intervention was evaluated to improve problem documentation. A cluster randomized trial was conducted of 11 participating clinics affiliated with a large academic medical center, totaling 28 primary care clinical areas, with 14 receiving the intervention and 14 as controls. The intervention was a clinical alert directed to the provider that suggested adding a problem to the electronic problem list based on inference rules. The primary outcome measure was acceptance of the alert. The number of study problems added in each arm as a pre-specified secondary outcome was also assessed. Data were collected during 6-month pre-intervention (11/2009–5/2010) and intervention (5/2010–11/2010) periods. Results: 17,043 alerts were presented, of which 41.1% were accepted. In the intervention arm, providers documented significantly more study problems (adjusted OR=3.4, p<0.001), with an absolute difference of 6,277 additional problems. In the intervention group, 70.4% of all study problems were added via the problem list alerts. Significant increases in problem notation were observed for 13 of 17 conditions. Conclusion: Problem inference alerts significantly increase notation of important patient problems in primary care, which in turn has the potential to facilitate quality improvement

    Doctor of Philosophy

    Disease-specific ontologies, designed to structure and represent the medical knowledge about disease etiology, diagnosis, treatment, and prognosis, are essential for many advanced applications, such as predictive modeling, cohort identification, and clinical decision support. However, manually building disease-specific ontologies is very labor-intensive, especially in the process of knowledge acquisition. On the other hand, medical knowledge has been documented in a variety of biomedical knowledge resources, such as textbooks, clinical guidelines, research articles, and clinical data repositories, which offers a great opportunity for automated knowledge acquisition. In this dissertation, we aim to facilitate the large-scale development of disease-specific ontologies through automated extraction of disease-specific vocabularies from existing biomedical knowledge resources. Three separate studies presented in this dissertation explored both manual and automated vocabulary extraction. The first study addresses the question of whether disease-specific reference vocabularies derived from manual concept acquisition can achieve near-saturated coverage (that is, close to the greatest possible number of disease-pertinent concepts) by using a small number of literature sources. Using a general-purpose, manual acquisition approach we developed, this study concludes that a small number of expert-curated biomedical literature resources can prove sufficient for acquiring near-saturated disease-specific vocabularies. The second and third studies introduce automated techniques for extracting disease-specific vocabularies from both MEDLINE citations (title and abstract) and a clinical data repository. In the second study, we developed and assessed a pipeline-based system which extracts disease-specific treatments from PubMed citations. The system achieved a mean precision of 0.8 for the top 100 extracted treatment concepts. In the third study, we applied classification models to reduce irrelevant disease-concept associations extracted from MEDLINE citations and electronic medical records. This study suggested combining measures of relevance from disparate sources to improve the identification of truly relevant concepts through classification, and also demonstrated the generalizability of the studied classification model to new diseases. From these studies, we concluded that existing biomedical knowledge resources are valuable sources for extracting disease-concept associations, and that classification based on statistical measures of relevance could assist the semi-automated generation of disease-specific vocabularies.

    Intelligent audit code generation from free text in the context of neurosurgery

    Clinical auditing requires codified data for aggregation and analysis of patterns. However, in the medical domain, obtaining structured data can be difficult, as the most natural, expressive and comprehensive way to record a clinical encounter is through natural language. The task of creating structured data from naturally expressed information is known as information extraction. Specialised areas of medicine use their own language and data structures; the translation process has unique challenges, and often requires a fresh approach. This research is devoted to creating a novel semi-automated method for generating codified auditing data from clinical notes recorded in a neurosurgical department in an Australian teaching hospital. The method encapsulates specialist knowledge in rules that instantaneously make precise decisions for the majority of the matches, followed up by dictionary-based matching of the remaining text.

    Using Electronic Patient Records to Discover Disease Correlations and Stratify Patient Cohorts

    Electronic patient records remain a rather unexplored, but potentially rich data source for discovering correlations between diseases. We describe a general approach for gathering phenotypic descriptions of patients from medical records in a systematic and non-cohort dependent manner. By extracting phenotype information from the free-text in such records we demonstrate that we can extend the information contained in the structured record data, and use it for producing fine-grained patient stratification and disease co-occurrence statistics. The approach uses a dictionary based on the International Classification of Disease ontology and is therefore in principle language independent. As a use case we show how records from a Danish psychiatric hospital lead to the identification of disease correlations, which subsequently can be mapped to systems biology frameworks

    Semi-supervised incremental learning with few examples for discovering medical association rules

    Background: Association rules are one of the main ways to represent structural patterns underlying raw data. They represent dependencies between sets of observations contained in the data. The associations established by these rules are very useful in the medical domain, for example in the predictive health field. Classic algorithms for association rule mining give rise to huge numbers of possible rules that should be filtered in order to select those most likely to be true. Most of the proposed techniques for these tasks are unsupervised. However, the accuracy provided by unsupervised systems is limited. Conversely, resorting to annotated data for training supervised systems is expensive and time-consuming. The purpose of this research is to design a new semi-supervised algorithm that performs like supervised algorithms but uses an affordable amount of training data. Methods: In this work we propose a new semi-supervised data mining model that combines unsupervised techniques (Fisher's exact test) with limited supervision. Starting with a small seed of annotated data, the model improves on the results (F-measure) obtained using a fully supervised system (standard supervised ML algorithms). The idea is based on utilising the agreement between the predictions of the supervised system and those of the unsupervised techniques in a series of iterative steps. Results: The new semi-supervised ML algorithm improves on the results of supervised algorithms, measured using the F-measure, in the task of mining medical association rules, while training with an affordable amount of manually annotated data. Conclusions: Using a small amount of annotated data (which is easily achievable) leads to results similar to those of a supervised system. 
The proposal may be an important step for the practical development of techniques for mining association rules and generating new valuable scientific medical knowledge. This work has been partially supported by projects DOTT-HEALTH (PID2019-106942RB-C32, MCI/AEI/FEDER, UE) (Design of the study. Analysis and interpretation of data) and EXTRAE II (IMIENS 2019) (Design of the study. Analysis and interpretation of data. HUF corpus manual tagging. Writing of the manuscript), PI18CIII/00004 “Infobanco para uso secundario de datos basado en estándares de tecnología y conocimiento: implementación y evaluación de un infobanco de salud para CoRIS (Info-bank for the secondary use of data based on technology and knowledge standards: implementation and evaluation of a health info-bank for CoRIS) – SmartPITeS” (Data collection and HUF corpus construction), and PI18CIII/00019 - PI18/00890 - PI18/00981 “Arquitectura normalizada de datos clínicos para la generación de infobancos y su uso secundario en investigación: solución tecnológica (Clinical data normalized architecture for the generation of info-banks and their secondary use in research: technological solution) – CAMAMA 4” (Data collection and HUF corpus construction) from Fondo de Investigación Sanitaria (FIS) Plan Nacional de I+D+i.
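
The unsupervised component mentioned in this abstract, Fisher's exact test, can be sketched for a single candidate rule as follows. This is a minimal illustration rather than the authors' algorithm: the one-sided form of the test, the function names and the 0.05 threshold are all assumptions, and the iterative agreement step with the supervised model is not shown.

```python
from math import comb


def fisher_exact_greater(n_both: int, n_a_only: int, n_b_only: int, n_neither: int) -> float:
    """One-sided Fisher's exact test for positive association in a 2x2 table.

    Returns the probability, under independence with fixed margins, of seeing
    at least n_both co-occurrences (hypergeometric upper tail).
    """
    row_a = n_both + n_a_only          # records containing A
    col_b = n_both + n_b_only          # records containing B
    n = row_a + n_b_only + n_neither   # all records
    k_max = min(row_a, col_b)
    tail = sum(
        comb(col_b, k) * comb(n - col_b, row_a - k)
        for k in range(n_both, k_max + 1)
    )
    return tail / comb(n, row_a)


def keep_rule(p_value: float, alpha: float = 0.05) -> bool:
    """Hypothetical filter: keep a candidate rule if its p-value is below alpha."""
    return p_value < alpha
```

In a setting like the one described, a filter of this kind would give the unsupervised "vote" for each candidate rule, which could then be compared with the prediction of a supervised classifier.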

    Technologies for comprehensive intelligent analysis of clinical data

    The paper presents a system for intelligent analysis of clinical information. The authors describe the methods implemented in the system for clinical information retrieval, intelligent diagnosis of chronic diseases, estimation of the importance of patient features, and detection of hidden dependencies between features. Results of the experimental evaluation of these methods are also presented. Background: Healthcare facilities generate a large flow of both structured and unstructured data containing important information about patients. Test results are usually stored as structured data, but most data is stored as natural-language text (medical histories, results of physical examinations, and descriptions of other examinations such as ultrasound, ECG, or X-ray studies). Applying intelligent analysis methods to the accumulated structured and unstructured data makes it possible to automate many tasks arising in clinical practice and to improve the quality of healthcare. Aims: To create a comprehensive system for intelligent data analysis in a multidisciplinary pediatric center. Materials and methods: Information is extracted from clinical texts in Russian on the basis of deep linguistic analysis. The methods retrieve mentions of diseases, symptoms, body areas, and drugs, and also recognize disease attributes: "negation" (the disease is absent), "not patient" (the disease refers to a member of the patient's family rather than the patient), "severity of illness", "disease course", and "body region to which the disease refers". Extraction relies on medical thesauri, a set of manually written templates, and various machine learning techniques. The extracted information is used to solve the problem of automatic diagnosis of chronic diseases. A machine learning method for classifying patients with similar nosologies and a method for determining the most informative patient features are also proposed. Results: The proposed methods were evaluated on anonymized health records from the pediatric center, including an assessment of the quality of information extraction from clinical texts in Russian. The automatic diagnosis method was assessed experimentally on records of patients with allergic and respiratory, nephrological (glomerular), and rheumatic diseases. The most appropriate machine learning methods for classifying patients in each disease group were determined, along with the most informative disease signs. Using information extracted from clinical texts together with structured data improved the quality of diagnosis of chronic diseases compared with using the available structured data alone. Template combinations of disease signs were also obtained. Conclusions: The proposed methods have been implemented in the intelligent data processing system of a multidisciplinary pediatric center. The studies indicate that the system is a promising means of improving the quality of pediatric healthcare.

    Data Mining for Identifying Novel Associations and Temporal Relationships with Charcot Foot

    Identifying risk patterns for suicide attempts in individuals with diabetes: a data-driven approach using LASSO regression

    Diabetes is a major health concern in the United States, with 34.2 million Americans affected in 2020. Unfortunately, the risk of suicide is also elevated in individuals with diabetes, with around 90,000 people with diabetes committing suicide each year. People with type 1 diabetes are three to four times more likely to attempt suicide, and those with newly diagnosed type 2 diabetes are twice as likely to attempt suicide compared to the general population. However, poor mental health comorbidity is still neglected, and more recommendations are needed to support people with diabetes. It is widely acknowledged that comorbid depression in people with diabetes is associated with a higher risk of suicide attempts. Previous studies have used logistic regression to identify risk factors for suicide attempts in individuals with diabetes. However, this technique can be prone to overfitting when the number of variables is high. To address this issue, we used the LASSO (Least Absolute Shrinkage and Selection Operator), a regularization technique, to reduce overfitting in a logistic regression model. It works by adding a penalty term, weighted by a tuning parameter (λ), to the log-likelihood function, which shrinks the estimates of the coefficients. This process allows LASSO to act as a feature selection method, because the coefficients of the least informative variables are shrunk exactly to zero. Because few studies have focused on understanding the relationship between suicide attempts and diabetes, we used association rule mining (ARM), an explainable rule-based machine learning technique, for knowledge discovery to reveal previously unknown relationships between suicide attempts and diabetes. This approach has already proved useful in the medical field, where it has been applied to electronic health record (EHR) data to discover associations such as disease co-occurrences, drug-disease associations, and symptomatic patterns of disease. 
However, no previous studies have used ARM to determine risk factors and predict suicide attempts in people with diabetes. The aim of this dissertation is to identify patterns of risk factors for suicide attempts in individuals with diabetes, with the long-term goal of developing a clinical decision support system that can be integrated into EHRs. This system would allow healthcare providers to identify patients with diabetes at high risk of suicide attempts and provide appropriate preventive measures during outpatient clinic visits. To achieve this goal, we have three specific aims: (1) to identify potential risk factors for suicide attempts in individuals with diabetes through a literature review; (2) to investigate risk factors for suicide attempts in individuals with diabetes using LASSO regression; (3) to identify risk patterns for suicide attempts in individuals with diabetes using association rule mining. In this dissertation, we have reviewed the literature and compiled a list of data elements for suicide attempts in people with diabetes. We then retrieved data on patients with diabetes from Cerner Real-World Data™. LASSO regression was used for feature selection, and ARM was used for investigating the risk patterns. We discovered risk patterns that are understandable and practical for healthcare providers. The findings of this research can inform suicide prevention efforts for people with diabetes and contribute to improved mental health outcomes. Includes bibliographical references.
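
The LASSO penalty described in this abstract is commonly written as the following objective for logistic regression. This is the standard textbook form, given here as a sketch: the abstract does not state the exact parameterization used in the dissertation, so the 1/n scaling and the unpenalized intercept are assumptions.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}
\left\{
  -\frac{1}{n}\sum_{i=1}^{n}
  \Big[\, y_i\,(\beta_0 + x_i^{\top}\beta)
        - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
\right\}
```

Here $y_i \in \{0, 1\}$ would indicate a suicide attempt for patient $i$, $x_i$ is the vector of $p$ candidate risk factors, and $\lambda \ge 0$ controls the strength of the penalty. Larger values of $\lambda$ drive more coefficients $\beta_j$ exactly to zero, which is what makes the method usable for feature selection.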