
    An analytical approach to characterize morbidity profile dissimilarity between distinct cohorts using electronic medical records

    We describe a two-stage analytical approach for characterizing morbidity profile dissimilarity among patient cohorts using electronic medical records. We capture morbidities using International Classification of Diseases, Ninth Revision (ICD-9) codes. In the first stage of the approach, separate logistic regression analyses for ICD-9 sections (e.g., “hypertensive disease” or “appendicitis”) are conducted, and the odds ratios that describe adjusted differences in prevalence between two cohorts are displayed graphically. In the second stage, the results from the ICD-9 section analyses are combined into a general morbidity dissimilarity index (MDI). For illustration, we examine nine cohorts of patients representing six phenotypes (or controls) derived from five institutions, each a participant in the electronic MEdical REcords and GEnomics (eMERGE) network. The phenotypes studied include type II diabetes and type II diabetes controls, peripheral arterial disease and peripheral arterial disease controls, normal cardiac conduction as measured by electrocardiography, and senile cataracts.
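The first stage of such an approach can be sketched on simulated data. This is a minimal illustration, not the authors' implementation: the section labels, the covariate, and the combination rule used for the MDI (here, the mean absolute adjusted log-odds ratio) are assumptions for demonstration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
cohort = rng.integers(0, 2, n)      # 0 = control cohort, 1 = case cohort
age_c = rng.normal(0, 10, n)        # centered adjustment covariate

# Simulate morbidity presence for two ICD-9 sections; "hypertensive
# disease" is made genuinely more prevalent in the case cohort.
sections = {
    "401-405 hypertensive disease": -2.0 + 1.0 * cohort + 0.02 * age_c,
    "540-543 appendicitis":         -3.0 + 0.0 * cohort,
}

log_ors = []
for name, logit in sections.items():
    y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
    X = np.column_stack([cohort, age_c])
    # Large C gives a near-unpenalized fit, so the cohort coefficient
    # approximates the adjusted log-odds ratio between the two cohorts.
    fit = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)
    log_ors.append(fit.coef_[0][0])
    print(f"{name}: adjusted OR = {np.exp(log_ors[-1]):.2f}")

# Stage two: combine section-level results into one dissimilarity index
# (illustrative combination rule: mean absolute adjusted log-odds ratio).
mdi = float(np.mean(np.abs(log_ors)))
print(f"MDI = {mdi:.2f}")
```

A larger MDI indicates that the two cohorts' adjusted section-level prevalences diverge more strongly overall.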

    Three Essays on Enhancing Clinical Trial Subject Recruitment Using Natural Language Processing and Text Mining

    Patient recruitment and enrollment are critical to a successful clinical trial; however, recruitment tends to be the most common problem in most clinical trials. The success of a clinical trial depends on efficiently recruiting suitable patients. Every clinical trial has a protocol, which describes what will be done in the study and how it will be conducted; the protocol also ensures the safety of the trial subjects and the integrity of the data collected. The eligibility criteria section of a clinical trial protocol is important because it specifies the conditions that participants must satisfy. Since clinical trial eligibility criteria are usually written in free-text form, they are not computer-interpretable. To automate the analysis of the eligibility criteria, it is therefore necessary to transform those criteria into a computer-interpretable format. The unstructured format of eligibility criteria also creates search-efficiency issues: searching for and selecting appropriate clinical trials for a patient from a relatively large number of available trials is a complex task. A few attempts have been made to automate the matching process between patients and clinical trials. However, those attempts have not fully integrated the entire matching process and have not exploited state-of-the-art Natural Language Processing (NLP) techniques that may improve matching performance. Given the importance of patient recruitment in clinical trial research, the objective of this research is to automate the matching process using NLP and text mining techniques and, thereby, improve the efficiency and effectiveness of the recruitment process. This dissertation research, which comprises three essays, investigates the issues of clinical trial subject recruitment using state-of-the-art NLP and text mining techniques.
    Essay 1: Building a Domain-Specific Lexicon for Clinical Trial Subject Eligibility Analysis
    Essay 2: Clustering Clinical Trials Using Semantic-Based Feature Expansion
    Essay 3: An Automatic Matching Process of Clinical Trial Subject Recruitment
    In essay 1, I develop a domain-specific lexicon for n-gram Named Entity Recognition (NER) in the breast cancer domain. The domain-specific dictionary is used for the selection and reduction of n-gram features in the clustering of essay 2. The dictionary was evaluated by comparing it with the Systematized Nomenclature of Medicine--Clinical Terms (SNOMED CT); the results showed that it adds a significant number of new terms, which is very useful for effective natural language processing. In essay 2, I explore the clustering of similar clinical trials using the domain-specific lexicon and term expansion using synonyms from the Unified Medical Language System (UMLS). I generate word n-gram features and modify the features with the domain-specific dictionary matching process. To resolve semantic ambiguity, a semantic-based feature expansion technique using UMLS is applied. A hierarchical agglomerative clustering algorithm is used to generate clinical trial clusters. The focus is on summarization of clinical trial information in order to enhance trial search efficiency. Finally, in essay 3, I investigate an automatic matching process between clinical trial clusters and patient medical records. Patient records collected from a prior study were used to test the approach. The patient records were pre-processed by tokenization and lemmatization; the pre-processed patient information was then further enhanced by matching against the breast cancer custom dictionary described in essay 1 and by semantic feature expansion using the UMLS Metathesaurus. Finally, I matched each patient record with the clinical trial clusters to select the best-matched cluster(s) and then with the trials within those clusters. The matching results were evaluated by an internal expert as well as an external medical expert.
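The clustering step of essay 2 — domain-lexicon feature selection followed by hierarchical agglomerative clustering — can be sketched as follows. The trial snippets and the tiny lexicon are hypothetical stand-ins for real eligibility criteria and the custom dictionary; a real pipeline would add UMLS synonym expansion before clustering.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering

# Hypothetical eligibility-criteria snippets (not from real trials).
trials = [
    "female patients with her2 positive breast cancer, no prior chemotherapy",
    "her2 positive breast cancer, measurable disease, prior chemotherapy allowed",
    "adults with diabetes on metformin, hba1c above seven percent",
    "diabetes patients with inadequate glycemic control on metformin",
]

# A tiny stand-in for the domain-specific dictionary of essay 1:
# restricting the vocabulary keeps only domain-relevant n-gram features.
lexicon = ["breast cancer", "her2 positive", "chemotherapy",
           "diabetes", "metformin", "hba1c"]

vec = TfidfVectorizer(ngram_range=(1, 2), vocabulary=lexicon)
X = vec.fit_transform(trials).toarray()

# Hierarchical agglomerative clustering on the TF-IDF features
# (default Ward linkage); each label is a trial's cluster assignment.
labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)
print(labels)
```

With these inputs the two breast cancer trials land in one cluster and the two diabetes trials in the other, so a patient record can first be matched to a cluster and then only to the trials inside it.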

    Structuring the Electronic Patient Record: Methods, Evaluation Practices, and Impacts

    Patient data can be used for many different purposes when it is produced in a uniform, structured form. The report describes how structuring affects nursing work, clinical patient care, and the secondary use of patient data, and the different ways in which data can be structured. The report is based on an extensive systematic literature review. The use of standardized terminology advances nursing processes and continuity of care. In clinical patient care, the effects of structuring have been studied rather little. The effects of structuring on quality of care had been assessed in individual articles from the perspectives of adherence to care guidelines, reduction of medication errors, and monitoring of harmful drug interactions or adverse events. From the perspective of secondary use of patient data, the articles examined effects on documentation efficiency and on data quality, such as completeness and correctness, or evaluated the quality of text mining systems that use the recorded data. The report was produced at a time when national information system services are being deployed and the use of data is expanding. Knowledge of the effects and potential uses of different structuring methods provides a foundation for the further development of national data repositories and their use.

    Deep Risk Prediction and Embedding of Patient Data: Application to Acute Gastrointestinal Bleeding

    Acute gastrointestinal bleeding is a common and costly condition, accounting for over 2.2 million hospital days and 19.2 billion dollars of medical charges annually. Risk stratification is a critical part of initial assessment of patients with acute gastrointestinal bleeding. Although all national and international guidelines recommend the use of risk-assessment scoring systems, they are not commonly used in practice, have sub-optimal performance, may be applied incorrectly, and are not easily updated. With the advent of widespread electronic health record adoption, longitudinal clinical data captured during the clinical encounter is now available. However, this data is often noisy, sparse, and heterogeneous. Unsupervised machine learning algorithms may be able to identify structure within electronic health record data while accounting for key issues with the data generation process: measurements missing-not-at-random and information captured in unstructured clinical note text. Deep learning tools can create electronic health record-based models that perform better than clinical risk scores for gastrointestinal bleeding and are well-suited for learning from new data. Furthermore, these models can be used to predict risk trajectories over time, leveraging the longitudinal nature of the electronic health record. The foundation of creating relevant tools is the definition of a relevant outcome measure; in acute gastrointestinal bleeding, a composite outcome of red blood cell transfusion, hemostatic intervention, and all-cause 30-day mortality is a relevant, actionable outcome that reflects the need for hospital-based intervention. However, epidemiological trends may affect the relevance and effectiveness of the outcome measure when applied across multiple settings and patient populations. 
Understanding the trends in practice, potential areas of disparities, and the value proposition of risk stratification for patients presenting to the Emergency Department with acute gastrointestinal bleeding is important for implementing a robust, generalizable risk stratification tool. Key findings include a decrease in the rate of red blood cell transfusion since 2014 and disparities by race/ethnicity in access to upper endoscopy for patients with upper gastrointestinal bleeding across urban and rural hospitals. Projected accumulated savings from consistent implementation of risk stratification tools for upper gastrointestinal bleeding total approximately $1 billion five years after implementation. Most current risk scores were designed for use based on the location of the bleeding source: upper or lower gastrointestinal tract. However, the location of the bleeding source is not always clear at presentation. I develop and validate electronic health record-based deep learning and machine learning tools for patients presenting with symptoms of acute gastrointestinal bleeding (e.g., hematemesis, melena, hematochezia), which is more relevant and useful in clinical practice. I show that they outperform the leading clinical risk scores for upper and lower gastrointestinal bleeding, the Glasgow-Blatchford Score and the Oakland score. While the best-performing gradient boosted decision tree model has overall performance equivalent to the fully connected feedforward neural network model, at the very-low-risk threshold of 99% sensitivity the deep learning model identifies more very-low-risk patients. Using another deep learning model that can capture longitudinal risk, the long short-term memory recurrent neural network, the need for red blood cell transfusion can be predicted at every 4-hour interval during the first 24 hours of intensive care unit stay for high-risk patients with acute gastrointestinal bleeding. 
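The very-low-risk cutoff at 99% sensitivity described above can be illustrated with a generic gradient-boosted model on simulated data. This is a schematic sketch: the features, outcome, and model are stand-ins, not the thesis models or the actual EHR data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 4000
X = rng.normal(size=(n, 5))                  # stand-ins for EHR features
logit = 1.5 * X[:, 0] - 1.0 * X[:, 1] - 2.0
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)  # composite outcome

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
probs = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

# Choose the highest probability threshold that still keeps sensitivity
# >= 99%; patients scoring below it are flagged "very low risk".
for t in np.sort(np.unique(probs))[::-1]:
    sens = ((probs >= t) & (y_te == 1)).sum() / (y_te == 1).sum()
    if sens >= 0.99:
        break
frac_very_low = float((probs < t).mean())
print(f"threshold={t:.3f}, sensitivity={sens:.3f}, flagged very low risk={frac_very_low:.1%}")
```

Comparing models at this operating point, rather than by overall discrimination, measures what matters clinically: how many patients can safely be managed as outpatients while missing at most 1% of those who need intervention.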
Finally, for implementation it is important to find patients with symptoms of acute gastrointestinal bleeding in real time and to characterize their risk using available data in the electronic health record. A decision rule-based electronic health record phenotype has positive predictive value equivalent to that of deep learning and natural language processing-based models, and after live implementation it appears to have increased the use of the Acute Gastrointestinal Bleeding Clinical Care pathway. Patients with acute gastrointestinal bleeding can be differentiated from those with other groups of disease concepts by directly mapping unstructured clinical text to a common ontology and treating the vector of concepts as signals on a knowledge graph, using unbalanced diffusion earth mover’s distances on the graph. For electronic health record data with values missing not at random, MURAL, an unsupervised random forest-based method, handles missing values and generates visualizations that characterize patients with gastrointestinal bleeding. This thesis forms a basis for understanding the potential of machine learning and deep learning tools to characterize risk for patients with acute gastrointestinal bleeding. In the future, these tools may be critical in implementing integrated risk assessment to keep low-risk patients out of the hospital and to guide resuscitation and timely endoscopic procedures for patients at higher risk of clinical decompensation.

    Enhancing drug safety through active surveillance of observational healthcare data

    Drug safety continues to be a major public health concern in the United States, with adverse drug reactions ranking as the 4th to 6th leading cause of death and resulting in health care costs of $3.6 billion annually. Recent media attention and public scrutiny of high-profile drug safety issues have increased visibility and skepticism of the effectiveness of current post-approval safety surveillance processes. Current proposals suggest establishing a national active drug safety surveillance system that leverages observational data, including administrative claims and electronic health records, to monitor and evaluate potential safety issues of medicines. However, the development and evaluation of appropriate strategies for the systematic analysis of observational data have not yet been studied. This study introduces a novel exploratory analysis approach (Comparator-Adjusted Safety Surveillance, or COMPASS) to identify drug-related adverse events in automated healthcare data. The aims of the study were: 1) to characterize the performance of COMPASS in identifying known safety issues associated with ACE inhibitor exposure within an administrative claims database; 2) to evaluate the consistency of COMPASS estimates across a network of disparate databases; and 3) to explore differential effects across ingredients within the ACE inhibitor class. COMPASS was observed to have improved accuracy relative to three other methods under consideration for an active surveillance system: observational screening, disproportionality analysis, and self-controlled case series. COMPASS performance was consistently strong within five different databases, though important differences in outcome estimates across the sources highlighted the substantial heterogeneity that makes pooling estimates challenging. 
The comparative safety analysis of products within the ACE inhibitor class provided evidence of similar risk profiles across an array of different outcomes, and raised questions about product labeling differences and how observational studies should complement existing evidence as part of a broader safety assessment strategy. The results of this study should inform decisions about the appropriateness and utility of analyzing observational data as part of an active drug safety surveillance process. An improved surveillance system would enable a more comprehensive and timelier understanding of the safety of medicines. Such information supports patients and providers in therapeutic decision-making to minimize risks and improve the quality of care.
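As a point of reference for the disproportionality analysis mentioned above (one of the baseline methods COMPASS was compared against, not COMPASS itself), the proportional reporting ratio (PRR) for a drug-event pair is a simple computation over a 2x2 table of report counts. The counts below are hypothetical; a commonly cited signal criterion is PRR of at least 2 with at least 3 reports.

```python
# Hypothetical 2x2 report counts for one drug-event pair:
a = 25      # reports with the drug and the event
b = 975     # reports with the drug, without the event
c = 300     # reports with other drugs and the event
d = 58_700  # reports with other drugs, without the event

# Proportional reporting ratio: the event's share among the drug's
# reports divided by its share among all other drugs' reports.
prr = (a / (a + b)) / (c / (c + d))
print(f"PRR = {prr:.2f}")  # → PRR = 4.92
```

Methods like COMPASS aim to improve on such raw ratios by adjusting for a comparator cohort rather than relying on reporting patterns alone.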

    Front-Line Physicians' Satisfaction with Information Systems in Hospitals

    Day-to-day operations management in hospital units is difficult due to continuously varying situations, the several actors involved, and the vast number of information systems in use. The aim of this study was to describe front-line physicians' satisfaction with the existing information systems needed to support day-to-day operations management in hospitals. A cross-sectional survey was used, and data selected by stratified random sampling were collected in nine hospitals. Data were analyzed with descriptive and inferential statistical methods. The response rate was 65% (n = 111). The physicians reported that information systems support their decision making to some extent, but they do not improve access to information, nor are they tailored for physicians. The respondents also reported that they need to use several information systems to support decision making and that they would prefer a single information system for accessing important information. Improved information access would better support physicians' decision making and has the potential to improve the quality of decisions and speed up the decision-making process.