
    Using natural language processing to identify opioid use disorder in electronic health record data

    Background: As opioid prescriptions have risen, there has also been an increase in opioid use disorder (OUD) and its adverse outcomes. Accurate and complete epidemiologic surveillance of OUD, to inform prevention strategies, presents challenges. The objective of this study was to ascertain prevalence of OUD using two methods to identify OUD in electronic health records (EHR): applying natural language processing (NLP) for text mining of unstructured clinical notes and using ICD-10-CM diagnostic codes. Methods: Data were drawn from EHR records for hospital and emergency department patient visits to a large regional academic medical center from 2017 to 2019. International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM) discharge codes were extracted for each visit. To develop the rule-based NLP algorithm, a stepwise process was used. First, a small sample of visits from 2017 was used to develop initial dictionaries. Next, EHR data corresponding to 30,124 visits from 2018 were used to develop and evaluate the rule-based algorithm. A random sample of the results was manually reviewed to identify and address shortcomings in the algorithm, and to estimate sensitivity and specificity of the two methods of ascertainment. Last, the final algorithm was applied to 29,212 visits from 2019 to estimate OUD prevalence. Results: While there was substantial overlap in the identified records (n = 1,381 [59.2 %]), overall n = 2,332 unique visits were identified. Of the total unique visits, 430 (18.4 %) were identified only by ICD-10-CM codes, and 521 (22.3 %) were identified only by NLP. The prevalence of visits with evidence of an OUD diagnosis in this sample, ascertained using only ICD-10-CM codes, was 1,811/29,212 (6.1 %). Including the additional 521 visits identified only by NLP, the estimated prevalence of OUD is 2,332/29,212 (7.9 %), an increase of 29.5 % compared to the use of ICD-10-CM codes alone.
The estimated sensitivity and specificity of the NLP-based OUD classification were 81.8 % and 97.5 %, respectively, relative to gold-standard manual review by an expert addiction medicine physician. Conclusion: NLP-based algorithms can automate data extraction and identify evidence of opioid use disorder from unstructured electronic healthcare records. The most complete ascertainment of OUD in EHR data came from combining NLP with ICD-10-CM codes. NLP should be considered for epidemiological studies involving EHR data.
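
The visit counts in the Results can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable and function names are assumptions, and only the three visit counts are taken from the abstract. It recomputes the ICD-10-CM total, the number of unique visits, and each group's share of the unique visits.

```python
# Visit counts reported in the abstract for the 2019 data.
both = 1381      # identified by both ICD-10-CM codes and NLP
icd_only = 430   # identified only by ICD-10-CM codes
nlp_only = 521   # identified only by NLP

# Visits flagged by ICD-10-CM codes, with or without NLP agreement.
icd_total = both + icd_only
# Unique visits flagged by at least one of the two methods.
union_total = both + icd_only + nlp_only


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of each group among all unique visits (hypothetical key names).
shares = {
    "both": share(both, union_total),
    "icd_only": share(icd_only, union_total),
    "nlp_only": share(nlp_only, union_total),
}

print(icd_total, union_total, shares)
```

The two totals match the numerators the abstract uses for its prevalence estimates (1,811 and 2,332), and the three shares match the reported 59.2 %, 18.4 %, and 22.3 %.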

    Electronic Health Record (EHR) Data Quality and Type 2 Diabetes Mellitus Care

    Indiana University-Purdue University Indianapolis (IUPUI). Due to frequent utilization, high costs, high prevalence, and negative health outcomes, the care of patients managing type 2 diabetes mellitus (T2DM) remains an important focus for providers, payers, and policymakers. The challenges of care delivery, including care fragmentation, reliance on patient self-management behaviors, adherence to care management plans, and frequent medical visits are well-documented in the literature. T2DM management produces numerous clinical data points in the electronic health record (EHR) including laboratory test values and self-reported behaviors. Recency or absence of these data may limit providers’ ability to make effective treatment decisions for care management. Increasingly, the context in which these data are being generated is changing. Specifically, telehealth usage is increasing. Adoption and use of telehealth for outpatient care is part of a broader trend to provide care at-a-distance, which was further accelerated by the COVID-19 pandemic. Despite unknown implications for patients managing T2DM, providers are increasingly using telehealth tools to complement traditional disease management programs and have adapted documentation practices for virtual care settings. Evidence suggests the quality of data documented during telehealth visits differs from that which is documented during traditional in-person visits. EHR data of differential quality could have cascading negative effects on patient healthcare outcomes. The purpose of this dissertation is to examine whether and to what extent levels of EHR data quality are associated with healthcare outcomes and if EHR data quality is improved by using health information technologies.
This dissertation includes three studies: 1) a cross-sectional analysis that quantifies the extent to which EHR data are timely, complete, and uniform among patients managing T2DM with and without a history of telehealth use; 2) a panel analysis to examine associations between primary care laboratory test ages (timeliness) and subsequent inpatient hospitalizations and emergency department admissions; and 3) a panel analysis to examine associations between patient portal use and EHR data timeliness.

    Approaches to enhance interpretability and meaningful use of big data in population health practice and research

    While many public health and medical studies use big data, the potential for big data to further population health has yet to be fully realized. Because of the complexities associated with the storage, processing, analysis, and interpretation of these data, few research findings from big data have been translated into practice. Using small area estimation synthetic data and electronic health record (EHR) data, the overall goal of this dissertation research was to characterize health-related exposures with an explicit focus on meaningful data interpretability. In our first aim, we used regression models linked to population microdata to respond to high-priority needs articulated by our community partners in New Bedford, MA. We identified census tracts with an elevated percentage of high-risk subpopulations (e.g., lower rates of exercise, higher rates of diabetes), information our community partners used to prioritize funding opportunities and intervention programs. In our second and third aims, we scrutinized EHR data on children seen at Boston Medical Center (Boston, MA), New England’s largest safety-net hospital, from 2013 through 2017 and uncovered racial/ethnic disparities in asthma severity and residential mobility using logistic regression. We built upon a validated asthma computable phenotype to create a computable phenotype for asthma severity that is based in clinical asthma guidelines. We found that children for whom severity could be ascertained from these EHR data were less likely to be Hispanic and that Black children were less likely to have lung function testing data present. Lastly, we constructed contextualized residential mobility and immobility metrics using EHR address data and the Child Opportunity Index 2.0, identified opportunities and challenges EHR address data present to study this topic, and found significant racial/ethnic disparities in access to neighborhood opportunity. 
Our findings highlighted the perpetuation of residence in low opportunity areas among non-White children. The main challenge of this dissertation, to work within the limitations inherent to big data and to extract meaningful knowledge from these data, including by linking to external datasets, turned out to be an opportunity to engage in solutions-oriented research and do work that, to quote Aristotle, “…is greater than the sum of its parts”. Through strategies ranging from engaging with community partners to examining who and what data are captured (and not captured) in EHR health and address data, this dissertation demonstrated potential ways to leverage big data sources to further public health and health equity.