17,706 research outputs found

    Extracting information from the text of electronic medical records to improve case detection: a systematic review

    Background: Electronic medical records (EMRs) are revolutionizing health-related research. One key issue for study quality is the accurate identification of patients with the condition of interest. Information in EMRs can be entered as structured codes or unstructured free text. The majority of research studies have used only coded parts of EMRs for case-detection, which may bias findings, miss cases, and reduce study quality. This review examines whether incorporating information from text into case-detection algorithms can improve research quality. Methods: A systematic search returned 9659 papers, 67 of which reported on the extraction of information from free text of EMRs with the stated purpose of detecting cases of a named clinical condition. Methods for extracting information from text and the technical accuracy of case-detection algorithms were reviewed. Results: Studies mainly used US hospital-based EMRs, and extracted information from text for 41 conditions using keyword searches, rule-based algorithms, and machine learning methods. There was no clear difference in case-detection algorithm accuracy between rule-based and machine learning methods of extraction. Inclusion of information from text resulted in a significant improvement in algorithm sensitivity and area under the receiver operating characteristic in comparison to codes alone (median sensitivity 78% (codes + text) vs 62% (codes), P = .03; median area under the receiver operating characteristic 95% (codes + text) vs 88% (codes), P = .025). Conclusions: Text in EMRs is accessible, especially with open source information extraction algorithms, and significantly improves case detection when combined with codes. More harmonization of reporting within EMR studies is needed, particularly standardized reporting of algorithm accuracy metrics like positive predictive value (precision) and sensitivity (recall)
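The two accuracy metrics named in the conclusions above can be computed directly from validation counts. The following is a minimal sketch, not code from the review: the function name `sensitivity_and_ppv` is hypothetical, and the counts are made up for illustration only.

```python
def sensitivity_and_ppv(true_pos, false_pos, false_neg):
    """Return (sensitivity, ppv) for a case-detection algorithm.

    sensitivity (recall) = TP / (TP + FN): share of true cases the algorithm finds.
    ppv (precision)      = TP / (TP + FP): share of flagged patients who are true cases.
    """
    actual_cases = true_pos + false_neg
    flagged = true_pos + false_pos
    sensitivity = true_pos / actual_cases if actual_cases else float("nan")
    ppv = true_pos / flagged if flagged else float("nan")
    return sensitivity, ppv


# Illustrative counts only (assumed, not taken from any reviewed study).
sens, ppv = sensitivity_and_ppv(true_pos=78, false_pos=22, false_neg=22)
print(f"sensitivity={sens:.2f}, ppv={ppv:.2f}")
```

Reporting both values together matters because an algorithm can raise sensitivity simply by flagging more patients, at the cost of a lower positive predictive value.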

    Performance Measures Using Electronic Health Records: Five Case Studies

    Presents the experiences of five provider organizations in developing, testing, and implementing four types of electronic quality-of-care indicators based on EHR data. Discusses challenges, and compares results with those from traditional indicators

    Repeatable and reusable research - Exploring the needs of users for a Data Portal for Disease Phenotyping

    Background: Big data research in the field of health sciences is hindered by a lack of agreement on how to identify and define different conditions and their medications. This means that researchers and health professionals often have different phenotype definitions for the same condition. This lack of agreement makes it hard to compare different study findings and hinders the ability to conduct repeatable and reusable research. Objective: This thesis aims to examine the requirements of various users, such as researchers, clinicians, machine learning experts, and managers, for both new and existing data portals for phenotypes (concept libraries). Methods: Exploratory sequential mixed methods were used in this thesis to look at which concept libraries are available, how they are used, what their characteristics are, where there are gaps, and what needs to be done in the future from the point of view of the people who use them. This thesis consists of three phases: 1) two qualitative studies, including one-to-one interviews with researchers, clinicians, machine learning experts, and senior research managers in health data science, as well as focus group discussions with researchers working with the Secure Anonymised Information Linkage (SAIL) Databank, 2) the creation of an email survey (i.e., the Concept Library Usability Scale), and 3) a quantitative study with researchers, health professionals, and clinicians. Results: Most of the participants thought that the prototype concept library would be a very helpful resource for conducting repeatable research, but they specified many requirements that would need to be met before its development. Although all the participants stated that they were aware of some existing concept libraries, most of them expressed negative perceptions about them. 
The participants mentioned several facilitators that would encourage them to: 1) share their work, such as receiving citations from other researchers; and 2) reuse the work of others, such as saving the considerable time and effort that they frequently spend on creating new code lists from scratch. They also pointed out several barriers that could inhibit them from: 1) sharing their work, such as concerns about intellectual property (e.g., if they shared their methods before publication, other researchers would use them as their own); and 2) reusing others' work, such as a lack of confidence in the quality and validity of their code lists. Participants suggested some developments that they would like to see in order to make research done with routine data more reproducible, such as a drive for more transparency in research methods documentation, including the publication of complete phenotype definitions and clear code lists. Conclusions: The findings of this thesis indicated that most participants valued a concept library for phenotypes. However, only half of the participants felt that they would contribute by providing definitions for the concept library, and they reported many barriers regarding sharing their work on a publicly accessible platform such as the CALIBER research platform. Analysis of interviews, focus group discussions, and qualitative studies revealed that different users have different requirements, facilitators, barriers, and concerns about concept libraries. This work set out to investigate whether concept libraries should be developed in Kuwait to facilitate improved data sharing. However, the recommendation at the end of this thesis is that this would be unlikely to be cost effective or highly valued by users, and that investment in open access research publications may be of more value to the Kuwait research/academic community.

    Application of Probabilistic Linkage: Compare Health Care Costs among Menopausal Women with Different Symptoms by Linking Women's Registry & Claims Data

    Abstract: Objectives: Menopause symptoms are a good disease severity proxy for menopausal women, but are not available in claims database. We applied probabilistic linkage to add symptoms recorded in a registry database to claims data, and compare the healthcare costs among women with various symptoms. Methods: Women age 45 or older who used estrogen only hormone therapy (HT) were selected from a large U.S. claims database (04/01/2005-09/30/2008). Another group who used estrogen only HT with a menopause diagnosis was selected from the University of Michigan Women's Registry Database. Logistic regression was used to calculate the propensity score for each patient controlling for osteoporosis, gynecological disorders/procedures, genital infection, gynecology system cancer, breast condition, gut condition, hormone disorder, nerve problem, and other individual comorbidities such as rheumatoid disease, depression, and blood clotting. Patients with the closest propensity score from each group were matched, and menopause symptoms for registry patients were added to the claims database records. After repeating probabilistic linkage 250 times, the mean and 95% confidence interval (CI) of healthcare costs during the follow-up period were calculated. Results: 80 patients from each population were matched after probabilistically linking 20,020 claims database patients with 83 registry database patients. The average cost of patients with at least one symptom was much higher than for patients without symptoms ($13,570 [95% CI: $13,459-$13,680] vs. $3,391 [95% CI: $3,345-$3,436], p-value<0.001). (1 US Dollar = 0.75 Euro) Cost differences were mainly from inpatient, physician visit, and pharmacy costs. Among patients with menopause symptoms, those with hot flashes had the highest costs ($10,127), followed by memory loss ($1,653), vaginal dryness ($864), reduced libido ($568), and mood swings ($358). 
Conclusions: Women with menopause symptoms incur higher healthcare costs than those without. This study suggests symptoms are important determinants of healthcare expenses and their impact can be assessed by linking registry and claims databases.
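
The matching step described in the methods above (pairing each registry patient with the claims patient whose propensity score is closest) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `match_by_propensity`, the greedy one-to-one strategy without replacement, the optional `caliper` cut-off, and the example scores are all hypothetical, and the propensity scores are taken as already estimated.

```python
def match_by_propensity(registry_scores, claims_scores, caliper=None):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    Each registry patient is paired with the still-unmatched claims patient
    whose propensity score is closest. If a caliper is given (an assumption,
    not stated in the abstract), pairs further apart than it are dropped.
    Returns a list of (registry_index, claims_index) tuples.
    """
    available = set(range(len(claims_scores)))
    pairs = []
    for r_idx, r_score in enumerate(registry_scores):
        if not available:
            break
        # Closest remaining claims patient by absolute score difference.
        best = min(available, key=lambda c: abs(claims_scores[c] - r_score))
        if caliper is None or abs(claims_scores[best] - r_score) <= caliper:
            pairs.append((r_idx, best))
            available.remove(best)
    return pairs


# Made-up propensity scores for illustration only.
registry = [0.20, 0.55, 0.90]
claims = [0.10, 0.22, 0.50, 0.58, 0.60]
pairs = match_by_propensity(registry, claims, caliper=0.05)
print(pairs)
```

Once pairs are formed, the symptom fields of each registry patient would be copied onto the matched claims record, which is the linkage the abstract describes before costs are compared.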