37,946 research outputs found

    Extracting information from the text of electronic medical records to improve case detection: a systematic review

    Get PDF
    Background: Electronic medical records (EMRs) are revolutionizing health-related research. One key issue for study quality is the accurate identification of patients with the condition of interest. Information in EMRs can be entered as structured codes or unstructured free text. The majority of research studies have used only coded parts of EMRs for case-detection, which may bias findings, miss cases, and reduce study quality. This review examines whether incorporating information from text into case-detection algorithms can improve research quality. Methods: A systematic search returned 9659 papers, 67 of which reported on the extraction of information from free text of EMRs with the stated purpose of detecting cases of a named clinical condition. Methods for extracting information from text and the technical accuracy of case-detection algorithms were reviewed. Results: Studies mainly used US hospital-based EMRs, and extracted information from text for 41 conditions using keyword searches, rule-based algorithms, and machine learning methods. There was no clear difference in case-detection algorithm accuracy between rule-based and machine learning methods of extraction. Inclusion of information from text resulted in a significant improvement in algorithm sensitivity and area under the receiver operating characteristic in comparison to codes alone (median sensitivity 78% (codes + text) vs 62% (codes), P = .03; median area under the receiver operating characteristic 95% (codes + text) vs 88% (codes), P = .025). Conclusions: Text in EMRs is accessible, especially with open source information extraction algorithms, and significantly improves case detection when combined with codes. More harmonization of reporting within EMR studies is needed, particularly standardized reporting of algorithm accuracy metrics like positive predictive value (precision) and sensitivity (recall)

    Optimising use of electronic health records to describe the presentation of rheumatoid arthritis in primary care: a strategy for developing code lists

    Get PDF
    Background Research using electronic health records (EHRs) relies heavily on coded clinical data. Due to variation in coding practices, it can be difficult to aggregate the codes for a condition in order to define cases. This paper describes a methodology to develop ‘indicator markers’ found in patients with early rheumatoid arthritis (RA); these are a broader range of codes which may allow a probabilistic case definition to use in cases where no diagnostic code is yet recorded. Methods We examined EHRs of 5,843 patients in the General Practice Research Database, aged ≥30y, with a first coded diagnosis of RA between 2005 and 2008. Lists of indicator markers for RA were developed initially by panels of clinicians drawing up code-lists and then modified based on scrutiny of available data. The prevalence of indicator markers, and their temporal relationship to RA codes, was examined in patients from 3y before to 14d after recorded RA diagnosis. Findings Indicator markers were common throughout EHRs of RA patients, with 83.5% having 2 or more markers. 34% of patients received a disease-specific prescription before RA was coded; 42% had a referral to rheumatology, and 63% had a test for rheumatoid factor. 65% had at least one joint symptom or sign recorded and in 44% this was at least 6-months before recorded RA diagnosis. Conclusion Indicator markers of RA may be valuable for case definition in cases which do not yet have a diagnostic code. The clinical diagnosis of RA is likely to occur some months before it is coded, shown by markers frequently occurring ≥6 months before recorded diagnosis. It is difficult to differentiate delay in diagnosis from delay in recording. Information concealed in free text may be required for the accurate identification of patients and to assess the quality of care in general practice
    • …
    corecore