
    Text Classification of Cancer Clinical Trial Eligibility Criteria

    Automatic identification of clinical trials for which a patient is eligible is complicated by the fact that trial eligibility is stated in natural language. A potential solution to this problem is to employ text classification methods for common types of eligibility criteria. In this study, we focus on seven common exclusion criteria in cancer trials: prior malignancy, human immunodeficiency virus, hepatitis B, hepatitis C, psychiatric illness, drug/substance abuse, and autoimmune illness. Our dataset consists of 764 phase III cancer trials with these exclusions annotated at the trial level. We experiment with common transformer models as well as a new pre-trained clinical trial BERT model. Our results demonstrate the feasibility of automatically classifying common exclusion criteria. Additionally, we demonstrate the value of a pre-trained language model specifically for clinical trials, which yields the highest average performance across all criteria.
    Comment: AMIA Annual Symposium Proceedings 202
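    The trial-level annotation described above amounts to a multi-label target: each trial gets a binary flag per exclusion criterion. A minimal sketch of that encoding (the criterion identifiers and function below are illustrative, not the paper's actual code):

    ```python
    # Illustrative label set for the seven exclusion criteria studied;
    # the identifiers are assumptions, not taken from the paper's code.
    CRITERIA = [
        "prior_malignancy", "hiv", "hepatitis_b", "hepatitis_c",
        "psychiatric_illness", "drug_substance_abuse", "autoimmune_illness",
    ]

    def encode_labels(annotated_exclusions):
        """Map a set of annotated exclusions to a fixed-order binary vector,
        suitable as a multi-label classification target for one trial."""
        return [1 if c in annotated_exclusions else 0 for c in CRITERIA]
    ```

    A transformer classifier (BERT-style) would then be trained with one sigmoid output per position in this vector.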

    Readability Formulas and User Perceptions of Electronic Health Records Difficulty: A Corpus Study

    BACKGROUND: Electronic health records (EHRs) are a rich resource for developing applications to engage patients and foster patient activation, thus holding a strong potential to enhance patient-centered care. Studies have shown that providing patients with access to their own EHR notes may improve the understanding of their own clinical conditions and treatments, leading to improved health care outcomes. However, the highly technical language in EHR notes impedes patients' comprehension. Numerous studies have evaluated the difficulty of health-related text using readability formulas such as Flesch-Kincaid Grade Level (FKGL), Simple Measure of Gobbledygook (SMOG), and Gunning-Fog Index (GFI). They conclude that the materials are often written at a grade level higher than common recommendations. OBJECTIVE: The objective of our study was to explore the relationship between the aforementioned readability formulas and laypeople's perceived difficulty on 2 genres of text: general health information and EHR notes. We also validated the formulas' appropriateness and generalizability for predicting difficulty levels of highly complex technical documents. METHODS: We collected 140 Wikipedia articles on diabetes and 242 EHR notes with a diabetes International Classification of Diseases, Ninth Revision code. We recruited 15 Amazon Mechanical Turk (AMT) users to rate difficulty levels of the documents. Correlations between laypeople's perceived difficulty levels and readability formula scores were measured, and their difference was tested. We also compared word usage and the impact of medical concepts across the 2 genres of text. RESULTS: The distributions of both readability formula scores (P < .001) and laypeople's perceptions (P=.002) on the 2 genres were different. Correlations of readability predictions and laypeople's perceptions were weak. Furthermore, despite being graded at similar levels, documents of different genres were still perceived with different difficulty (P < .001). Word usage in the 2 related genres still differed significantly (P < .001). CONCLUSIONS: Our findings suggested that the readability formulas' predictions did not align with perceived difficulty in either text genre. The widely used readability formulas were highly correlated with each other but did not show adequate correlation with readers' perceived difficulty. Therefore, they were not appropriate to assess the readability of EHR notes.
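    For context, the readability formulas named in this abstract are simple surface-level functions of word, sentence, and syllable counts. A minimal sketch of Flesch-Kincaid Grade Level, using a naive vowel-group syllable heuristic (an assumption for illustration; production tools use pronunciation dictionaries):

    ```python
    import re

    def count_syllables(word):
        """Naive syllable estimate: count runs of consecutive vowels.
        Real readability tools use pronunciation dictionaries instead."""
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def fkgl(text):
        """Flesch-Kincaid Grade Level:
        0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59"""
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        words = re.findall(r"[A-Za-z']+", text)
        syllables = sum(count_syllables(w) for w in words)
        return (0.39 * len(words) / len(sentences)
                + 11.8 * syllables / len(words) - 15.59)
    ```

    Because these formulas see only word, sentence, and syllable counts, never vocabulary familiarity or concept density, the weak correlation with perceived difficulty reported above is perhaps unsurprising for jargon-heavy EHR notes.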

    Methods to Facilitate the Capture, Use, and Reuse of Structured and Unstructured Clinical Data.

    Electronic health records (EHRs) have great potential to improve quality of care and to support clinical and translational research. While EHRs are being increasingly implemented in U.S. hospitals and clinics, their anticipated benefits have been largely unachieved or underachieved. Among many factors, tedious documentation requirements and the lack of effective information retrieval tools to access and reuse data are two key reasons accounting for this deficiency. In this dissertation, I describe my research on developing novel methods to facilitate the capture, use, and reuse of both structured and unstructured clinical data. Specifically, I develop a framework to investigate potential issues in this research topic, with a focus on three significant challenges. The first challenge is structured data entry (SDE), which can be facilitated by four effective strategies based on my systematic review. I further propose a multi-strategy model to guide the development of future SDE applications. In the follow-up study, I focus on workflow integration and evaluate the feasibility of using EHR audit trail logs for clinical workflow analysis. The second challenge is the use of clinical narratives, which can be supported by my innovative information retrieval (IR) technique called “semantically-based query recommendation (SBQR)”. My user experiment shows that SBQR can help improve the perceived performance of a medical IR system, and may work better on search tasks with average difficulty. The third challenge involves reusing EHR data as a reference standard to benchmark the quality of other health-related information. My study assesses the readability of trial descriptions on ClinicalTrials.gov and finds that trial descriptions are very hard to read, even harder than clinical notes. My dissertation has several contributions. First, it conducts pioneering studies with innovative methods to improve the capture, use, and reuse of clinical data. 
Second, my dissertation provides successful examples for investigators who would like to conduct interdisciplinary research in the field of health informatics. Third, the framework of my research can be a great tool to generate a future research agenda in clinical documentation and EHRs. I will continue exploring innovative and effective methods to maximize the value of EHRs.
    PhD, Information, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/135845/1/tzuyu_1.pd
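    The SBQR technique mentioned above is described only at a high level here. One common way such query recommendation works is by expanding a user's query with semantically related clinical terms; the sketch below illustrates that general idea with a tiny hand-written synonym map (the map, function name, and expansion strategy are assumptions for illustration, not the dissertation's method):

    ```python
    # Tiny illustrative synonym map; a real system would draw related
    # terms from a clinical vocabulary such as a thesaurus or ontology.
    SYNONYMS = {
        "heart attack": ["myocardial infarction", "mi"],
        "high blood pressure": ["hypertension"],
    }

    def recommend_queries(query):
        """Return the original query plus variants with known phrases
        replaced by semantically related clinical terms."""
        expansions = [query]
        lowered = query.lower()
        for term, alternatives in SYNONYMS.items():
            if term in lowered:
                expansions += [lowered.replace(term, alt) for alt in alternatives]
        return expansions
    ```

    Recommending such reformulations to the user, rather than silently rewriting the query, is what distinguishes query recommendation from plain query expansion.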