18,414 research outputs found

    Three Essays on Enhancing Clinical Trial Subject Recruitment Using Natural Language Processing and Text Mining

    Patient recruitment and enrollment are critical to a successful clinical trial; however, recruitment tends to be the most common problem in clinical trials. The success of a trial depends on efficiently recruiting suitable patients. Every clinical trial has a protocol, which describes what will be done in the study and how it will be conducted. The protocol also ensures the safety of the trial subjects and the integrity of the data collected. The eligibility criteria section of a protocol is important because it specifies the conditions that participants have to satisfy. Since eligibility criteria are usually written as free text, they are not computer-interpretable. To automate their analysis, it is therefore necessary to transform them into a computer-interpretable format. The unstructured format of eligibility criteria also creates search efficiency issues: searching for and selecting appropriate clinical trials for a patient from a relatively large number of available trials is a complex task. A few attempts have been made to automate the matching process between patients and clinical trials. However, those attempts have not fully integrated the entire matching process and have not exploited state-of-the-art Natural Language Processing (NLP) techniques that may improve matching performance. Given the importance of patient recruitment in clinical trial research, the objective of this research is to automate the matching process using NLP and text mining techniques and thereby improve the efficiency and effectiveness of the recruitment process. This dissertation, which comprises three essays, investigates clinical trial subject recruitment using state-of-the-art NLP and text mining techniques.
Essay 1: Building a Domain-Specific Lexicon for Clinical Trial Subject Eligibility Analysis; Essay 2: Clustering Clinical Trials Using Semantic-Based Feature Expansion; Essay 3: An Automatic Matching Process of Clinical Trial Subject Recruitment. In essay 1, I develop a domain-specific lexicon for n-gram Named Entity Recognition (NER) in the breast cancer domain. The domain-specific dictionary is used for the selection and reduction of n-gram features in the clustering in essay 2. The dictionary was evaluated by comparing it with the Systematized Nomenclature of Medicine--Clinical Terms (SNOMED CT). The results showed that it adds a significant number of new terms, which is useful for effective natural language processing. In essay 2, I explore the clustering of similar clinical trials using the domain-specific lexicon and term expansion with synonyms from the Unified Medical Language System (UMLS). I generate word n-gram features and modify them through the domain-specific dictionary matching process. To resolve semantic ambiguity, a semantic-based feature expansion technique using UMLS is applied. A hierarchical agglomerative clustering algorithm is used to generate clinical trial clusters. The focus is on summarizing clinical trial information in order to enhance trial search efficiency. Finally, in essay 3, I investigate an automatic matching process between clinical trial clusters and patient medical records. Patient records collected in a prior study were used to test the approach. The patient records were pre-processed by tokenization and lemmatization. The pre-processed patient information was then further enhanced by matching it against the custom breast cancer dictionary described in essay 1 and by semantic feature expansion using the UMLS Metathesaurus. Finally, I matched each patient record with the clinical trial clusters to select the best-matched cluster(s), and then with the trials within those clusters. The matching results were evaluated by an internal expert as well as an external medical expert
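
The clustering step described in essay 2 can be illustrated with a minimal sketch. It is not the dissertation's implementation: the Jaccard distance, average linkage, the `threshold` value, and all function names are assumptions chosen for illustration, and the lexicon filtering and UMLS synonym expansion are left out entirely. It only shows how word n-gram features can feed a hierarchical agglomerative clustering of eligibility texts.

```python
from itertools import combinations


def ngrams(text, n_max=2):
    """Return the set of word n-grams (n = 1..n_max) of a lowercased text."""
    tokens = text.lower().split()
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    }


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def agglomerative_clusters(texts, threshold=0.8, n_max=2):
    """Average-linkage agglomerative clustering (hypothetical parameters).

    Starts with one cluster per text and repeatedly merges the closest
    pair until the closest pair is farther apart than `threshold`.
    Returns clusters as sorted lists of indices into `texts`.
    """
    features = [ngrams(t, n_max) for t in texts]
    clusters = [[i] for i in range(len(texts))]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        dists = [jaccard_distance(features[i], features[j])
                 for i in c1 for j in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        (i, j), best = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]


# Hypothetical eligibility snippets, not taken from any real trial.
criteria = [
    "her2 positive breast cancer stage ii",
    "her2 positive breast cancer stage iii",
    "age over 65 with septic shock",
    "adults with septic shock in intensive care",
]
print(agglomerative_clusters(criteria))
```

In a fuller pipeline, the n-gram sets would first be reduced to terms found in the domain lexicon and expanded with synonyms before distances are computed.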

    Influencing Factors of Clinical Patient Recruitment Systems Design

    Clinical patient recruitment (CPR) is a critical function in clinical research. However, there is no holistic design for CPR systems that incorporates functions to support all critical success factors of clinical trial performance. In order to fill this gap, a study based on a literature review and several semi-structured expert interviews was conducted. Existing theory was synthesized with newly found influence factors using categories from CPR theory and factors gathered from literature and experts. The result is a systematization of influence factors of CPR that can be used for derivation of requirements for CPR systems in a subsequent research step or for the purpose of causal modeling

    Extracting information from the text of electronic medical records to improve case detection: a systematic review

    Background: Electronic medical records (EMRs) are revolutionizing health-related research. One key issue for study quality is the accurate identification of patients with the condition of interest. Information in EMRs can be entered as structured codes or unstructured free text. The majority of research studies have used only coded parts of EMRs for case-detection, which may bias findings, miss cases, and reduce study quality. This review examines whether incorporating information from text into case-detection algorithms can improve research quality. Methods: A systematic search returned 9659 papers, 67 of which reported on the extraction of information from free text of EMRs with the stated purpose of detecting cases of a named clinical condition. Methods for extracting information from text and the technical accuracy of case-detection algorithms were reviewed. Results: Studies mainly used US hospital-based EMRs, and extracted information from text for 41 conditions using keyword searches, rule-based algorithms, and machine learning methods. There was no clear difference in case-detection algorithm accuracy between rule-based and machine learning methods of extraction. Inclusion of information from text resulted in a significant improvement in algorithm sensitivity and area under the receiver operating characteristic in comparison to codes alone (median sensitivity 78% (codes + text) vs 62% (codes), P = .03; median area under the receiver operating characteristic 95% (codes + text) vs 88% (codes), P = .025). Conclusions: Text in EMRs is accessible, especially with open source information extraction algorithms, and significantly improves case detection when combined with codes. More harmonization of reporting within EMR studies is needed, particularly standardized reporting of algorithm accuracy metrics like positive predictive value (precision) and sensitivity (recall)
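
The two accuracy metrics the review asks studies to report can be computed as in the sketch below. The patient identifiers and the two detection sets are hypothetical, invented only to show how adding text-derived cases can raise sensitivity while changing positive predictive value; they are not data from the review.

```python
def case_detection_metrics(true_cases, detected_cases):
    """Sensitivity (recall) and positive predictive value (precision).

    Both arguments are collections of patient identifiers: the reference
    standard and the cases flagged by a detection algorithm.
    """
    true_cases, detected_cases = set(true_cases), set(detected_cases)
    tp = len(true_cases & detected_cases)   # correctly flagged
    fn = len(true_cases - detected_cases)   # missed cases
    fp = len(detected_cases - true_cases)   # flagged but not cases
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return {"sensitivity": sensitivity, "ppv": ppv}


# Hypothetical example: five true cases, detected by codes alone
# versus codes combined with information extracted from free text.
reference = {1, 2, 3, 4, 5}
codes_only = {1, 2, 3, 9}
codes_plus_text = {1, 2, 3, 4, 9, 10}

print(case_detection_metrics(reference, codes_only))
print(case_detection_metrics(reference, codes_plus_text))
```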

    Natural language processing for mimicking clinical trial recruitment in critical care: a semi-automated simulation based on the LeoPARDS trial

    Clinical trials often fail to recruit an adequate number of appropriate patients. Identifying eligible trial participants is resource-intensive when relying on manual review of clinical notes, particularly in critical care settings where the time window is short. Automated review of electronic health records (EHR) may help, but much of the information is in free text rather than a computable form. We applied natural language processing (NLP) to free text EHR data using the CogStack platform to simulate recruitment into the LeoPARDS study, a clinical trial aiming to reduce organ dysfunction in septic shock. We applied an algorithm to identify eligible patients using a moving 1-hour time window, and compared patients identified by our approach with those actually screened and recruited for the trial, for the time period that data were available. We manually reviewed records of a random sample of patients identified by the algorithm but not screened in the original trial. Our method identified 376 patients, including 34 patients with EHR data available who were actually recruited to LeoPARDS in our centre. The sensitivity of CogStack for identifying patients screened was 90% (95% CI 85%, 93%). Of the 203 patients identified by both manual screening and CogStack, the index date matched in 95 (47%) and CogStack was earlier in 94 (47%). In conclusion, analysis of EHR data using NLP could effectively replicate recruitment in a critical care trial, and identify some eligible patients at an earlier stage, potentially improving trial recruitment if implemented in real time

    Comorbidity and Quality of Life in Adults with Hair Pulling Disorder

    Hair pulling disorder (HPD; trichotillomania) is thought to be associated with significant psychiatric comorbidity and functional impairment. However, few methodologically rigorous studies of HPD have been conducted, rendering such conclusions tenuous. The following study examined comorbidity and psychosocial functioning in a well-characterized sample of adults with HPD (N=85) who met DSM-IV criteria, had at least moderate hair pulling severity, and participated in a clinical trial. Results revealed that 38.8% of individuals with HPD had another current psychiatric diagnosis and 78.8% had another lifetime (present and/or past) psychiatric diagnosis. Specifically, HPD showed substantial overlap with depressive, anxiety, addictive, and other body-focused repetitive behavior disorders. The relationships between certain comorbidity patterns, hair pulling severity, current mood and anxiety symptoms, and quality of life were also examined. Results showed that current depressive symptoms were the only predictor of quality of life deficits. Implications of these findings for the conceptualization and treatment of HPD are discussed

    Comparison of Electronic Record Types Concerning the Applicability for Patient Recruitment (Preprint)

    Background: Clinical trials constitute an important pillar in medical research. It is beneficial to support recruitment for clinical trials using software tools, so-called patient recruitment support systems; however, such information technology systems have not been frequently used to date. Because medical information systems' underlying data collection methods strongly influence the benefits of implementing patient recruitment support systems, we investigated patient recruitment support system requirements and corresponding electronic record types such as electronic medical record, electronic health record, electronic medical case record, personal health record, and personal cross-enterprise health record. Objective: The aim of this study was to (1) define requirements for successful patient recruitment support system deployment and (2) differentiate and compare patient recruitment support system-relevant properties of different electronic record types. Results: Patient recruitment support system requirements (n=16) were grouped into 4 categories (consent management, patient recruitment management, trial management, and general requirements). All 16 requirements could be partially met by at least 1 type of electronic record. Only 1 requirement was fully met by all 5 types. According to our analysis, personal cross-enterprise health records fulfill most requirements for patient recruitment support systems. They demonstrate advantages especially in 2 domains: (1) supporting patient empowerment and (2) granting access to the complete medical history of patients. Conclusions: In combination with patient recruitment support systems, personal cross-enterprise health records prove superior to other electronic record types, and therefore, this integration approach should be further investigated

    Implicit and Explicit Self-Esteem in Current, Remitted, Recovered, and Comorbid Depression and Anxiety Disorders: The NESDA Study

    BACKGROUND: Dual processing models of psychopathology emphasize the relevance of differentiating between deliberative self-evaluative processes (explicit self-esteem; ESE) and automatically-elicited affective self-associations (implicit self-esteem; ISE). It has been proposed that both low ESE and ISE would be involved in major depressive disorder (MDD) and anxiety disorders (AD). Further, it has been hypothesized that MDD and AD may result in a low ISE "scar" that may contribute to recurrence after remission. However, the available evidence provides no straightforward support for the relevance of low ISE in MDD/AD, and studies testing the relevance of discrepant SE even showed that especially high ISE combined with low ESE is predictive of the development of internalizing symptoms. However, these earlier findings have been limited by small sample sizes, poorly defined groups in terms of comorbidity and phase of the disorders, and by using inadequate indices of discrepant SE. Therefore, this study tested further the proposed role of ISE and discrepant SE in a large-scale study allowing for stricter differentiation between groups and phase of disorder. METHOD: In the context of the Netherlands Study of Depression and Anxiety (NESDA), we selected participants with current MDD (n = 60), AD (n = 111), and comorbid MDD/AD (n = 71), remitted MDD (n = 41), AD (n = 29), and comorbid MDD/AD (n = 14), recovered MDD (n = 136) and AD (n = 98), and never MDD or AD controls (n = 382). The Implicit Association Test was used to index ISE and the Rosenberg Self-Esteem Scale indexed ESE. RESULTS: Controls reported higher ESE than all other groups, and current comorbid MDD/AD had lower ESE than all other clinical groups. ISE was only lower than controls in current comorbid AD/MDD. Discrepant self-esteem (difference between ISE and ESE) was not associated with disorder status once controlling for ESE. LIMITATIONS: Cross-sectional design limits causal inferences. 
CONCLUSION: Findings suggest a prominent role for ESE in MDD and AD, while in comorbid MDD/AD negative self-evaluations are also present at the implicit level. There was no evidence to support the view that AD and MDD would result in a low ISE "scar"

    Text-mining in electronic healthcare records can be used as efficient tool for screening and data collection in cardiovascular trials: a multicenter validation study

    Objective: This study aimed to validate trial patient eligibility screening and baseline data collection using text-mining in electronic healthcare records (EHRs), comparing the results to those of an international trial. Study Design and Setting: In three medical centers with different EHR vendors, EHR-based text-mining was used to automatically screen patients for trial eligibility and extract baseline data on nineteen characteristics. First, the yield of screening with automated EHR text-mining search was compared with manual screening by research personnel. Second, the accuracy of extracted baseline data by EHR text-mining was compared to manual data entry by research personnel. Results: Of the 92,466 patients visiting the out-patient cardiology departments, 568 (0.6%) were enrolled in the trial during its recruitment period using manual screening methods. Automated EHR data screening of all patients showed that the number of patients needed to screen could be reduced by 73,863 (79.9%). The remaining 18,603 (20.1%) contained 458 of the actual participants (82.4% of participants). In trial participants, automated EHR text-mining missed a median of 2.8% (interquartile range [IQR] across all variables 0.4–8.5%) of all data points compared to manually collected data. The overall accuracy of automatically extracted data was 88.0% (IQR 84.7–92.8%). Conclusion: Automatically extracting data from EHRs using text-mining can be used to identify trial participants and to collect baseline information
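
The screening-yield percentages reported in the results can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in the abstract; the variable names and the `pct` helper are illustrative.

```python
# Counts as reported in the abstract.
visited = 92_466                    # patients visiting out-patient cardiology
enrolled_manually = 568             # enrolled via manual screening
excluded_by_text_mining = 73_863    # no longer needing manual screening
remaining_to_screen = 18_603        # still to be screened manually


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


# The excluded and remaining groups partition all visiting patients.
assert excluded_by_text_mining + remaining_to_screen == visited

print(pct(enrolled_manually, visited))         # 0.6
print(pct(excluded_by_text_mining, visited))   # 79.9
print(pct(remaining_to_screen, visited))       # 20.1
```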