4,927 research outputs found

    Text Classification of Cancer Clinical Trial Eligibility Criteria

    Automatic identification of clinical trials for which a patient is eligible is complicated by the fact that trial eligibility is stated in natural language. A potential solution to this problem is to employ text classification methods for common types of eligibility criteria. In this study, we focus on seven common exclusion criteria in cancer trials: prior malignancy, human immunodeficiency virus, hepatitis B, hepatitis C, psychiatric illness, drug/substance abuse, and autoimmune illness. Our dataset consists of 764 phase III cancer trials with these exclusions annotated at the trial level. We experiment with common transformer models as well as a new pre-trained clinical trial BERT model. Our results demonstrate the feasibility of automatically classifying common exclusion criteria. Additionally, we demonstrate the value of a pre-trained language model specifically for clinical trials, which yields the highest average performance across all criteria. Comment: AMIA Annual Symposium Proceedings 202

    Three Essays on Enhancing Clinical Trial Subject Recruitment Using Natural Language Processing and Text Mining

    Patient recruitment and enrollment are critical factors for a successful clinical trial; however, recruitment tends to be the most common problem in most clinical trials. The success of a clinical trial depends on efficiently recruiting suitable patients to conduct the trial. Every clinical trial has a protocol, which describes what will be done in the study and how it will be conducted. The protocol also ensures the safety of the trial subjects and the integrity of the data collected. The eligibility criteria section of clinical trial protocols is important because it specifies the necessary conditions that participants have to satisfy. Since clinical trial eligibility criteria are usually written in free-text form, they are not computer interpretable. To automate the analysis of the eligibility criteria, it is therefore necessary to transform those criteria into a computer-interpretable format. The unstructured format of eligibility criteria additionally creates search efficiency issues. Thus, searching and selecting appropriate clinical trials for a patient from a relatively large number of available trials is a complex task. A few attempts have been made to automate the matching process between patients and clinical trials. However, those attempts have not fully integrated the entire matching process and have not exploited the state-of-the-art Natural Language Processing (NLP) techniques that may improve the matching performance. Given the importance of patient recruitment in clinical trial research, the objective of this research is to automate the matching process using NLP and text mining techniques and, thereby, improve the efficiency and effectiveness of the recruitment process. This dissertation research, which comprises three essays, investigates the issues of clinical trial subject recruitment using state-of-the-art NLP and text mining techniques.
Essay 1: Building a Domain-Specific Lexicon for Clinical Trial Subject Eligibility Analysis. Essay 2: Clustering Clinical Trials Using Semantic-Based Feature Expansion. Essay 3: An Automatic Matching Process of Clinical Trial Subject Recruitment. In essay 1, I develop a domain-specific lexicon for n-gram Named Entity Recognition (NER) in the breast cancer domain. The domain-specific dictionary is used for selection and reduction of n-gram features in clustering in essay 2. The domain-specific dictionary was evaluated by comparing it with the Systematized Nomenclature of Medicine--Clinical Terms (SNOMED CT). The results showed that it adds a significant number of new terms, which is very useful for effective natural language processing. In essay 2, I explore the clustering of similar clinical trials using the domain-specific lexicon and term expansion using synonyms from the Unified Medical Language System (UMLS). I generate word n-gram features and modify the features with the domain-specific dictionary matching process. In order to resolve semantic ambiguity, a semantic-based feature expansion technique using UMLS is applied. A hierarchical agglomerative clustering algorithm is used to generate clinical trial clusters. The focus is on summarization of clinical trial information in order to enhance trial search efficiency. Finally, in essay 3, I investigate an automatic matching process of clinical trial clusters and patient medical records. The patient records collected from a prior study were used to test our approach. The patient records were pre-processed by tokenization and lemmatization. The pre-processed patient information was then further enhanced by matching with the breast cancer custom dictionary described in essay 1 and semantic feature expansion using the UMLS Metathesaurus. Finally, I matched the patient record with clinical trial clusters to select the best-matched cluster(s) and then with trials within the clusters.
The matching results were evaluated by an internal expert as well as an external medical expert.
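
The clustering and matching steps summarized above can be sketched roughly as follows. This is only an illustrative sketch, not the dissertation's implementation: the tiny `SYNONYMS` table is a hypothetical stand-in for UMLS-based expansion, and the use of Jaccard similarity, average linkage, and the `threshold` value are assumptions chosen to keep the example self-contained.

```python
from itertools import combinations

# Hypothetical synonym table standing in for UMLS-based term expansion.
SYNONYMS = {"tumor": "neoplasm", "carcinoma": "neoplasm", "cancer": "neoplasm"}


def features(text, n=2):
    """Map tokens through the synonym table, then collect 1-grams up to n-grams."""
    tokens = [SYNONYMS.get(t, t) for t in text.lower().split()]
    grams = set()
    for size in range(1, n + 1):
        for i in range(len(tokens) - size + 1):
            grams.add(" ".join(tokens[i:i + size]))
    return grams


def jaccard(a, b):
    """Jaccard similarity between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def agglomerate(docs, threshold=0.2):
    """Average-linkage agglomerative clustering; stops when the best merge
    falls below `threshold` (an assumed cut-off, not from the source)."""
    feats = [features(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]

    def linkage(c1, c2):
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(jaccard(feats[i], feats[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        # Pick the pair of clusters with the highest average similarity.
        (x, y), best = max(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe because x < y
    return clusters


def best_cluster(patient_text, docs, clusters):
    """Index of the cluster whose members are, on average, most similar to the patient text."""
    p = features(patient_text)
    scores = [sum(jaccard(p, features(docs[i])) for i in c) / len(c) for c in clusters]
    return scores.index(max(scores))


# Hypothetical trial summaries and patient note, for illustration only.
trials = [
    "breast cancer stage II hormone therapy",
    "breast carcinoma stage II hormone therapy",
    "type 2 diabetes insulin therapy",
]
clusters = agglomerate(trials)
print(clusters)
print(best_cluster("stage II breast tumor", trials, clusters))
```

Matching a patient first against clusters and only then against the trials inside the selected cluster keeps the number of pairwise comparisons small, which is the search-efficiency motivation the abstract describes.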

    Bigram feature extraction and conditional random fields model to improve text classification clinical trial document

    In health and medicine, clinical trials play a very important role. Clinical trials are studies that investigate the safest way to treat patients. Their documents are usually written as unstructured free text, which requires interpretation by a computer. The aim of this paper is to classify the text of cancer clinical trial documents, which consists of unstructured free text taken from cancer clinical trial protocols. The proposed approach combines conditional random fields with bigram features. A new classification model for cancer clinical trial document text is proposed to compete with other methods in terms of precision, recall, and F1 score. The results of this study improve on previous results, reaching 88.07 precision, 88.05 recall, and an F1 score of 88.06.
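
As a rough illustration of the bigram features and the reported scores, the sketch below extracts word bigrams from a criterion sentence and checks that the reported F1 score is the harmonic mean of the reported precision and recall. The function names and the sample sentence are hypothetical, whitespace tokenization is an assumption, and the conditional random fields model itself is not reproduced here.

```python
def word_bigrams(text):
    """Return adjacent word pairs from a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical eligibility sentence, not taken from the paper's dataset.
sample = "No prior malignancy within five years"
print(word_bigrams(sample))

# Precision and recall as reported in the abstract above.
print(round(f1_score(88.07, 88.05), 2))
```

With the reported precision and recall, the computed value rounds to 88.06, which agrees with the F1 score stated in the abstract.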

    Learning Eligibility in Cancer Clinical Trials using Deep Neural Networks

    Interventional cancer clinical trials are generally too restrictive, and some patients are often excluded on the basis of comorbidity, past or concomitant treatments, or the fact that they are over a certain age. The efficacy and safety of new treatments for patients with these characteristics are, therefore, not defined. In this work, we built a model to automatically predict whether short clinical statements were considered inclusion or exclusion criteria. We used protocols from cancer clinical trials that were available in public registries from the last 18 years to train word-embeddings, and we constructed a dataset of 6M short free-texts labeled as eligible or not eligible. A text classifier was trained using deep neural networks, with pre-trained word-embeddings as inputs, to predict whether or not short free-text statements describing clinical information were considered eligible. We additionally analyzed the semantic reasoning of the word-embedding representations obtained and were able to identify equivalent treatments for a type of tumor analogous with the drugs used to treat other tumors. We show that representation learning using deep neural networks can be successfully leveraged to extract the medical knowledge from clinical trial protocols for potentially assisting practitioners when prescribing treatments.

    Analysis of Eligibility Criteria Complexity in Clinical Trials

    Formal, computer-interpretable representations of eligibility criteria would allow computers to better support key clinical research and care use cases such as eligibility determination. To inform the development of such formal representations for eligibility criteria, we conducted this study to characterize and quantify the complexity present in 1000 eligibility criteria randomly selected from studies in ClinicalTrials.gov. We classified the criteria by their complexity, semantic patterns, clinical content, and data sources. Our analyses revealed significant semantic and clinical content variability. We found that 93% of criteria were comprehensible, with 85% of these criteria having significant semantic complexity, including 40% relying on temporal data. We also identified several domains of clinical content. Using the findings of the study as requirements for computer-interpretable representations of eligibility, we discuss the challenges for creating such representations for use in clinical research and practice

    Patient recruitment, feasibility evaluations and use of electronic health records in clinical trials - A Nordic approach

    Clinical trials constitute an important cornerstone for the development of new drugs. Patient recruitment is one of the main challenges in clinical trials. Pharmaceutical companies apply feasibility evaluations to identify potential countries, investigators and study sites for their trials and to evaluate their potential for successful patient recruitment. Electronic health records (EHR) maintained by health care providers are regarded as one potential tool for improving patient identification and recruitment for clinical trials. This study investigated patient recruitment and trial feasibility evaluations in the Nordic countries and the role and usability of EHR data in those processes. The pharmaceutical industry’s view was investigated by conducting semi-structured qualitative interviews of 21 respondents from Finland, Sweden, Norway and Denmark. Additionally, the usability of one commercial EHR research platform for identifying patients from Turku University Hospital’s EHR system was tested in comparison with a manual search. The success or failure of patient recruitment was influenced by many sponsor-related, investigator/site-related, patient-related, collaboration-related and start-up-related factors. Most trials had recruited their patients by reviewing the hospitals’ EHR data, but its use was much less frequent during the feasibility evaluation phase. Feasibility evaluation was found to be a complex and time-consuming process for estimating the number of potential trial patients. The sponsors did not use EHR tools for such evaluations, mainly because of legislative barriers. Although the EHR data search tools have limitations in accuracy, they were seen to have great potential for identifying trial participants from the hospital EHR, for example by reducing the manual work.
The comprehensive data in the EHR systems in the Nordic countries offer a possibility for more accurate identification of trial participants in the feasibility evaluations and may thus contribute to the success of recruitment. The data protection legislation and its interpretation should be harmonized for the use of EHR data. Continuous improvements in the EHR systems’ technical accuracy and data quality will be needed to enhance the successful use of EHR data in future clinical trials.

    Using linked administrative data for monitoring and evaluating the Family Nurse Partnership in England: A scoping report

    This report, commissioned by the FNP National Unit and undertaken by researchers at UCL and the London School of Hygiene and Tropical Medicine, presents a scoping review of how population-based linkage between data from the Family Nurse Partnership (FNP) in England and administrative datasets from other services could be used to generate evidence for commissioning, service evaluation and research. It addresses the methodological considerations, permission pathways and technical challenges of using data from the FNP linked with routinely collected, administrative data from other public services for population-based analyses, at a national and local authority level. Our ambition, when commissioning this work, was to explore whether linking data from FNP with administrative datasets might help provide a richer view about how the FNP intervention is affecting different cohorts of clients and their child after they have graduated. The report suggests that the potential for data linkage to support ongoing evaluation of a wide range of interventions including FNP at a national level is promising and an important area to explore. It makes a significant contribution to understanding the possibilities and constraints for doing this, which include barriers to data linkage at a local level (which we know is crucial for local commissioners) and the significant investment required to realise the potential of this project. We believe this report offers valuable insights other organisations interested in the delivery of evidence based policy may want to pursue

    Insights from UKCTOCS for design, conduct and analyses of large randomised controlled trials

    ABSTRACT: Randomised controlled trials are challenging to deliver. There is a constant need to review and refine recruitment and implementation strategies if they are to be completed on time and within budget. We present the strategies adopted in the United Kingdom Collaborative Trial of Ovarian Cancer Screening, one of the largest individually randomised controlled trials in the world. The trial recruited over 202,000 women (2001-5) and delivered over 670,000 annual screens (2001-11) and over 3 million women-years of follow-up (2001-20). Key to the successful completion were the involvement of senior investigators in the day-to-day running of the trial, proactive trial management and willingness to innovate and use technology. Our underlying ethos was that trial participants should always be at the centre of all our processes. We ensured that they were able to contact either the site or the coordinating centre teams for clarifications about their results, for follow-up and for rescheduling of appointments. To facilitate this, we shared personal identifiers (with consent) with both teams and had dedicated reception staff at both site and coordinating centre. Key aspects were a comprehensive online trial management system which included an electronic data capture system (resulting in an almost paperless trial), biobanking, monitoring and project management modules. The automation of algorithms (to ascertain eligibility and classify results and ensuing actions) and processes (scheduling of appointments, printing of letters, etc.) ensured the protocol was closely followed and timelines were met. Significant engagement with participants ensured retention and low rates of complaints. Our solutions to the design, conduct and analyses issues we faced are highly relevant, given the renewed focus on trials for early detection of cancer. FUTURE WORK: There is a pressing need to increase the evidence base to support decision making about all aspects of trial methodology. 
TRIAL REGISTRATION: ISRCTN-22488978; ClinicalTrials.gov-NCT00058032. FUNDING: This article presents independent research funded by the National Institute for Health and Care Research (NIHR) Health Technology Assessment programme as award number 16/46/01. The long-term follow-up UKCTOCS (2015-20) was supported by National Institute for Health and Care Research (NIHR HTA grant 16/46/01), Cancer Research UK, and The Eve Appeal. UKCTOCS (2001-14) was funded by the MRC (G9901012 and G0801228), Cancer Research UK (C1479/A2884), and the UK Department of Health, with additional support from The Eve Appeal. Researchers at UCL were supported by the NIHR UCL Hospitals Biomedical Research Centre and by the MRC Clinical Trials Unit at UCL core funding (MC_UU_00004/09, MC_UU_00004/08, MC_UU_00004/07). The views expressed are those of the authors and not necessarily those of the NHS, the NIHR, or the UK Department of Health and Social Care.