2,754 research outputs found

    Information systems in clinical research : categorization and evaluation of information systems and development of a guide for choosing the appropriate information system

    Master's thesis, University of Macedonia, Thessaloniki, 2019. The development of information systems used in clinical research is constantly increasing, as their advantages are widely acknowledged. Although many researchers have introduced information systems which can be used during a clinical study's process, a scarcity of information systems accommodating the complete process has been detected. Based on this finding, twenty-three (23) information systems and ontologies used in clinical research were retrieved, based on certain criteria. The information systems and ontologies were then categorized and evaluated based on categorization and evaluation tools. Finally, the result was the synthesis of the eligible-for-evaluation information systems and the development of a guide for choosing the appropriate information system during each step of a clinical trial; the data provided by each information system were identified. Unfortunately, some information systems and ontologies were excluded from the synthesis due to lack of information regarding the evaluation criteria. Therefore, future research should proceed with retrieving this information and developing a guide which will consider more information systems, especially for conducting observational studies.

    Supporting UK-wide e-clinical trials and studies

    As clinical trials and epidemiological studies become increasingly large, covering wider (national) geographical areas and involving ever broader populations, the need to provide an information management infrastructure that can support such endeavours is essential. A wealth of clinical data now exists at varying levels of care (primary care, secondary care, etc.). Simple, secure access to such data would greatly benefit the key processes involved in clinical trials and epidemiological studies: patient recruitment, data collection and study management. The Grid paradigm provides one model for seamless access to such data and support of these processes. The VOTES project (Virtual Organisations for Trials and Epidemiological Studies) is a collaboration between several UK institutions to implement a generic framework that effectively leverages the available health-care information across the UK to support more efficient gathering and processing of trial information. The structure of the information available in the health-care domain in the UK itself varies broadly in line with the national boundaries of the constituent states (England, Scotland, Wales and Northern Ireland). Technologies must address these political boundaries and the impact these boundaries have in terms of, for example, information governance, policies, and of course large-scale heterogeneous distribution of the data sets themselves. This paper outlines the methodology in implementing the framework between three specific data sources that serve as useful case studies: Scottish data from the Scottish Care Information (SCI) Store data repository, data on the General Practice Research Database (GPRD) diabetes trial at Imperial College London, and benign prostate hyperplasia (BPH) data from the University of Nottingham. The design, implementation and wider research issues are discussed along with the technological challenges encountered in the project in the application of Grid technologies.

    Three Essays on Enhancing Clinical Trial Subject Recruitment Using Natural Language Processing and Text Mining

    Patient recruitment and enrollment are critical factors for a successful clinical trial; however, recruitment tends to be the most common problem in most clinical trials. The success of a clinical trial depends on efficiently recruiting suitable patients to conduct the trial. Every clinical trial has a protocol, which describes what will be done in the study and how it will be conducted. The protocol also ensures the safety of the trial subjects and the integrity of the data collected. The eligibility criteria section of clinical trial protocols is important because it specifies the necessary conditions that participants have to satisfy. Since clinical trial eligibility criteria are usually written in free text form, they are not computer interpretable. To automate the analysis of the eligibility criteria, it is therefore necessary to transform those criteria into a computer-interpretable format. The unstructured format of eligibility criteria additionally creates search efficiency issues. Thus, searching and selecting appropriate clinical trials for a patient from a relatively large number of available trials is a complex task. A few attempts have been made to automate the matching process between patients and clinical trials. However, those attempts have not fully integrated the entire matching process and have not exploited the state-of-the-art Natural Language Processing (NLP) techniques that may improve the matching performance. Given the importance of patient recruitment in clinical trial research, the objective of this research is to automate the matching process using NLP and text mining techniques and, thereby, improve the efficiency and effectiveness of the recruitment process. This dissertation research, which comprises three essays, investigates the issues of clinical trial subject recruitment using state-of-the-art NLP and text mining techniques.
    Essay 1: Building a Domain-Specific Lexicon for Clinical Trial Subject Eligibility Analysis
    Essay 2: Clustering Clinical Trials Using Semantic-Based Feature Expansion
    Essay 3: An Automatic Matching Process of Clinical Trial Subject Recruitment
    In essay 1, I develop a domain-specific lexicon for n-gram Named Entity Recognition (NER) in the breast cancer domain. The domain-specific dictionary is used for selection and reduction of n-gram features in clustering in essay 2. The domain-specific dictionary was evaluated by comparing it with Systematized Nomenclature of Medicine--Clinical Terms (SNOMED CT). The results showed that it adds a significant number of new terms, which is useful for effective natural language processing. In essay 2, I explore the clustering of similar clinical trials using the domain-specific lexicon and term expansion using synonyms from the Unified Medical Language System (UMLS). I generate word n-gram features and modify the features with the domain-specific dictionary matching process. In order to resolve semantic ambiguity, a semantic-based feature expansion technique using UMLS is applied. A hierarchical agglomerative clustering algorithm is used to generate clinical trial clusters. The focus is on summarization of clinical trial information in order to enhance trial search efficiency. Finally, in essay 3, I investigate an automatic matching process of clinical trial clusters and patient medical records. The patient records collected from a prior study were used to test our approach. The patient records were pre-processed by tokenization and lemmatization. The pre-processed patient information was then further enhanced by matching with the custom breast cancer dictionary described in essay 1 and semantic feature expansion using the UMLS Metathesaurus. Finally, I matched the patient record with clinical trial clusters to select the best matched cluster(s) and then with trials within the clusters. The matching results were evaluated by an internal expert as well as an external medical expert.
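The n-gram and lexicon-matching step described in this abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the lexicon entries, the function names and the use of Jaccard overlap as a similarity measure are all assumptions chosen for illustration, and the UMLS synonym expansion and hierarchical clustering steps are omitted.

```python
# Hypothetical domain lexicon; the real one was built for the breast cancer domain.
LEXICON = {"breast cancer", "her2 positive", "chemotherapy"}


def word_ngrams(tokens, max_n=3):
    """Return the set of all word n-grams of length 1..max_n."""
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def lexicon_features(text, lexicon=LEXICON, max_n=3):
    """Keep only the n-grams that appear in the domain lexicon (feature reduction)."""
    tokens = text.lower().split()
    return word_ngrams(tokens, max_n) & lexicon


def jaccard(a, b):
    """Set overlap between two feature sets; an assumed similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


trial_a = lexicon_features("HER2 positive breast cancer with prior chemotherapy")
trial_b = lexicon_features("Metastatic breast cancer no chemotherapy")
similarity = jaccard(trial_a, trial_b)
```

A pairwise similarity of this kind is the sort of input an agglomerative clustering algorithm would consume when grouping similar trials.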

    Natural language processing for mimicking clinical trial recruitment in critical care: a semi-automated simulation based on the LeoPARDS trial

    Clinical trials often fail to recruit an adequate number of appropriate patients. Identifying eligible trial participants is resource-intensive when relying on manual review of clinical notes, particularly in critical care settings where the time window is short. Automated review of electronic health records (EHR) may help, but much of the information is in free text rather than a computable form. We applied natural language processing (NLP) to free text EHR data using the CogStack platform to simulate recruitment into the LeoPARDS study, a clinical trial aiming to reduce organ dysfunction in septic shock. We applied an algorithm to identify eligible patients using a moving 1-hour time window, and compared patients identified by our approach with those actually screened and recruited for the trial, for the time period that data were available. We manually reviewed records of a random sample of patients identified by the algorithm but not screened in the original trial. Our method identified 376 patients, including 34 patients with EHR data available who were actually recruited to LeoPARDS in our centre. The sensitivity of CogStack for identifying patients screened was 90% (95% CI 85%, 93%). Of the 203 patients identified by both manual screening and CogStack, the index date matched in 95 (47%) and CogStack was earlier in 94 (47%). In conclusion, analysis of EHR data using NLP could effectively replicate recruitment in a critical care trial, and identify some eligible patients at an earlier stage, potentially improving trial recruitment if implemented in real time
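The abstract reports sensitivity with a 95% confidence interval but does not say how the interval was computed. The sketch below assumes a Wilson score interval, one common choice for proportions; the function names and the counts in the usage lines are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import NormalDist


def sensitivity(true_positives, false_negatives):
    """Proportion of truly screened patients that the algorithm also identified."""
    return true_positives / (true_positives + false_negatives)


def wilson_interval(successes, total, confidence=0.95):
    """Wilson score interval for a binomial proportion (assumed method)."""
    if total <= 0:
        raise ValueError("total must be positive")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


# Hypothetical counts for illustration only: 90 of 100 screened patients found.
sens = sensitivity(90, 10)
low, high = wilson_interval(90, 100)
```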

    Cohort selection for clinical trials from longitudinal patient records: text mining approach

    Background: Clinical trials are an important step in introducing new interventions into clinical practice by generating data on their safety and efficacy. Clinical trials need to ensure that participants are similar so that the findings can be attributed to the interventions studied and not some other factors. Therefore, each clinical trial defines eligibility criteria, which describe characteristics that must be shared by the participants. Unfortunately, the complexities of eligibility criteria may not allow them to be translated directly into readily executable database queries. Instead, they may require careful analysis of the narrative sections of medical records. Manual screening of medical records is time consuming, thus negatively affecting the timeliness of the recruitment process. Objective: The Track 1 of the 2018 National NLP Clinical Challenge (n2c2) focused on the task of cohort selection for clinical trials with the aim of answering the following question: 'Can natural language processing be applied to narrative medical records to identify patients who meet eligibility criteria for clinical trials?' The task required the participating systems to analyze longitudinal patient records to determine if the corresponding patients met the given eligibility criteria. This article describes a system developed to address this task. Methods: Our system consists of 13 classifiers, one for each eligibility criterion. All classifiers use a bag-of-words document representation model. To prevent the loss of relevant contextual information associated with such representation, a pattern matching approach is used to extract context-sensitive features. They are embedded back into the text as lexically distinguishable tokens, which will consequently be featured in the bag-of-words representation. Supervised machine learning was chosen wherever a sufficient number of both positive and negative instances were available to learn from. 
A rule-based approach focusing on a small set of relevant features was chosen for the remaining criteria. Results: The system was evaluated using micro-averaged F-measure. Four machine learning algorithms, including support vector machine, logistic regression, naïve Bayesian classifier and gradient tree boosting, were evaluated on the training data using 10-fold cross-validation. Overall, gradient tree boosting demonstrated the most consistent performance. Its performance peaked when oversampling was used to balance the training data. Final evaluation was performed on previously unseen test data. On average, the F-measure of 89.04% was comparable to three of the top ranked performances in the shared task (91.11%, 90.28% and 90.21%). With F-measure of 88.14%, we significantly outperformed these systems (81.03%, 78.50% and 70.81%) in identifying patients with advanced coronary artery disease. Conclusions: The holdout evaluation provides evidence that our system was able to identify eligible patients for the given clinical trial with high accuracy. Our approach demonstrates how rule-based knowledge infusion can improve the performance of machine learning algorithms even when trained on a relatively small dataset.
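The idea of embedding context-sensitive features back into the text as distinguishable tokens, so that they survive a bag-of-words representation, can be sketched as follows. The two patterns and the token names `NEG_...` and `HBA1C_VALUE` are hypothetical examples, not the rules used by the system described in this abstract.

```python
import re
from collections import Counter

# Hypothetical rules: each pattern is rewritten into a single token that a
# bag-of-words model can count without losing the surrounding context.
PATTERNS = [
    (re.compile(r"\bno (?:history|evidence) of (\w+)"), r"NEG_\1"),
    (re.compile(r"\bhba1c\s*(?:of|:)?\s*(\d+(?:\.\d+)?)\s*%"), "HBA1C_VALUE"),
]


def embed_context_tokens(text):
    """Replace matched phrases with lexically distinguishable tokens."""
    for pattern, token in PATTERNS:
        text = pattern.sub(token, text)
    return text


def bag_of_words(text):
    """Lowercase, embed the context tokens, then count word occurrences."""
    tagged = embed_context_tokens(text.lower())
    return Counter(re.findall(r"[A-Za-z0-9_]+", tagged))


bag = bag_of_words("No history of diabetes. HbA1c: 7.2% today.")
```

In this sketch the negated mention is counted as `NEG_diabetes` rather than `diabetes`, so a downstream classifier would not mistake it for a positive finding.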

    Automatically selecting patients for clinical trials with justifications

    Clinical trials are human research studies that are used to evaluate the effectiveness of a surgical, medical, or behavioral intervention. They have been widely used by researchers to determine whether a new treatment, such as a new medication, is safe and effective in humans. A clinical trial is frequently performed to determine whether a new treatment is more successful than the current treatment or has less harmful side effects. However, clinical trials have a high failure rate. One method applied is to find patients based on patient records. Unfortunately, this is a difficult process. This is because this process is typically performed manually, making it time-consuming and error-prone. Consequently, clinical trial deadlines are often missed, and studies do not move forward. Time can be a determining factor for success. Therefore, it would be advantageous to have automatic support in this process. Since it is also important to be able to validate whether the patients were selected correctly for the trial, avoiding eventual health problems, it would be important to have a mechanism to present justifications for the selected patients. In this dissertation, we present one possible solution to solve the problem of patient selection for clinical trials. We developed the necessary algorithms and created a simple and intuitive web application that features the selection of patients for clinical trials automatically. This was achieved by combining knowledge expressed in different formalisms. We integrated medical knowledge using ontologies, with criteria that were expressed using nonmonotonic rules. To address the validation procedure automatically, we developed a mechanism that generates the justifications for each selection together with the results of the patients who were selected. 
In the end, it is expected that a user can easily enter a set of trial criteria, and the application will generate the results of the selected patients and their respective justifications, based on the criteria inserted, medical information and a database of patient information.

    LeafAI: query generator for clinical cohort discovery rivaling a human programmer

    Objective: Identifying study-eligible patients within clinical databases is a critical step in clinical research. However, accurate query design typically requires extensive technical and biomedical expertise. We sought to create a system capable of generating data model-agnostic queries while also providing novel logical reasoning capabilities for complex clinical trial eligibility criteria. Materials and Methods: The task of query creation from eligibility criteria requires solving several text-processing problems, including named entity recognition and relation extraction, sequence-to-sequence transformation, normalization, and reasoning. We incorporated hybrid deep learning and rule-based modules for these, as well as a knowledge base of the Unified Medical Language System (UMLS) and linked ontologies. To enable data-model agnostic query creation, we introduce a novel method for tagging database schema elements using UMLS concepts. To evaluate our system, called LeafAI, we compared the capability of LeafAI to a human database programmer to identify patients who had been enrolled in 8 clinical trials conducted at our institution. We measured performance by the number of actual enrolled patients matched by generated queries. Results: LeafAI matched a mean 43% of enrolled patients with 27,225 eligible across 8 clinical trials, compared to 27% matched and 14,587 eligible in queries by a human database programmer. The human programmer spent 26 total hours crafting queries compared to several minutes by LeafAI. Conclusions: Our work contributes a state-of-the-art data model-agnostic query generation system capable of conditional reasoning using a knowledge base. We demonstrate that LeafAI can rival a human programmer in finding patients eligible for clinical trials

    Overview of the COVID-19 text mining tool interactive demonstration track in BioCreative VII

    The coronavirus disease 2019 (COVID-19) pandemic has compelled biomedical researchers to communicate data in real time to establish more effective medical treatments and public health policies. Nontraditional sources such as preprint publications, i.e. articles not yet validated by peer review, have become crucial hubs for the dissemination of scientific results. Natural language processing (NLP) systems have been recently developed to extract and organize COVID-19 data in reasoning systems. Given this scenario, the BioCreative COVID-19 text mining tool interactive demonstration track was created to assess the landscape of the available tools and to gauge user interest, thereby providing a two-way communication channel between NLP system developers and potential end users. The goal was to inform system designers about the performance and usability of their products and to suggest new additional features. Considering the exploratory nature of this track, the call for participation solicited teams to apply for the track, based on their system's ability to perform COVID-19-related tasks and interest in receiving user feedback. We also recruited volunteer users to test systems. Seven teams registered systems for the track, and >30 individuals volunteered as test users; these volunteer users covered a broad range of specialties, including bench scientists, bioinformaticians and biocurators. The users, who had the option to participate anonymously, were provided with written and video documentation to familiarize themselves with the NLP tools and completed a survey to record their evaluation. Additional feedback was also provided by NLP system developers. The track was well received as shown by the overall positive feedback from the participating teams and the users.
    Funding: National Institutes of Health Office of Research Infrastructure Programs (R01OD010929 to M.T. and K.D.); Canadian Institutes of Health Research (FDN-167277 to M.T.); Canada Research Chair in Systems and Synthetic Biology (to M.T.); National Institutes of Health (2U24HG007822-08, 1R35 GM141873-01 to K.E.R. and C.N.A); Spanish Plan for the Advancement of Language Technology and Proyectos I+D+i2020-AI4PROFHEALTH (PID2020-119266RA-I00 to M.K.); MITRE (W56KGU-18-D-0004 to L.H. and T.K.). The views, opinions and/or findings contained in this report are those of the authors and should not be construed as an official government position, policy or decision.
    Peer reviewed. Postprint (published version).