
    Three Essays on Enhancing Clinical Trial Subject Recruitment Using Natural Language Processing and Text Mining

    Patient recruitment and enrollment are critical to a successful clinical trial, yet recruitment is the most common problem trials face. The success of a clinical trial depends on efficiently recruiting suitable patients. Every clinical trial has a protocol, which describes what will be done in the study and how it will be conducted; the protocol also ensures the safety of the trial subjects and the integrity of the data collected. The eligibility criteria section of a protocol is particularly important because it specifies the conditions that participants must satisfy. Since eligibility criteria are usually written as free text, they are not computer interpretable; automating the analysis of eligibility criteria therefore requires transforming them into a computer-interpretable format. The unstructured format of eligibility criteria also creates search-efficiency issues, so searching for and selecting appropriate clinical trials for a patient from the relatively large number of available trials is a complex task. A few attempts have been made to automate the matching between patients and clinical trials, but those attempts have neither integrated the entire matching process nor exploited the state-of-the-art Natural Language Processing (NLP) techniques that may improve matching performance. Given the importance of patient recruitment in clinical trial research, the objective of this research is to automate the matching process using NLP and text mining techniques and thereby improve the efficiency and effectiveness of recruitment. This dissertation comprises three essays that investigate clinical trial subject recruitment using state-of-the-art NLP and text mining techniques.
Essay 1: Building a Domain-Specific Lexicon for Clinical Trial Subject Eligibility Analysis
Essay 2: Clustering Clinical Trials Using Semantic-Based Feature Expansion
Essay 3: An Automatic Matching Process of Clinical Trial Subject Recruitment
In essay 1, I develop a domain-specific lexicon for n-gram Named Entity Recognition (NER) in the breast cancer domain. This domain-specific dictionary is used for the selection and reduction of n-gram features in the clustering of essay 2. The dictionary was evaluated by comparison with the Systematized Nomenclature of Medicine--Clinical Terms (SNOMED CT); the results showed that it adds a significant number of new terms, which is useful for effective natural language processing. In essay 2, I explore the clustering of similar clinical trials using the domain-specific lexicon and term expansion with synonyms from the Unified Medical Language System (UMLS). I generate word n-gram features and modify them through the domain-specific dictionary matching process. To resolve semantic ambiguity, a semantic-based feature-expansion technique using UMLS is applied, and a hierarchical agglomerative clustering algorithm is used to generate clinical trial clusters. The focus is on summarizing clinical trial information in order to enhance trial-search efficiency. Finally, in essay 3, I investigate an automatic process for matching clinical trial clusters with patient medical records. Patient records collected from a prior study were used to test the approach. The records were pre-processed by tokenization and lemmatization, then further enhanced by matching against the breast cancer custom dictionary described in essay 1 and by semantic feature expansion using the UMLS Metathesaurus. Finally, I matched each patient record with the clinical trial clusters to select the best-matched cluster(s), and then with the trials within those clusters. The matching results were evaluated by an internal expert as well as an external medical expert.
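The pipeline described above (tokenization, dictionary matching, UMLS-style synonym expansion, and cluster scoring) can be sketched as follows. The dictionary, synonym table and clusters here are made-up stand-ins, not the dissertation's actual custom dictionary or the UMLS Metathesaurus:

```python
import re

# Toy stand-ins for the dissertation's resources (hypothetical examples):
CUSTOM_DICTIONARY = {"breast cancer", "tamoxifen", "her2 positive"}
SYNONYMS = {"breast cancer": {"breast carcinoma", "malignant neoplasm of breast"}}

def extract_features(text):
    """Tokenize, keep dictionary terms (uni-/bi-grams), expand synonyms."""
    tokens = re.findall(r"[a-z0-9+]+", text.lower())
    ngrams = set(tokens) | {" ".join(p) for p in zip(tokens, tokens[1:])}
    features = {g for g in ngrams if g in CUSTOM_DICTIONARY}
    for term in list(features):           # semantic feature expansion
        features |= SYNONYMS.get(term, set())
    return features

def best_cluster(patient_text, clusters):
    """Rank trial clusters by Jaccard overlap with the patient's features."""
    pf = extract_features(patient_text)
    def jaccard(cf):
        return len(pf & cf) / len(pf | cf) if pf | cf else 0.0
    return max(clusters, key=lambda name: jaccard(clusters[name]))

clusters = {
    "breast-cancer-trials": {"breast cancer", "tamoxifen", "breast carcinoma"},
    "lung-cancer-trials": {"nsclc", "egfr", "lung cancer"},
}
print(best_cluster("Patient with HER2 positive breast cancer on tamoxifen",
                   clusters))  # → breast-cancer-trials
```

A real implementation would substitute lemmatization, the essay-1 lexicon and UMLS concept lookups for these toy tables; the control flow is the same.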

    Repeatability of quantitative18F-FLT uptake measurements in solid tumors: an individual patient data multi-center meta-analysis

    INTRODUCTION: 3'-deoxy-3'-[18F]fluorothymidine (18F-FLT) positron emission tomography (PET) provides a non-invasive method to assess cellular proliferation and response to antitumor therapy. Quantitative 18F-FLT uptake metrics are being used to evaluate proliferative response in the investigational setting; however, multi-center repeatability needs to be established. The aim of this study was to determine the repeatability of 18F-FLT tumor uptake metrics by re-analyzing individual patient data from previously published reports using the same tumor segmentation method and repeatability metrics across cohorts. METHODS: A systematic search in PubMed, EMBASE.com and the Cochrane Library from inception to October 2016 yielded five 18F-FLT repeatability cohorts in solid tumors. 18F-FLT-avid lesions were delineated using a 50% isocontour adapted for local background on test and retest scans. SUVmax, SUVmean, SUVpeak, proliferative volume and total lesion uptake (TLU) were calculated. Repeatability was assessed using the repeatability coefficient (RC = 1.96 × SD of test-retest differences), linear regression analysis, and the intra-class correlation coefficient (ICC). The impact of different lesion selection criteria was also evaluated. RESULTS: Images from four cohorts containing 30 patients with 52 lesions were obtained and analyzed (ten lesions in breast cancer, nine in head and neck squamous cell carcinoma, and 33 in non-small cell lung cancer patients). A good correlation was found between test-retest data for all 18F-FLT uptake metrics (R2 ≥ 0.93; ICC ≥ 0.96). The best repeatability was found for SUVpeak (RC: 23.1%), without significant differences in RC between the SUV metrics. Repeatability of proliferative volume (RC: 36.0%) and TLU (RC: 36.4%) was worse than that of SUV. Lesion selection based on SUVmax ≥ 4.0 improved the repeatability of volumetric metrics (RC: 26-28%) but did not affect the repeatability of SUV metrics.
CONCLUSIONS: In multi-center studies, differences ≥ 25% in 18F-FLT SUV metrics likely represent a true change in tumor uptake. Larger differences are required for FLT metrics comprising volume estimates when no lesion selection criteria are applied.
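The repeatability coefficient defined above (RC = 1.96 × SD of test-retest differences) can be computed as in this sketch, which expresses differences as a percentage of each pair's mean; the SUV values are synthetic, not the study's data:

```python
import statistics

def repeatability_coefficient(test, retest):
    """RC = 1.96 x SD of test-retest differences, with each difference
    expressed as a percentage of the pair mean (a common convention;
    the study's exact normalization may differ)."""
    diffs_pct = [
        200.0 * (b - a) / (a + b)   # difference as % of the pair average
        for a, b in zip(test, retest)
    ]
    return 1.96 * statistics.stdev(diffs_pct)

# Illustrative SUVmax test-retest pairs (synthetic numbers):
test   = [4.1, 6.0, 5.2, 7.3, 3.8]
retest = [4.4, 5.7, 5.5, 7.0, 4.1]
print(f"RC = {repeatability_coefficient(test, retest):.1f}%")
```

Under this definition, an observed change larger than the RC (here, the study's ≥ 25% threshold for SUV) is unlikely to be test-retest noise alone.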

    Rule-based Formalization of Eligibility Criteria for Clinical Trials

    In this paper, we propose a rule-based formalization of eligibility criteria for clinical trials. The rule-based formalization is implemented by using the logic programming language Prolog. Compared with existing formalizations such as pattern-based and script-based languages, the rule-based formalization has the advantages of being declarative, expressive, reusable and easy to maintain. Our rule-based formalization is based on a general framework for eligibility criteria containing three types of knowledge: (1) trial-specific knowledge, (2) domain-specific knowledge and (3) common knowledge. This framework enables the reuse of several parts of the formalization of eligibility criteria. We have implemented the proposed rule-based formalization in SemanticCT, a semantically-enabled system for clinical trials, showing the feasibility of using our rule-based formalization of eligibility criteria for supporting patient recruitment in clinical trial systems.
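The paper's formalization is written in Prolog; purely as a self-contained illustration of the same three-layer knowledge structure, here is a rough Python analogue with made-up criteria (not the paper's actual rules):

```python
# Three knowledge layers, mirroring the paper's framework with
# hypothetical rules:

# (3) common knowledge: age from birth year
def age(patient, current_year=2024):
    return current_year - patient["birth_year"]  # simplified; ignores month/day

# (2) domain-specific knowledge: an (illustrative) menopause criterion
def postmenopausal(patient):
    return patient.get("menopause", False) or age(patient) >= 55

# (1) trial-specific knowledge: eligibility rule for one hypothetical trial
def eligible_for_trial_x(patient):
    return postmenopausal(patient) and patient["diagnosis"] == "breast cancer"

patient = {"birth_year": 1960, "menopause": False, "diagnosis": "breast cancer"}
print(eligible_for_trial_x(patient))  # → True
```

The layering is what enables reuse: the common and domain-specific predicates are shared across trials, and only the trial-specific rule changes per protocol.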

    The Bionic Radiologist: avoiding blurry pictures and providing greater insights

    Radiology images and reports have long been digitized. However, the potential of the more than 3.6 billion radiology examinations performed annually worldwide has largely gone unused in the effort to digitally transform health care. The Bionic Radiologist is a concept that combines humanity and digitalization for better integration of radiology into health care. At a practical level, this concept will achieve critical goals: (1) testing decisions made scientifically on the basis of disease probabilities and patient preferences; (2) image analysis done consistently at any time and at any site; and (3) treatment suggestions that are closely linked to imaging results and seamlessly integrated with other information. The Bionic Radiologist will thus help avoid missed care opportunities, provide continuous learning in the work process, and allow more time for radiologists’ primary roles: interacting with patients and referring physicians. To achieve that potential, many implementation barriers at both the individual and institutional levels must be overcome. These include reluctance to delegate decision making, a possible decrease in image-interpretation knowledge, and the perception that patient safety and trust are at stake. To facilitate implementation of the Bionic Radiologist, the following will be helpful: uncertainty quantification for suggestions, shared decision making, changes in organizational culture and leadership style, expertise maintained through continuous learning systems for training, and role development of the involved experts. With the support of the Bionic Radiologist, disparities are reduced and care is delivered in a humane and personalized fashion.

    Bayesian Methods and Machine Learning for Processing Text and Image Data

    Classification and clustering form an important class of unstructured-data processing problems. Classification (supervised, semi-supervised and unsupervised) aims to discover clusters and group similar data into categories for information organization and knowledge discovery. My work focuses on using Bayesian methods and machine learning techniques to classify free-text and image data, and addresses how to overcome the limitations of traditional methods. The Bayesian approach allows the use of more variables (numerical or categorical) and estimates probabilities instead of explicit rules, which helps in ambiguous cases. MAP (maximum a posteriori) estimation is used to deal with the local-maximum problems for which ML (maximum likelihood) estimation gives inaccurate estimates. The EM (expectation-maximization) algorithm can be applied with MAP estimation to incomplete/missing-data problems. Our proposed framework can be used in both supervised and unsupervised classification. For natural language processing (NLP), we applied machine learning techniques to sentence/text classification. For 3D CT image segmentation, a MAP-EM clustering approach is proposed to automatically detect the number of objects in a 3D CT luggage image, where prior knowledge and constraints in the MAP estimation are used to avoid or mitigate local-maximum problems. The algorithm automatically determines the number of classes and finds the optimal parameters for each class; as a result, it can automatically detect the number of objects and produce better segmentation for each object in the image. For segmented-object recognition, we applied machine learning techniques to classify each object as target or non-target, achieving good results with 90% PD (probability of detection) and 6% PFA (probability of false alarm). For image restoration in X-ray imaging, scatter can produce noise, artifacts, and decreased contrast.
In practice, hardware such as an anti-scatter grid is often used to reduce scatter. However, the remaining scatter can still be significant, and additional software-based correction is desirable. Furthermore, good software solutions can potentially reduce the amount of anti-scatter hardware needed, thereby reducing cost. In this work, scatter correction is formulated as a Bayesian MAP (maximum a posteriori) problem with a non-local prior, which better preserves textural detail during scatter reduction. The efficacy of our algorithm is demonstrated through experimental and simulation results.
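A one-dimensional toy version of MAP-regularized EM clustering of the kind described above (a sketch, not the 3D CT algorithm): the variance prior acts as a regularizer that keeps EM away from the degenerate zero-variance local maxima that plain ML estimation can fall into.

```python
import math
import random

def map_em_1d(data, k=2, iters=50, var_prior=0.5):
    """MAP-EM for a 1-D Gaussian mixture with a simple variance prior
    (illustrative; the dissertation's 3D formulation is more involved)."""
    data = sorted(data)
    n = len(data)
    # deterministic init: pick quantile points of the sorted data
    means = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    varis = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities of each component for each point
        resp = []
        for x in data:
            p = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(v)
                 for w, m, v in zip(weights, means, varis)]
            s = sum(p)
            resp.append([pi / s for pi in p])
        # M-step with a variance prior (MAP), preventing variance collapse
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            varis[j] = (var_prior + sum(r[j] * (x - means[j]) ** 2
                                        for r, x in zip(resp, data))) / (1 + nj)
            weights[j] = nj / n
    return means

random.seed(0)
data = ([random.gauss(0, 1) for _ in range(100)]
        + [random.gauss(8, 1) for _ in range(100)])
print(sorted(map_em_1d(data)))  # two means, near 0 and 8
```

With a plain ML M-step, a component can shrink its variance to zero around a single point; the `var_prior` term in the MAP update rules that pathological maximum out.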

    Natural Language Processing – Finding the Missing Link for Oncologic Data, 2022

    Oncology, like most medical specialties, is undergoing a data revolution, at the center of which lie vast and growing amounts of clinical data in unstructured, semi-structured and structured formats. Artificial intelligence approaches are widely employed in research endeavors that attempt to harness electronic medical record data to advance patient outcomes. The use of clinical oncologic data, although collected on a large scale, particularly with the increased implementation of electronic medical records, remains limited due to missing, incorrect or manually entered data in registries and the lack of resources allocated to data curation in real-world settings. Natural Language Processing (NLP) may provide an avenue for extracting data from electronic medical records and, as a result, has grown considerably in medicine, where it is employed for documentation, outcome analysis, phenotyping and clinical trial eligibility. Barriers to NLP persist: findings cannot be aggregated across studies because of differing methods and significant heterogeneity at all levels, and important parameters such as patient comorbidities and performance status lack implementation in AI approaches. The goal of this review is to provide an updated overview of NLP and the current state of its application in oncology for clinicians and researchers who wish to implement NLP to augment registries and/or advance research projects.

    Are decision trees a feasible knowledge representation to guide extraction of critical information from randomized controlled trial reports?

    <p>Abstract</p> <p>Background</p> <p>This paper proposes the use of decision trees as the basis for automatically extracting information from published randomized controlled trial (RCT) reports. An exploratory analysis of RCT abstracts is undertaken to investigate the feasibility of using decision trees as a semantic structure. Quality-of-paper measures are also examined.</p> <p>Methods</p> <p>A subset of 455 abstracts (randomly selected from a set of 7620 retrieved from Medline from 1998 – 2006) was examined for the quality of RCT reporting, the identifiability of RCTs from abstracts, and the completeness and complexity of RCT abstracts with respect to key decision tree elements. Abstracts were manually assigned to 6 sub-groups distinguishing primary RCTs from other design types. For primary RCT studies, we analyzed and annotated the reporting of intervention comparison, population assignment and outcome values. To measure completeness, the frequencies with which complete intervention, population and outcome information are reported in abstracts were measured. A qualitative examination of the reporting language was conducted.</p> <p>Results</p> <p>Decision tree elements are manually identifiable in the majority of primary RCT abstracts. 73.8% of a random subset were primary studies with a single population assigned to two or more interventions. 68% of these primary RCT abstracts were structured. 63% contained pharmaceutical interventions. 84% reported the total number of study subjects. In a subset of 21 abstracts examined, 71% reported numerical outcome values.</p> <p>Conclusion</p> <p>The manual identifiability of decision tree elements in abstracts suggests that decision trees could be a suitable construct to guide machine summarisation of RCTs. The presence of decision tree elements could also act as an indicator of RCT report quality in terms of completeness and uniformity.</p>

    ASCOT: a text mining-based web-service for efficient search and assisted creation of clinical trials

    Clinical trials are mandatory protocols describing medical research on humans and are among the most valuable sources of evidence for medical practice. Searching for trials relevant to a given query is laborious due to the immense number of existing protocols. Beyond search, writing a new trial involves composing detailed eligibility criteria, which can be time-consuming, especially for new researchers. In this paper we present ASCOT, an efficient search application customised for clinical trials. ASCOT uses text mining and data mining methods to enrich clinical trials with metadata, which in turn serves as an effective tool for narrowing down search. In addition, ASCOT integrates a component for recommending eligibility criteria based on a set of selected protocols.

    Help: defining the usability requirements of a breast cancer long-term survivorship (LTS) navigator

    Indiana University-Purdue University Indianapolis (IUPUI)
    Long-term survivors (LTSs) of breast cancer are defined as patients who have been in remission for a year or longer. Even after being declared breast-cancer-free, many LTSs have questions that are not answered by clinicians. Although online resources provide some content for LTSs, few or none provide immediate answers to specific questions. The aim is therefore to propose specifications for a system, the Health Electronic Learning Platform (HELP), that can assist survivors by serving as an all-inclusive resource for LTSs of breast cancer. To achieve this, relevant information from the literature was used to assess the needs of LTSs. Data from a study involving the breast cancer survivors’ forum project were also used, filtered to include posts mentioning features to be added to the website and usability issues encountered. The actual design of the system was completed by synthesizing the results from these two sources. HELP is simple in its layout and consists of a main search bar where LTSs can ask questions using their own terms and language. This navigator should not be taken as a definitive solution but, instead, used as a starting point toward better patient-centered care.

    Text Classification of Cancer Clinical Trial Eligibility Criteria

    Automatic identification of clinical trials for which a patient is eligible is complicated by the fact that trial eligibility is stated in natural language. A potential solution to this problem is to employ text classification methods for common types of eligibility criteria. In this study, we focus on seven common exclusion criteria in cancer trials: prior malignancy, human immunodeficiency virus, hepatitis B, hepatitis C, psychiatric illness, drug/substance abuse, and autoimmune illness. Our dataset consists of 764 phase III cancer trials with these exclusions annotated at the trial level. We experiment with common transformer models as well as a new pre-trained clinical trial BERT model. Our results demonstrate the feasibility of automatically classifying common exclusion criteria. Additionally, we demonstrate the value of a pre-trained language model specifically for clinical trials, which yields the highest average performance across all criteria.
    Comment: AMIA Annual Symposium Proceedings 202
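The study fine-tunes transformer models; purely as a self-contained illustration of the task, here is a trivial keyword-rule baseline for the same seven exclusion criteria (the patterns are illustrative, not from the paper):

```python
import re

# Illustrative patterns for the seven exclusion criteria (hypothetical,
# far weaker than the paper's transformer models):
EXCLUSION_PATTERNS = {
    "prior_malignancy": r"prior (malignanc|cancer)",
    "hiv": r"\bhiv\b|human immunodeficiency",
    "hepatitis_b": r"hepatitis b",
    "hepatitis_c": r"hepatitis c",
    "psychiatric": r"psychiatric",
    "substance_abuse": r"(drug|substance|alcohol) abuse",
    "autoimmune": r"autoimmune",
}

def classify_exclusions(criteria_text):
    """Return the set of exclusion labels whose pattern matches the text."""
    text = criteria_text.lower()
    return {label for label, pat in EXCLUSION_PATTERNS.items()
            if re.search(pat, text)}

text = ("Exclusion: known HIV infection, active hepatitis B, "
        "or history of drug abuse.")
print(sorted(classify_exclusions(text)))
# → ['hepatitis_b', 'hiv', 'substance_abuse']
```

Such a baseline illustrates why learned models help: negation ("no history of hepatitis B") and paraphrase defeat keyword rules, which is precisely the gap the paper's clinical trial BERT model addresses.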