Data extraction methods for systematic review (semi)automation: A living systematic review [version 1; peer review: awaiting peer review]

Authors: Higgins JPT, McGuinness LA, Olorisade BK, Schmidt L, Thomas J
Publication venue: 'F1000 Research Ltd'
Publication date: 19/05/2021

Background: The reliable and usable (semi)automation of data extraction can support the field of systematic review by reducing the workload required to gather information about the conduct and results of the included studies. This living systematic review examines published approaches for data extraction from reports of clinical studies.
Methods: We systematically and continually search MEDLINE, Institute of Electrical and Electronics Engineers (IEEE), arXiv, and the dblp computer science bibliography databases. Full text screening and data extraction are conducted within an open-source living systematic review application created for the purpose of this review. This iteration of the living review includes publications up to a cut-off date of 22 April 2020.
Results: In total, 53 publications are included in this version of our review. Of these, 41 (77%) of the publications addressed extraction of data from abstracts, while 14 (26%) used full texts. A total of 48 (90%) publications developed and evaluated classifiers that used randomised controlled trials as the main target texts. Over 30 entities were extracted, with PICOs (population, intervention, comparator, outcome) being the most frequently extracted. A description of their datasets was provided by 49 publications (94%), but only seven (13%) made the data publicly available. Code was made available by 10 (19%) publications, and five (9%) implemented publicly available tools.
Conclusions: This living systematic review presents an overview of (semi)automated data-extraction literature of interest to different types of systematic review. We identified a broad evidence base of publications describing data extraction for interventional reviews and a small number of publications extracting epidemiological or diagnostic accuracy data. The lack of publicly available gold-standard data for evaluation, and lack of application thereof, makes it difficult to draw conclusions on which is the best-performing system for each data extraction target. With this living review we aim to review the literature continually.

Novel methods to estimate antiretroviral adherence: protocol for a longitudinal study.

Authors: Gandhi Monica, Johnson Mallory O, Legnitto Dominique, Ming Kristin, Neilands Torsten B, Saberi Parya
Publication venue: eScholarship, University of California
Publication date: 01/01/2018

Background: There is currently no gold standard for assessing antiretroviral (ARV) adherence, so researchers often resort to the most feasible and cost-effective methods possible (eg, self-report), which may be biased or inaccurate. The goal of our study was to evaluate the feasibility and acceptability of innovative and remote methods to estimate ARV adherence, which can potentially be conducted with less time and financial resources in a wide range of clinic and research settings. Here, we describe the research protocol for studying these novel methods and some lessons learned.
Methods: The 6-month pilot study aimed to examine the feasibility and acceptability of a remotely conducted study to evaluate the correlation between: 1) text-messaged photographs of pharmacy refill dates for refill-based adherence; 2) text-messaged photographs of pills for pill count-based adherence; and 3) home-collected hair sample measures of ARV concentration for pharmacologic-based adherence. Participants were sent monthly automated text messages to collect refill dates and pill counts that were taken and sent via mobile telephone photographs, and hair collection kits every 2 months by mail. At the study end, feasibility was calculated by specific metrics, such as the receipt of hair samples and responses to text messages. Participants completed a quantitative survey and qualitative exit interviews to examine the acceptability of these adherence evaluation methods. The relationship between the 3 novel metrics of adherence and self-reported adherence will be assessed.
Discussion: Investigators conducting adherence research are often limited to using either self-reported adherence, which is subjective, biased, and often overestimated, or other more complex methods. Here, we describe the protocol for evaluating the feasibility and acceptability of 3 novel and remote methods of estimating adherence, with the aim of evaluating the relationships between them. Additionally, we note the lessons learned from the protocol implementation to date. We expect that these novel measures will be feasible and acceptable. The implications of this research will be the identification and evaluation of innovative and accurate metrics of ARV adherence for future implementation.

Three Essays on Enhancing Clinical Trial Subject Recruitment Using Natural Language Processing and Text Mining

Author: Jung Euisung
Publication venue: UWM Digital Commons
Publication date: 01/08/2015

Patient recruitment and enrollment are critical factors for a successful clinical trial; however, recruitment tends to be the most common problem in clinical trials. The success of a clinical trial depends on efficiently recruiting suitable patients to conduct the trial. Every clinical trial has a protocol, which describes what will be done in the study and how it will be conducted. The protocol also ensures the safety of the trial subjects and the integrity of the data collected. The eligibility criteria section of a clinical trial protocol is important because it specifies the necessary conditions that participants have to satisfy. Since clinical trial eligibility criteria are usually written in free-text form, they are not computer interpretable. To automate the analysis of the eligibility criteria, it is therefore necessary to transform those criteria into a computer-interpretable format. The unstructured format of eligibility criteria additionally creates search efficiency issues. Thus, searching and selecting appropriate clinical trials for a patient from a relatively large number of available trials is a complex task. A few attempts have been made to automate the matching process between patients and clinical trials. However, those attempts have not fully integrated the entire matching process and have not exploited the state-of-the-art Natural Language Processing (NLP) techniques that may improve the matching performance. Given the importance of patient recruitment in clinical trial research, the objective of this research is to automate the matching process using NLP and text mining techniques and, thereby, improve the efficiency and effectiveness of the recruitment process. This dissertation research, which comprises three essays, investigates the issues of clinical trial subject recruitment using state-of-the-art NLP and text mining techniques.
Essay 1: Building a Domain-Specific Lexicon for Clinical Trial Subject Eligibility Analysis
Essay 2: Clustering Clinical Trials Using Semantic-Based Feature Expansion
Essay 3: An Automatic Matching Process of Clinical Trial Subject Recruitment
In essay 1, I develop a domain-specific lexicon for n-gram Named Entity Recognition (NER) in the breast cancer domain. The domain-specific dictionary is used for selection and reduction of n-gram features in the clustering in essay 2. The domain-specific dictionary was evaluated by comparing it with Systematized Nomenclature of Medicine--Clinical Terms (SNOMED CT). The results showed that it adds a significant number of new terms, which is very useful for effective natural language processing. In essay 2, I explore the clustering of similar clinical trials using the domain-specific lexicon and term expansion using synonyms from the Unified Medical Language System (UMLS). I generate word n-gram features and modify the features with the domain-specific dictionary matching process. In order to resolve semantic ambiguity, a semantic-based feature expansion technique using UMLS is applied. A hierarchical agglomerative clustering algorithm is used to generate clinical trial clusters. The focus is on summarization of clinical trial information in order to enhance trial search efficiency. Finally, in essay 3, I investigate an automatic matching process of clinical trial clusters and patient medical records. The patient records collected from a prior study were used to test our approach. The patient records were pre-processed by tokenization and lemmatization. The pre-processed patient information was then further enhanced by matching with the breast cancer custom dictionary described in essay 1 and by semantic feature expansion using the UMLS Metathesaurus. Finally, I matched the patient record with clinical trial clusters to select the best-matched cluster(s) and then with trials within the clusters. The matching results were evaluated by an internal expert as well as an external medical expert.
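
The clustering step described for essay 2 (word n-gram features followed by hierarchical agglomerative clustering) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the abstract does not state the similarity measure or linkage criterion, so cosine similarity and average linkage are assumptions here, the dictionary matching and UMLS expansion steps are omitted, and the function names and the sample `criteria` strings are hypothetical.

```python
from collections import Counter
from math import sqrt


def word_ngrams(text, n_max=2):
    """Count word n-grams (n = 1..n_max) in a lower-cased, whitespace-split text."""
    tokens = text.lower().split()
    grams = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams[" ".join(tokens[i:i + n])] += 1
    return grams


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def agglomerative(texts, n_clusters):
    """Bottom-up clustering: start with one cluster per text and repeatedly
    merge the pair with the highest average pairwise similarity
    (average linkage, an assumption) until n_clusters remain."""
    vectors = [word_ngrams(t) for t in texts]
    clusters = [[i] for i in range(len(texts))]

    def linkage(c1, c2):
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                score = linkage(clusters[x], clusters[y])
                if best is None or score > best[0]:
                    best = (score, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return sorted(sorted(c) for c in clusters)


# Hypothetical eligibility-criteria snippets, for illustration only.
criteria = [
    "female patients with stage ii breast cancer",
    "female patients with stage iii breast cancer",
    "prior chemotherapy not allowed",
    "no prior chemotherapy or radiotherapy allowed",
]
print(agglomerative(criteria, 2))  # [[0, 1], [2, 3]]
```

In this toy input the two disease-stage criteria share most of their unigrams and bigrams and merge first, and the two chemotherapy criteria merge next. A real pipeline would filter or expand the n-gram features before clustering, as the abstract describes.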

Table-to-Text: Generating Descriptive Text for Scientific Tables from Randomized Controlled Trials

Author: Wei Qiang
Publication venue: DigitalCommons@TMC
Publication date: 01/05/2020

Unprecedented amounts of data have been generated in the biomedical domain, and the bottleneck for biomedical research has shifted from data generation to data management, interpretation, and communication. Therefore, it is highly desirable to develop systems to assist in text generation from biomedical data, which will greatly improve the dissemination of scientific findings. However, very few studies have investigated issues of data-to-text generation in the biomedical domain. Here I present a systematic study for generating descriptive text from tables in randomized clinical trial (RCT) articles, which includes: (1) an information model for representing RCT tables; (2) annotated corpora containing pairs of RCT tables and descriptive text, and labeled structural and semantic information of RCT tables; (3) methods for recognizing structural and semantic information of RCT tables; (4) methods for generating text from RCT tables, evaluated by a user study on three aspects: relevance, grammatical quality, and matching. The proposed hybrid text generation method achieved a low bilingual evaluation understudy (BLEU) score of 5.69, but human review gave scores of 9.3, 9.9 and 9.3 for relevance, grammatical quality and matching, respectively, which are comparable to those given to the original human-written text. To the best of our knowledge, this is the first study to generate text from scientific tables in the biomedical domain. The proposed information model, labeled corpora and developed methods for recognizing tables and generating descriptive text could also facilitate other biomedical and informatics research and applications.