CLEF 2017 technologically assisted reviews in empirical medicine overview
Systematic reviews are a widely used method to provide an overview of the current scientific consensus by bringing together multiple studies in a reliable, transparent way. The large and growing number of published studies, and their increasing rate of publication, makes the task of identifying all relevant studies in an unbiased way both complex and time-consuming, to the extent that it jeopardizes the validity of review findings and the ability to inform policy and practice in a timely manner. The CLEF 2017 e-Health Lab Task 2 focuses on the efficient and effective ranking of studies during the abstract and title screening phase of conducting Diagnostic Test Accuracy systematic reviews. We constructed a benchmark collection of fifty such reviews and the corresponding relevant and irrelevant articles found by the original Boolean query. Fourteen teams participated in the task, submitting 68 automatic and semi-automatic runs, using information retrieval and machine learning algorithms over a variety of text representations, in both batch and iterative manners. This paper reports both the methodology used to construct the benchmark collection and the results of the evaluation.
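The iterative screening setting this abstract describes can be sketched with a minimal relevance-feedback ranker: unscreened abstracts are re-ranked by TF-IDF cosine similarity to the studies already judged relevant. This is an illustrative toy under my own assumptions, not any participant's system, and all function names are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build simple TF-IDF vectors for a list of tokenised documents."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    n = len(docs)
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_for_screening(candidates, relevant_so_far):
    """Rank unscreened abstracts by similarity to those already judged relevant."""
    docs = [c.lower().split() for c in candidates + relevant_so_far]
    vecs = tfidf_vectors(docs)
    cand_vecs = vecs[: len(candidates)]
    rel_vecs = vecs[len(candidates):]
    scored = []
    for text, vec in zip(candidates, cand_vecs):
        score = max((cosine(vec, r) for r in rel_vecs), default=0.0)
        scored.append((score, text))
    return [t for _, t in sorted(scored, reverse=True)]
```

In a real screening loop, `rank_for_screening` would be re-run after every batch of reviewer judgements so that newly confirmed relevant studies pull similar abstracts toward the top of the queue.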
TrialMatch: A Transformer Architecture to Match Patients to Clinical Trials
Around 80% of clinical trials fail to meet their patient-recruitment requirements, which not only hinders market growth but also delays patients' access to new and effective treatments. A possible approach is to use Electronic Health Records (EHRs) to help match patients to clinical trials. Past attempts at this goal were unsuccessful due to a lack of data. In 2021, the Text REtrieval Conference (TREC) introduced the Clinical Trials Track, where participants were challenged to retrieve relevant clinical trials given patient descriptions simulating admission notes. Using the track results as a baseline, we tackled the challenge with Information Retrieval (IR), implementing a document-ranking pipeline in which we explore different retrieval methods, filter clinical trials based on their eligibility criteria, and rerank with Transformer-based models. To this end, we explored models pre-trained on the biomedical domain, ways to handle long queries and documents through query expansion and passage selection, and ways to distinguish an eligible clinical trial from an excluded one, using techniques such as Named Entity Recognition (NER) and Clinical Assertion. Our results led to the finding that current state-of-the-art Bidirectional Encoder Representations from Transformers (BERT) bi-encoders outperform cross-encoders on this task, while also showing that sparse retrieval methods can obtain competitive results; finally, we showed that the available demographic information can be used to improve
the final result.
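One concrete stage of the pipeline this abstract describes, the demographic pre-filter before ranking, can be sketched as follows. This is a minimal illustration under assumed trial fields (`min_age`, `max_age`, `sex`), not the TrialMatch implementation; the function names are hypothetical.

```python
def eligible(trial, patient_age, patient_sex):
    """Demographic pre-filter: drop trials whose age/sex criteria exclude the patient."""
    if not (trial["min_age"] <= patient_age <= trial["max_age"]):
        return False
    return trial["sex"] in ("all", patient_sex)

def filter_then_rank(trials, patient_age, patient_sex, scores):
    """Apply the demographic filter, then sort the survivors by retrieval score."""
    keep = [(scores[t["id"]], t["id"])
            for t in trials
            if eligible(t, patient_age, patient_sex)]
    return [trial_id for _, trial_id in sorted(keep, reverse=True)]
```

The point of filtering before reranking is cheap precision: a trial whose structured criteria already exclude the patient never has to be scored by an expensive Transformer reranker.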
Overview of the CLEF eHealth Evaluation Lab 2018
In this paper, we provide an overview of the sixth annual edition of the CLEF eHealth evaluation lab. CLEF eHealth 2018 continues our evaluation-resource-building efforts around easing and supporting patients, their next of kin, clinical staff, and health scientists in understanding, accessing, and authoring eHealth information in a multilingual setting. This year's lab offered three tasks: Task 1 on multilingual information extraction, extending last year's task on French and English corpora to French, Hungarian, and Italian; Task 2 on technologically assisted reviews in empirical medicine, building on last year's pilot task in English; and Task 3 on Consumer Health Search (CHS) in mono- and multilingual settings, building on the 2013–17 Information Retrieval tasks. In total, 28 teams took part in these tasks (14 in Task 1, 7 in Task 2, and 7 in Task 3). Herein, we describe the resources created for these tasks, outline the evaluation methodology adopted, and briefly summarize this year's participants and the results obtained. As in previous years, the organizers have made the data and tools associated with the lab tasks available for future research and development.
Matching Patients to Clinical Trials with Large Language Models
Clinical trials are vital in advancing drug development and evidence-based medicine, but their success is often hindered by challenges in patient recruitment. In this work, we investigate the potential of large language models (LLMs) to assist individual patients and referral physicians in identifying suitable clinical trials from an extensive selection. Specifically, we introduce TrialGPT, a novel architecture employing LLMs to predict criterion-level eligibility with detailed explanations, which are then aggregated for ranking and excluding candidate clinical trials based on free-text patient notes. We evaluate TrialGPT on three publicly available cohorts of 184 patients and 18,238 annotated clinical trials. The experimental results demonstrate several key findings: First, TrialGPT achieves high criterion-level prediction accuracy with faithful explanations. Second, the aggregated trial-level TrialGPT scores are highly correlated with expert eligibility annotations. Third, these scores prove effective in ranking clinical trials and excluding ineligible candidates. Our error analysis suggests that current LLMs still make some mistakes due to limited medical knowledge and domain-specific context understanding. Nonetheless, we believe the explanatory capabilities of LLMs are highly valuable. Future research is warranted on how such AI assistants can be integrated into the routine trial-matching workflow in real-world settings to improve its efficiency.
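A plausible criterion-level aggregation rule of the kind this abstract describes can be sketched as below. This is a simplified guess at the general idea, not TrialGPT's actual scoring formula; the label vocabulary and function name are my own assumptions.

```python
def aggregate_trial_score(criterion_labels):
    """
    criterion_labels: list of (criterion_type, label) pairs, where criterion_type
    is "inclusion" or "exclusion" and label is "met", "not met", or "unknown".
    Returns a trial-level score in [0, 1], or None for an excluded trial.
    """
    met = 0
    inclusions = 0
    for ctype, label in criterion_labels:
        if ctype == "exclusion" and label == "met":
            return None          # patient hits an exclusion criterion: drop the trial
        if ctype == "inclusion":
            inclusions += 1
            if label == "met":
                met += 1
            elif label == "not met":
                return None      # a required inclusion criterion fails: drop the trial
    # Remaining "unknown" inclusion labels dilute the score rather than exclude.
    return met / inclusions if inclusions else 0.0
```

Hard exclusion on any failed criterion plus a soft score over the rest is one simple way to turn per-criterion predictions into a trial ranking; a real system would likely weight criteria and handle "unknown" labels more carefully.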
Large expert-curated database for benchmarking document similarity detection in biomedical literature search
Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical science.
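Of the three baseline methods this abstract names, Okapi BM25 is straightforward to sketch from scratch. The following is a minimal stdlib implementation for scoring candidate abstracts against a seed article, intended only to illustrate the baseline; it is not the benchmark's actual code, and the parameter defaults are the common textbook choices.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenised document against the query with Okapi BM25."""
    n = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n
    df = Counter()
    for d in docs_tokens:
        df.update(set(d))           # document frequency per term
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        s = 0.0
        for t in set(query_tokens):
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores
```

Using a seed article's title as the query, abstracts sharing its rarer terms score higher, while documents with no query terms score zero; this is the kind of term-matching behaviour that a hybrid method, as the abstract suggests, would need to complement.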