What Makes a Top-Performing Precision Medicine Search Engine? Tracing Main System Features in a Systematic Way
From 2017 to 2019 the Text REtrieval Conference (TREC) held a challenge task
on precision medicine using documents from medical publications (PubMed) and
clinical trials. Despite the many performance measurements carried out in these
evaluation campaigns, the scientific community still has little certainty about the
impact individual system features and their weights have on overall system
performance. To close this explanatory gap, we first determined
optimal feature configurations using the Sequential Model-based Algorithm
Configuration (SMAC) program and applied its output to a BM25-based search
engine. We then ran an ablation study to systematically assess the individual
contributions of relevant system features: BM25 parameters, query type and
weighting schema, query expansion, stop word filtering, and keyword boosting.
For evaluation, we employed the gold standard data from the three TREC-PM
installments to evaluate the effectiveness of different features using the
commonly shared infNDCG metric.

Comment: Accepted for SIGIR 2020, 10 pages.
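For readers unfamiliar with the scoring function being tuned, the following is a minimal sketch of the Okapi BM25 per-term weight whose parameters (k1 and b) such ablation studies vary; the default values shown here are common conventions, not the tuned configurations from the paper.

```python
import math

def bm25_score(tf, doc_len, avg_doc_len, df, num_docs, k1=1.2, b=0.75):
    """Okapi BM25 weight for one query term in one document.
    tf: term frequency in the document; df: document frequency of
    the term in the collection. k1 controls term-frequency saturation,
    b controls document-length normalization."""
    idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
    norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * norm
```

Summing this weight over the query terms yields the document score; an ablation study then toggles features (query expansion, keyword boosting, stop word filtering) around this core and varies k1 and b.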
Streamlined Data Fusion: Unleashing the Power of Linear Combination with Minimal Relevance Judgments
Linear combination is a potent data fusion method in information retrieval
tasks, thanks to its ability to adjust weights for diverse scenarios. However,
achieving optimal weight training has traditionally required manual relevance
judgments on a large percentage of documents, a labor-intensive and expensive
process. In this study, we investigate the feasibility of obtaining
near-optimal weights using a mere 20%-50% of relevant documents. Through
experiments on four TREC datasets, we find that weights trained with multiple
linear regression using this reduced set closely rival those obtained with
TREC's official "qrels." Our findings unlock the potential for more efficient
and affordable data fusion, empowering researchers and practitioners to reap
its full benefits with significantly less effort.

Comment: 12 pages, 8 figures.
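The weight-training step the abstract describes can be sketched as an ordinary least-squares fit of fusion weights against (possibly partial) relevance judgments. Below is a dependency-free toy version for two component systems, solving the 2x2 normal equations in closed form; the paper's setting uses multiple linear regression over more systems and real qrels.

```python
def train_fusion_weights(scores_a, scores_b, rel):
    """Closed-form least squares for rel ~ wa*a + wb*b
    (two component retrieval systems, no intercept, for brevity).
    scores_a/scores_b: per (query, doc) scores from each system;
    rel: relevance judgments (e.g. 0/1) for the same pairs."""
    saa = sum(a * a for a in scores_a)
    sbb = sum(b * b for b in scores_b)
    sab = sum(a * b for a, b in zip(scores_a, scores_b))
    say = sum(a * y for a, y in zip(scores_a, rel))
    sby = sum(b * y for b, y in zip(scores_b, rel))
    det = saa * sbb - sab * sab  # assumes systems are not collinear
    wa = (say * sbb - sby * sab) / det
    wb = (saa * sby - sab * say) / det
    return wa, wb

def fuse(scores_a, scores_b, wa, wb):
    """Linear-combination fusion of two score lists."""
    return [wa * a + wb * b for a, b in zip(scores_a, scores_b)]
```

Training on only 20%-50% of the judged documents amounts to passing a subsampled `rel` (and the matching score rows) to the same fit.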
TrialMatch: A Transformer Architecture to Match Patients to Clinical Trials
Around 80% of clinical trials fail to meet their patient recruitment requirements, which
not only hinders market growth but also delays patients' access to new and effective
treatments. A possible approach is to use Electronic Health Records (EHRs) to help
match patients to clinical trials. Past attempts at this goal were unsuccessful due to
a lack of data. In 2021, the Text REtrieval Conference (TREC) introduced the Clinical
Trials Track, where participants were challenged with retrieving relevant clinical
trials given patient descriptions simulating admission notes. Using the track results
as a baseline, we tackled the challenge with Information Retrieval (IR), implementing
a document-ranking pipeline in which we explore different retrieval methods, how to
filter clinical trials based on their eligibility criteria, and reranking with
Transformer-based models. To tackle the problem, we explored models pre-trained on the
biomedical domain, ways to handle long queries and documents through query expansion
and passage selection, and how to distinguish an eligible clinical trial from an
excluded one, using techniques such as Named Entity Recognition (NER) and Clinical
Assertion. Our results led to the finding that current state-of-the-art Bidirectional
Encoder Representations from Transformers (BERT) bi-encoders outperform cross-encoders
on this task, while showing that sparse retrieval methods can obtain competitive
results; finally, we showed that the available demographic information can be used to
improve the final result.
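The demographic filtering step mentioned above can be illustrated with a small sketch: drop trials whose structured age and sex criteria (as found in ClinicalTrials.gov-style records) exclude the patient. The field names and defaults below are illustrative assumptions, not the paper's actual schema.

```python
def eligible(patient_age, patient_gender, trial):
    """Hypothetical demographic eligibility check.
    trial: dict with optional 'gender', 'minimum_age', 'maximum_age'
    keys; missing fields are treated as unrestricted."""
    if trial.get("gender", "All") not in ("All", patient_gender):
        return False
    if patient_age < trial.get("minimum_age", 0):
        return False
    if patient_age > trial.get("maximum_age", 200):
        return False
    return True

def filter_trials(patient_age, patient_gender, ranked_trials):
    """Keep only trials the patient is demographically eligible for,
    preserving the ranker's order."""
    return [t for t in ranked_trials
            if eligible(patient_age, patient_gender, t)]
```

Applied after ranking, such a filter cheaply removes clinical trials a reranker might otherwise score highly despite a hard exclusion.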
Literature-Augmented Clinical Outcome Prediction
We present BEEP (Biomedical Evidence-Enhanced Predictions), a novel approach
for clinical outcome prediction that retrieves patient-specific medical
literature and incorporates it into predictive models. Based on each individual
patient's clinical notes, we train language models (LMs) to find relevant
papers and fuse them with information from notes to predict outcomes such as
in-hospital mortality. We develop methods to retrieve literature based on
noisy, information-dense patient notes, and to augment existing outcome
prediction models with retrieved papers in a manner that maximizes predictive
accuracy. Our approach boosts predictive performance on three important
clinical tasks in comparison to strong recent LM baselines, increasing F1 by up
to 5 points and precision@Top-K by a large margin of over 25%.

Comment: To appear in Findings of NAACL 2022. Code available at:
https://github.com/allenai/BEE
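A first-stage version of the retrieval step (finding papers relevant to a noisy, information-dense patient note) can be sketched with plain TF-IDF cosine ranking; BEEP itself trains language models for this, so the following is only a dependency-free stand-in for the retrieval idea, not the paper's method.

```python
import math
from collections import Counter

def tfidf_vec(text, df, n_docs):
    """Sparse TF-IDF vector over whitespace tokens (smoothed IDF)."""
    tf = Counter(text.lower().split())
    return {t: (1 + math.log(c)) * math.log(1 + n_docs / (1 + df.get(t, 0)))
            for t, c in tf.items()}

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(note, papers):
    """Rank paper abstracts by similarity to a patient note;
    returns paper indices, most similar first."""
    df = Counter(t for p in papers for t in set(p.lower().split()))
    n = len(papers)
    q = tfidf_vec(note, df, n)
    return sorted(range(n),
                  key=lambda i: cosine(q, tfidf_vec(papers[i], df, n)),
                  reverse=True)
```

The retrieved papers would then be fused with the note's features by the outcome-prediction model.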
Literature Retrieval for Precision Medicine with Neural Matching and Faceted Summarization
Information retrieval (IR) for precision medicine (PM) often involves looking
for multiple pieces of evidence that characterize a patient case. This
typically includes at least the name of a condition and a genetic variation
that applies to the patient. Other factors such as demographic attributes,
comorbidities, and social determinants may also be pertinent. As such, the
retrieval problem is often formulated as ad hoc search but with multiple facets
(e.g., disease, mutation) that may need to be incorporated. In this paper, we
present a document reranking approach that combines neural query-document
matching and text summarization toward such retrieval scenarios. Our
architecture builds on the basic BERT model with three specific components for
reranking: (a) document-query matching, (b) keyword extraction, and (c)
facet-conditioned abstractive summarization. The outcomes of (b) and (c) are
used to essentially transform a candidate document into a concise summary that
can be compared with the query at hand to compute a relevance score. Component
(a) directly generates a matching score of a candidate document for a query.
The full architecture benefits from the complementary potential of
document-query matching and the novel document transformation approach based on
summarization along PM facets. Evaluations using NIST's TREC-PM track datasets
(2017--2019) show that our model achieves state-of-the-art performance. To
foster reproducibility, our code is made available here:
https://github.com/bionlproc/text-summ-for-doc-retrieval.

Comment: Accepted to EMNLP 2020 Findings as a long paper (11 pages, 4 figures).
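The final score combination the architecture performs (fusing the direct matching score from component (a) with a relevance score computed against the generated summary from components (b) and (c)) can be sketched as follows. The Jaccard-overlap stand-in for summary relevance and the mixing weight `alpha` are illustrative assumptions; the paper's components are BERT-based.

```python
def rerank_score(match_score, query_terms, summary_terms, alpha=0.5):
    """Hypothetical fusion of a query-document matching score with a
    query-vs-summary relevance score (here: Jaccard term overlap)."""
    q, s = set(query_terms), set(summary_terms)
    overlap = len(q & s) / len(q | s) if q | s else 0.0
    return alpha * match_score + (1 - alpha) * overlap
```

Candidate documents are then reranked by this combined score, so a document whose faceted summary closely mirrors the query's disease and mutation facets is promoted even when its raw matching score is middling.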
Utilizing Knowledge Bases In Information Retrieval For Clinical Decision Support And Precision Medicine
Accurately answering queries that describe a clinical case and aim at finding articles in a collection of medical literature requires utilizing knowledge bases to capture many explicit and latent aspects of such queries. Proper representation of these aspects needs knowledge-based query understanding methods that identify the most important query concepts, as well as knowledge-based query reformulation methods that add new concepts to a query. In the tasks of Clinical Decision Support (CDS) and Precision Medicine (PM), the query and collection documents may have a complex structure with different components, such as disease and genetic variants, that should be transformed to enable effective information retrieval. In this work, we propose methods for representing domain-specific queries based on weighted concepts of different types, whether they exist in the query itself or are extracted from knowledge bases and top retrieved documents. In addition, we propose an optimization framework that unifies query analysis and expansion by jointly determining the importance weights for the query and expansion concepts depending on their type and source. We also propose a probabilistic model to reformulate the query given genetic information in the query and collection documents. We observe that significant improvements in retrieval accuracy are obtained with our proposed methods over state-of-the-art baselines for the tasks of clinical decision support and precision medicine.
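The weighted-concept query representation described here can be illustrated with a small sketch: original query concepts and knowledge-base expansion concepts each receive a weight depending on concept type and source. The fixed weights and the 0.5 expansion discount below are illustrative placeholders for the jointly learned weights in the proposed optimization framework.

```python
def expand_query(query_concepts, kb_concepts, type_weights):
    """Build a weighted-concept query representation.
    query_concepts / kb_concepts: lists of (concept, type) pairs;
    type_weights: per-type importance (e.g. disease vs gene).
    Expansion concepts are down-weighted relative to original ones."""
    weighted = {}
    for concept, ctype in query_concepts:
        weighted[concept] = type_weights.get(ctype, 1.0)
    for concept, ctype in kb_concepts:
        # keep the original weight if the concept was already in the query
        weighted.setdefault(concept, 0.5 * type_weights.get(ctype, 1.0))
    return weighted
```

The resulting concept-to-weight map can then be fed to any retrieval model that supports weighted query terms.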