15,287 research outputs found
The Lucene for Information Access and Retrieval Research (LIARR) Workshop at SIGIR 2017
As an empirical discipline, information access and retrieval research requires substantial software infrastructure to index and search large collections. This workshop is motivated by the desire to better align information retrieval research with the practice of building search applications from the perspective of open-source information retrieval systems. Our goal is to promote the use of Lucene for information access and retrieval research
Med-HALT: Medical Domain Hallucination Test for Large Language Models
This research paper focuses on the challenges posed by hallucinations in
large language models (LLMs), particularly in the context of the medical
domain. Hallucination, wherein these models generate plausible yet unverified
or incorrect information, can have serious consequences in healthcare
applications. We propose a new benchmark and dataset, Med-HALT (Medical Domain
Hallucination Test), designed specifically to evaluate and reduce
hallucinations. Med-HALT provides a diverse multinational dataset derived from
medical examinations across various countries and includes multiple innovative
testing modalities. Med-HALT includes two categories of tests reasoning and
memory-based hallucination tests, designed to assess LLMs's problem-solving and
information retrieval abilities.
Our study evaluated leading LLMs, including Text Davinci, GPT-3.5, LlaMa-2,
MPT, and Falcon, revealing significant differences in their performance. The
paper provides detailed insights into the dataset, promoting transparency and
reproducibility. Through this work, we aim to contribute to the development of
safer and more reliable language models in healthcare. Our benchmark can be
found at medhalt.github.ioComment: Accepted at EMNLP 2023(The SIGNLL Conference on Computational Natural
Language Learning
Improving average ranking precision in user searches for biomedical research datasets
Availability of research datasets is keystone for health and life science
study reproducibility and scientific progress. Due to the heterogeneity and
complexity of these data, a main challenge to be overcome by research data
management systems is to provide users with the best answers for their search
queries. In the context of the 2016 bioCADDIE Dataset Retrieval Challenge, we
investigate a novel ranking pipeline to improve the search of datasets used in
biomedical experiments. Our system comprises a query expansion model based on
word embeddings, a similarity measure algorithm that takes into consideration
the relevance of the query terms, and a dataset categorisation method that
boosts the rank of datasets matching query constraints. The system was
evaluated using a corpus with 800k datasets and 21 annotated user queries. Our
system provides competitive results when compared to the other challenge
participants. In the official run, it achieved the highest infAP among the
participants, being +22.3% higher than the median infAP of the participant's
best submissions. Overall, it is ranked at top 2 if an aggregated metric using
the best official measures per participant is considered. The query expansion
method showed positive impact on the system's performance increasing our
baseline up to +5.0% and +3.4% for the infAP and infNDCG metrics, respectively.
Our similarity measure algorithm seems to be robust, in particular compared to
Divergence From Randomness framework, having smaller performance variations
under different training conditions. Finally, the result categorization did not
have significant impact on the system's performance. We believe that our
solution could be used to enhance biomedical dataset management systems. In
particular, the use of data driven query expansion methods could be an
alternative to the complexity of biomedical terminologies
- …