MedCPT: Contrastive Pre-trained Transformers with Large-scale PubMed Search Logs for Zero-shot Biomedical Information Retrieval
Information retrieval (IR) is essential in biomedical knowledge acquisition
and clinical decision support. While recent progress has shown that language
model encoders perform better semantic retrieval, training such models requires
abundant query-article annotations that are difficult to obtain in biomedicine.
As a result, most biomedical IR systems only conduct lexical matching. In
response, we introduce MedCPT, a first-of-its-kind Contrastively Pre-trained
Transformer model for zero-shot semantic IR in biomedicine. To train MedCPT, we
collected an unprecedented 255 million user click logs from PubMed.
With such data, we use contrastive learning to train a pair of
closely-integrated retriever and re-ranker. Experimental results show that
MedCPT sets new state-of-the-art performance on six biomedical IR tasks,
outperforming various baselines including much larger models such as
GPT-3-sized cpt-text-XL. In addition, MedCPT generates better biomedical
article and sentence representations for semantic evaluations. As such, MedCPT
can be readily applied to various real-world biomedical IR tasks.
Comment: The MedCPT code and API are available at
https://github.com/ncbi/MedCP
Opportunities and Challenges for ChatGPT and Large Language Models in Biomedicine and Health
ChatGPT has drawn considerable attention from both the general public and
domain experts with its remarkable text generation capabilities. This has
subsequently led to the emergence of diverse applications in the field of
biomedicine and health. In this work, we examine the diverse applications of
large language models (LLMs), such as ChatGPT, in biomedicine and health.
Specifically, we explore the areas of biomedical information retrieval, question
answering, medical text summarization, information extraction, and medical
education, and investigate whether LLMs possess the transformative power to
revolutionize these tasks or whether the distinct complexities of the biomedical
domain present unique challenges. Following an extensive literature survey, we
find that significant advances have been made in the field of text generation
tasks, surpassing the previous state-of-the-art methods. For other
applications, the advances have been modest. Overall, LLMs have not yet
revolutionized biomedicine, but recent rapid progress indicates that such
methods hold great potential to provide valuable means for accelerating
discovery and improving health. We also find that the use of LLMs, like
ChatGPT, in the fields of biomedicine and health entails various risks and
challenges, including fabricated information in their generated responses, as
well as legal and privacy concerns associated with sensitive patient data. We
believe this first-of-its-kind survey can provide a comprehensive overview to
biomedical researchers and healthcare practitioners on the opportunities and
challenges associated with using ChatGPT and other LLMs for transforming
biomedicine and health.
Findings of the WMT 2020 Biomedical Translation Shared Task: Basque, Italian and Russian as New Additional Languages
Machine translation of scientific abstracts and terminologies has the potential to support health professionals and biomedical researchers in some of their activities. In the fifth edition of the WMT Biomedical Task, we addressed a total of eight language pairs. Five language pairs were addressed in past editions of the shared task, namely, English/German, English/French, English/Spanish, English/Portuguese, and English/Chinese. Three additional language pairs were introduced this year: English/Russian, English/Italian, and English/Basque. The task addressed the evaluation of both scientific abstracts (all language pairs) and terminologies (English/Basque only). We received submissions from a total of 20 teams. For recurring language pairs, we observed an improvement in the translations in terms of automatic scores and qualitative evaluations, compared to previous years.
Findings of the WMT 2022 Biomedical Translation Shared Task: Monolingual Clinical Case Reports
In the seventh edition of the WMT Biomedical Task, we addressed a total of seven language pairs, namely English/German, English/French, English/Spanish, English/Portuguese, English/Chinese, English/Russian, and English/Italian. This year's test sets covered three biomedical text genres. In addition to the scientific abstracts and terminology items used in previous editions, we released test sets of clinical cases. The evaluation of clinical case translations was given special attention, with clinicians involved in preparing the reference translations and in the manual evaluation. For the main MEDLINE test sets, we received a total of 609 submissions from 37 teams. Five teams participated in the ClinSpEn sub-task.