30 research outputs found

Topics in machine learning for biomedical literature analysis and text retrieval

Author: Islamaj Doğan Rezarta
Yeganova Lana
Publication venue: BioMed Central
Publication date: 01/01/2011

Springer - Publisher Connector

PubMed Central

MedCPT: Contrastive Pre-trained Transformers with Large-scale PubMed Search Logs for Zero-shot Biomedical Information Retrieval

Author: Chen Qingyu
Comeau Donald C.
Jin Qiao
Kim Won
Lu Zhiyong
Wilbur W. John
Yeganova Lana
Publication date: 03/10/2023

Information retrieval (IR) is essential in biomedical knowledge acquisition and clinical decision support. While recent progress has shown that language model encoders perform better semantic retrieval, training such models requires abundant query-article annotations that are difficult to obtain in biomedicine. As a result, most biomedical IR systems only conduct lexical matching. In response, we introduce MedCPT, a first-of-its-kind Contrastively Pre-trained Transformer model for zero-shot semantic IR in biomedicine. To train MedCPT, we collected an unprecedented scale of 255 million user click logs from PubMed. With such data, we use contrastive learning to train a pair of closely-integrated retriever and re-ranker. Experimental results show that MedCPT sets new state-of-the-art performance on six biomedical IR tasks, outperforming various baselines including much larger models such as GPT-3-sized cpt-text-XL. In addition, MedCPT also generates better biomedical article and sentence representations for semantic evaluations. As such, MedCPT can be readily applied to various real-world biomedical IR tasks. Comment: The MedCPT code and API are available at https://github.com/ncbi/MedCP

arXiv.org e-Print Archive

Opportunities and Challenges for ChatGPT and Large Language Models in Biomedicine and Health

Author: Chen Qingyu
Chen Xiuying
Comeau Donald C.
Gao Xin
Islamaj Rezarta
Jin Qiao
Kapoor Aadit
Kim Won
Lai Po-Ting
Lu Zhiyong
Tian Shubo
Yang Yifan
Yeganova Lana
Zhu Qingqing
Publication date: 15/06/2023

ChatGPT has drawn considerable attention from both the general public and domain experts with its remarkable text generation capabilities. This has subsequently led to the emergence of diverse applications in the field of biomedicine and health. In this work, we examine the diverse applications of large language models (LLMs), such as ChatGPT, in biomedicine and health. Specifically, we explore the areas of biomedical information retrieval, question answering, medical text summarization, information extraction, and medical education, and investigate whether LLMs possess the transformative power to revolutionize these tasks or whether the distinct complexities of the biomedical domain present unique challenges. Following an extensive literature survey, we find that significant advances have been made in the field of text generation tasks, surpassing the previous state-of-the-art methods. For other applications, the advances have been modest. Overall, LLMs have not yet revolutionized biomedicine, but recent rapid progress indicates that such methods hold great potential to provide valuable means for accelerating discovery and improving health. We also find that the use of LLMs, like ChatGPT, in the fields of biomedicine and health entails various risks and challenges, including fabricated information in their generated responses, as well as legal and privacy concerns associated with sensitive patient data. We believe this first-of-its-kind survey can provide a comprehensive overview to biomedical researchers and healthcare practitioners on the opportunities and challenges associated with using ChatGPT and other LLMs for transforming biomedicine and health.

arXiv.org e-Print Archive

Machine learning with naturally labeled data for identifying abbreviation definitions

Author: A Schwartz
C Kuo
D Nadeau
Donald C Comeau
H Liu
H Yu
J Pustejovsky
L Smith
Lana Yeganova
N Okazaki
R Islamaj
S Sohn
T Zhang
W John Wilbur
W Zhou
Y Park
Publication venue: BioMed Central
Publication date: 01/01/2011

Crossref

Springer - Publisher Connector

PubMed Central

Findings of the WMT 2020 Biomedical Translation Shared Task: Basque, Italian and Russian as New Additional Languages

Author: Bawden Rachel
de Viñaspre Olatz Perez
Di Nunzio Giorgio Maria
Grozea Cristian
Jauregi Unanue Iñigo
Jimeno Yepes Antonio
Mah Nancy
Martinez David
Neves Mariana
Névéol Aurélie
Oronoz Maite
Piccardi Massimo
Roller Roland
Siu Amy
Thomas Philippe
Vezzani Federica
Vicente Navarro Maika
Wiemann Dina
Yeganova Lana
Publication date: 01/01/2020

Machine translation of scientific abstracts and terminologies has the potential to support health professionals and biomedical researchers in some of their activities. In the fifth edition of the WMT Biomedical Task, we addressed a total of eight language pairs. Five language pairs were previously addressed in past editions of the shared task, namely, English/German, English/French, English/Spanish, English/Portuguese, and English/Chinese. Three additional language pairs were also introduced this year: English/Russian, English/Italian, and English/Basque. The task addressed the evaluation of both scientific abstracts (all language pairs) and terminologies (English/Basque only). We received submissions from a total of 20 teams. For recurring language pairs, we observed an improvement in the translations in terms of automatic scores and qualitative evaluations, compared to previous years.

HAL Descartes

Edinburgh Research Explorer

Hal-Diderot

Archivio istituzionale della ricerca - Università di Padova

Findings of the WMT 2022 Biomedical Translation Shared Task: Monolingual Clinical Case Reports

Author: Bawden Rachel
Di Nunzio Giorgio Maria
Farré-Maduell Eulàlia
Grozea Cristian
Gérardin Christel
Jimeno Yepes Antonio
Johan Estrada Darryl
Krallinger Martin
Lima-López Salvador
Neves Mariana
Névéol Aurélie
Roller Roland
Siu Amy
Thomas Philippe
Vezzani Federica
Vicente Navarro Maika
Wiemann Dina
Yeganova Lana
Publication venue: HAL CCSD
Publication date: 07/12/2022

In the seventh edition of the WMT Biomedical Task, we addressed a total of seven language pairs, namely English/German, English/French, English/Spanish, English/Portuguese, English/Chinese, English/Russian, and English/Italian. This year’s test sets covered three types of biomedical text genre. In addition to scientific abstracts and terminology items used in previous editions, we released test sets of clinical cases. The evaluation of clinical case translations was given special attention by involving clinicians in the preparation of reference translations and manual evaluation. For the main MEDLINE test sets, we received a total of 609 submissions from 37 teams. For the ClinSpEn sub-task, we had the participation of five teams.

INRIA a CCSD electronic archive server

Topics in machine learning for biomedical literature analysis and text retrieval

Author: Islamaj Doğan Rezarta
Yeganova Lana
Publication venue: Springer Science and Business Media LLC
Publication date: 01/10/2012

Directory of Open Access Journals