131 research outputs found

    Retrieving Evidence from EHRs with LLMs: Possibilities and Challenges

    Full text link
    Unstructured Electronic Health Record (EHR) data often contains critical information, complementary to imaging data, that would inform radiologists' diagnoses. However, time constraints and the large volume of notes frequently associated with individual patients render manual perusal of such data to identify relevant evidence infeasible in practice. Modern Large Language Models (LLMs) provide a flexible means of interacting with unstructured EHR data and may offer a mechanism to efficiently retrieve and summarize unstructured evidence relevant to a given query. In this work, we propose and evaluate an LLM (Flan-T5 XXL) for this purpose. Specifically, in a zero-shot setting we task the LLM with inferring whether a patient has, or is at risk of, a particular condition; if so, we prompt the model to summarize the supporting evidence. Enlisting radiologists for manual evaluation, we find that this LLM-based approach produces outputs consistently preferred to those of a standard information retrieval baseline, but we also highlight the key outstanding challenge: LLMs are prone to hallucinating evidence. However, we provide results indicating that model confidence in outputs might indicate when LLMs are hallucinating, potentially providing a means to address this issue.
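    The paper itself does not include code; a minimal sketch of the described zero-shot setup, assuming the Hugging Face transformers checkpoint google/flan-t5-xxl, might look like the following. The prompt wording and the mean-log-probability confidence proxy are assumptions for illustration, not the authors' exact method.

```python
# Illustrative sketch (not the authors' implementation): zero-shot condition
# inference and evidence summarization with Flan-T5, plus a simple
# sequence-probability proxy for model confidence.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL = "google/flan-t5-xxl"  # the paper evaluates Flan-T5 XXL; loading details assumed
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL, device_map="auto")

def ask(prompt: str) -> tuple[str, float]:
    """Generate an answer and return it with the mean log-probability of its tokens."""
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True).to(model.device)
    out = model.generate(**inputs, max_new_tokens=128,
                         output_scores=True, return_dict_in_generate=True)
    text = tokenizer.decode(out.sequences[0], skip_special_tokens=True)
    scores = model.compute_transition_scores(out.sequences, out.scores,
                                             normalize_logits=True)
    return text, scores.mean().item()

note = "..."                 # placeholder for an unstructured EHR note
condition = "pneumothorax"   # hypothetical query condition

answer, conf = ask(f"Does the following note indicate that the patient has or is at "
                   f"risk of {condition}? Answer yes or no.\n\n{note}")
if answer.strip().lower().startswith("yes"):
    evidence, ev_conf = ask(f"Summarize the evidence that the patient has or is at "
                            f"risk of {condition}.\n\n{note}")
    print(evidence, ev_conf)  # a low score could flag possible hallucination
```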

    Utilizing ChatGPT to Enhance Clinical Trial Enrollment

    Full text link
    Clinical trials are a critical component of evaluating the effectiveness of new medical interventions and driving advancements in medical research. Therefore, timely enrollment of patients is crucial to prevent delays or premature termination of trials. In this context, Electronic Health Records (EHRs) have emerged as a valuable tool for identifying and enrolling eligible participants. In this study, we propose an automated approach that leverages ChatGPT, a large language model, to extract patient-related information from unstructured clinical notes and generate search queries for retrieving potentially eligible clinical trials. Our empirical evaluation, conducted on two benchmark retrieval collections, shows improved retrieval performance compared to existing approaches when several general-purpose and task-specific prompts are used. Notably, ChatGPT-generated queries also outperform human-generated queries in terms of retrieval performance. These findings highlight the potential of ChatGPT to enhance clinical trial enrollment while ensuring the quality of medical service and minimizing direct risks to patients.
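    As a hedged illustration of the described pipeline, the note-to-query step might be sketched with the OpenAI chat API as follows. The model name, system prompt, and downstream retrieval engine are assumptions, not the paper's exact configuration.

```python
# Illustrative sketch (not the paper's implementation): prompting a chat model to
# turn an unstructured clinical note into a keyword query for trial retrieval.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def note_to_query(note: str) -> str:
    """Ask the model to condense a clinical note into a short trial-search query."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # stand-in for "ChatGPT"; the paper's model may differ
        messages=[
            {"role": "system",
             "content": ("Extract the patient's key conditions, treatments, and "
                         "demographics from the note and output a short keyword "
                         "query for finding relevant clinical trials.")},
            {"role": "user", "content": note},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

# The generated query would then be fed to a retrieval system (for example, BM25
# over trial eligibility criteria), just as a human-written query would be.
```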

    Opportunities and Challenges for ChatGPT and Large Language Models in Biomedicine and Health

    Full text link
    ChatGPT has drawn considerable attention from both the general public and domain experts with its remarkable text generation capabilities. This has subsequently led to the emergence of diverse applications in the field of biomedicine and health. In this work, we examine the diverse applications of large language models (LLMs), such as ChatGPT, in biomedicine and health. Specifically, we explore the areas of biomedical information retrieval, question answering, medical text summarization, information extraction, and medical education, and investigate whether LLMs possess the transformative power to revolutionize these tasks or whether the distinct complexities of the biomedical domain present unique challenges. Following an extensive literature survey, we find that significant advances have been made in text generation tasks, surpassing the previous state-of-the-art methods. For other applications, the advances have been modest. Overall, LLMs have not yet revolutionized biomedicine, but recent rapid progress indicates that such methods hold great potential to provide valuable means for accelerating discovery and improving health. We also find that the use of LLMs, like ChatGPT, in the fields of biomedicine and health entails various risks and challenges, including fabricated information in the generated responses, as well as legal and privacy concerns associated with sensitive patient data. We believe this first-of-its-kind survey can provide a comprehensive overview to biomedical researchers and healthcare practitioners on the opportunities and challenges associated with using ChatGPT and other LLMs for transforming biomedicine and health.

    Enriching information extraction pipelines in clinical decision support systems

    Get PDF
    Programa Oficial de Doutoramento en Tecnoloxías da Información e as Comunicacións. 5032V01. [Abstract] Multicentre health studies are important to increase the impact of medical research findings due to the number of subjects that they are able to engage. To simplify the execution of these studies, the data-sharing process should be effortless, for instance, through the use of interoperable databases. However, achieving this interoperability is still an ongoing research topic, namely due to data governance and privacy issues. In the first stage of this work, we propose several methodologies to optimise the harmonisation pipelines of health databases. This work focused on harmonising heterogeneous data sources into a standard data schema, namely the OMOP CDM, which has been developed and promoted by the OHDSI community. We validated our proposal using data sets of Alzheimer's disease patients from distinct institutions. In the following stage, aiming to enrich the information stored in OMOP CDM databases, we investigated solutions to extract clinical concepts from unstructured narratives, using information retrieval and natural language processing techniques. The validation was performed through datasets provided in scientific challenges, namely the National NLP Clinical Challenges (n2c2). In the final stage, we aimed to simplify the protocol execution of multicentre studies by proposing novel solutions for profiling, publishing and facilitating the discovery of databases. Some of the developed solutions are currently being used in three European projects aiming to create federated networks of health databases across Europe.
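    To make the harmonisation step concrete, a minimal sketch of mapping a hypothetical site-specific diagnosis record into an OMOP CDM condition_occurrence row is shown below. The source schema and concept lookup are invented for illustration; real pipelines resolve concept ids against the OHDSI standard vocabularies.

```python
# Illustrative sketch only: mapping a hypothetical source record to OMOP CDM
# condition_occurrence fields. Column names follow the public OMOP CDM spec;
# the concept ids shown are placeholders to be resolved via the OHDSI vocabularies.
from datetime import date

source_record = {                 # hypothetical site-specific export
    "patient_id": "A-102",
    "dx_text": "Alzheimer's disease",
    "dx_date": "2019-03-14",
}

CONCEPT_LOOKUP = {"Alzheimer's disease": 378419}  # placeholder standard concept id

condition_occurrence = {
    "person_id": 102,  # assumes a separate person-id mapping table
    "condition_concept_id": CONCEPT_LOOKUP[source_record["dx_text"]],
    "condition_start_date": date.fromisoformat(source_record["dx_date"]),
    "condition_type_concept_id": 32817,  # placeholder for an "EHR" record type concept
    "condition_source_value": source_record["dx_text"],
}
print(condition_occurrence)
```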

    Natural Language Processing in Electronic Health Records in Relation to Healthcare Decision-making: A Systematic Review

    Full text link
    Background: Natural Language Processing (NLP) is widely used to extract clinical insights from Electronic Health Records (EHRs). However, the lack of annotated data, automated tools, and other challenges hinder the full utilisation of NLP for EHRs. Various Machine Learning (ML), Deep Learning (DL) and NLP techniques are studied and compared to comprehensively understand the limitations and opportunities in this space. Methodology: After screening 261 articles from 11 databases, we included 127 papers for full-text review, covering seven categories of articles: 1) medical note classification, 2) clinical entity recognition, 3) text summarisation, 4) deep learning and transfer learning architectures, 5) information extraction, 6) medical language translation, and 7) other NLP applications. This study follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Results and Discussion: EHRs were the most commonly used data type among the selected articles, and the datasets were primarily unstructured. Various ML and DL methods were used, with prediction or classification being the most common application of ML or DL. The most common use cases were: International Classification of Diseases, Ninth Revision (ICD-9) classification, clinical note analysis, and named entity recognition (NER) for clinical descriptions and research on psychiatric disorders. Conclusion: We find that the adopted ML models were not adequately assessed. In addition, the data imbalance problem remains important, and techniques to address this underlying problem are needed. Future studies should address key limitations of prior work, primarily in identifying lupus nephritis, suicide attempts, perinatal self-harm, and ICD-9 classification.

    Automated methods to extract patient new information from clinical notes in electronic health record systems

    Get PDF
    University of Minnesota Ph.D. dissertation. November 2013. Major: Health Informatics. Advisor: Serguei Pakhomov. 1 computer file (PDF); xii, 102 pages. The widespread adoption of Electronic Health Records (EHRs) has resulted in rapid text proliferation within clinical care. Clinicians' use of copy-and-paste functions in EHR systems further compounds this by creating a large amount of redundant clinical information in clinical documents. A mixture of redundant information (especially outdated and incorrect information) and new information in a single clinical note increases clinicians' cognitive burden and results in decision-making difficulties. Moreover, replicated erroneous information can potentially pose risks to patient safety. However, automated methods to identify redundant or relevant new information in clinical texts have not been extensively investigated. The overarching goal of this research is to develop and evaluate automated methods to identify new and clinically relevant information in clinical notes using expert-derived reference standards. Modified global alignment methods were adapted to investigate the pattern of redundancy in individual longitudinal clinical notes as well as in larger groups of patient clinical notes. Statistical language models were also developed to identify new and clinically relevant information in clinical notes. Relevant new information identified by automated methods is highlighted in clinical notes to provide visualization cues to clinicians. A new information proportion (NIP) was used to indicate the quantity of new information in each note and to help clinicians navigate to notes with more new information. Classifying semantic types of new information further provides clinicians with the specific types of new information they are interested in finding. The techniques developed in this research can be incorporated into production EHR systems, could aid clinicians in finding and synthesizing new information in a note more purposefully, and could ultimately improve the efficiency of healthcare delivery.
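    As a rough, hedged sketch of the new information proportion (NIP) idea, the fragment below matches a new note against a patient's prior notes and reports the fraction of unmatched words. The dissertation's modified global alignment and statistical language models are more involved; difflib here is only a stand-in for the concept.

```python
# Illustrative sketch only: a simple word-level "new information proportion" (NIP)
# computed by matching a new note against the patient's prior notes with difflib.
import difflib

def new_information_proportion(new_note: str, prior_notes: list[str]) -> float:
    """Fraction of words in new_note that are not covered by matched blocks."""
    new_tokens = new_note.split()
    prior_tokens = " ".join(prior_notes).split()
    matcher = difflib.SequenceMatcher(a=prior_tokens, b=new_tokens, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 1.0 - matched / max(len(new_tokens), 1)

prior = ["Patient denies chest pain. Continue lisinopril 10 mg daily."]
new = ("Patient denies chest pain. Continue lisinopril 10 mg daily. "
       "New onset cough noted.")
print(f"NIP = {new_information_proportion(new, prior):.2f}")  # higher = more new content
```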

    The National COVID Cohort Collaborative (N3C): Rationale, design, infrastructure, and deployment.

    Get PDF
    OBJECTIVE: Coronavirus disease 2019 (COVID-19) poses societal challenges that require expeditious data and knowledge sharing. Though organizational clinical data are abundant, these are largely inaccessible to outside researchers. Statistical, machine learning, and causal analyses are most successful with large-scale data beyond what is available in any given organization. Here, we introduce the National COVID Cohort Collaborative (N3C), an open science community focused on analyzing patient-level data from many centers. MATERIALS AND METHODS: The Clinical and Translational Science Award Program and scientific community created N3C to overcome technical, regulatory, policy, and governance barriers to sharing and harmonizing individual-level clinical data. We developed solutions to extract, aggregate, and harmonize data across organizations and data models, and created a secure data enclave to enable efficient, transparent, and reproducible collaborative analytics. RESULTS: Organized in inclusive workstreams, we created legal agreements and governance for organizations and researchers; data extraction scripts to identify and ingest positive, negative, and possible COVID-19 cases; a data quality assurance and harmonization pipeline to create a single harmonized dataset; population of the secure data enclave with data, machine learning, and statistical analytics tools; dissemination mechanisms; and a synthetic data pilot to democratize data access. CONCLUSIONS: The N3C has demonstrated that a multisite collaborative learning health network can overcome barriers to rapidly build a scalable infrastructure incorporating multiorganizational clinical data for COVID-19 analytics. We expect this effort to save lives by enabling rapid collaboration among clinicians, researchers, and data scientists to identify treatments and specialized care and thereby reduce the immediate and long-term impacts of COVID-19.