11,583 research outputs found
Opportunities and Challenges for ChatGPT and Large Language Models in Biomedicine and Health
ChatGPT has drawn considerable attention from both the general public and
domain experts with its remarkable text generation capabilities. This has
subsequently led to the emergence of diverse applications in the field of
biomedicine and health. In this work, we examine the diverse applications of
large language models (LLMs), such as ChatGPT, in biomedicine and health.
Specifically we explore the areas of biomedical information retrieval, question
answering, medical text summarization, information extraction, and medical
education, and investigate whether LLMs possess the transformative power to
revolutionize these tasks or whether the distinct complexities of biomedical
domain presents unique challenges. Following an extensive literature survey, we
find that significant advances have been made in the field of text generation
tasks, surpassing the previous state-of-the-art methods. For other
applications, the advances have been modest. Overall, LLMs have not yet
revolutionized the biomedicine, but recent rapid progress indicates that such
methods hold great potential to provide valuable means for accelerating
discovery and improving health. We also find that the use of LLMs, like
ChatGPT, in the fields of biomedicine and health entails various risks and
challenges, including fabricated information in its generated responses, as
well as legal and privacy concerns associated with sensitive patient data. We
believe this first-of-its-kind survey can provide a comprehensive overview to
biomedical researchers and healthcare practitioners on the opportunities and
challenges associated with using ChatGPT and other LLMs for transforming
biomedicine and health
Advancing Italian Biomedical Information Extraction with Large Language Models: Methodological Insights and Multicenter Practical Application
The introduction of computerized medical records in hospitals has reduced
burdensome operations like manual writing and information fetching. However,
the data contained in medical records are still far underutilized, primarily
because extracting them from unstructured textual medical records takes time
and effort. Information Extraction, a subfield of Natural Language Processing,
can help clinical practitioners overcome this limitation, using automated
text-mining pipelines. In this work, we created the first Italian
neuropsychiatric Named Entity Recognition dataset, PsyNIT, and used it to
develop a Large Language Model for this task. Moreover, we conducted several
experiments with three external independent datasets to implement an effective
multicenter model, with overall F1-score 84.77%, Precision 83.16%, Recall
86.44%. The lessons learned are: (i) the crucial role of a consistent
annotation process and (ii) a fine-tuning strategy that combines classical
methods with a "few-shot" approach. This allowed us to establish methodological
guidelines that pave the way for future implementations in this field and allow
Italian hospitals to tap into important research opportunities
A Survey of GPT-3 Family Large Language Models Including ChatGPT and GPT-4
Large language models (LLMs) are a special class of pretrained language
models obtained by scaling model size, pretraining corpus and computation.
LLMs, because of their large size and pretraining on large volumes of text
data, exhibit special abilities which allow them to achieve remarkable
performances without any task-specific training in many of the natural language
processing tasks. The era of LLMs started with OpenAI GPT-3 model, and the
popularity of LLMs is increasing exponentially after the introduction of models
like ChatGPT and GPT4. We refer to GPT-3 and its successor OpenAI models,
including ChatGPT and GPT4, as GPT-3 family large language models (GLLMs). With
the ever-rising popularity of GLLMs, especially in the research community,
there is a strong need for a comprehensive survey which summarizes the recent
research progress in multiple dimensions and can guide the research community
with insightful future research directions. We start the survey paper with
foundation concepts like transformers, transfer learning, self-supervised
learning, pretrained language models and large language models. We then present
a brief overview of GLLMs and discuss the performances of GLLMs in various
downstream tasks, specific domains and multiple languages. We also discuss the
data labelling and data augmentation abilities of GLLMs, the robustness of
GLLMs, the effectiveness of GLLMs as evaluators, and finally, conclude with
multiple insightful future research directions. To summarize, this
comprehensive survey paper will serve as a good resource for both academic and
industry people to stay updated with the latest research related to GPT-3
family large language models.Comment: Preprint under review, 58 page
Enhance Representation Learning of Clinical Narrative with Neural Networks for Clinical Predictive Modeling
Medicine is undergoing a technological revolution. Understanding human health from clinical data has major challenges from technical and practical perspectives, thus prompting methods that understand large, complex, and noisy data. These methods are particularly necessary for natural language data from clinical narratives/notes, which contain some of the richest information on a patient. Meanwhile, deep neural networks have achieved superior performance in a wide variety of natural language processing (NLP) tasks because of their capacity to encode meaningful but abstract representations and learn the entire task end-to-end. In this thesis, I investigate representation learning of clinical narratives with deep neural networks through a number of tasks ranging from clinical concept extraction, clinical note modeling, and patient-level language representation. I present methods utilizing representation learning with neural networks to support understanding of clinical text documents.
I first introduce the notion of representation learning from natural language processing and patient data modeling. Then, I investigate word-level representation learning to improve clinical concept extraction from clinical notes. I present two works on learning word representations and evaluate them to extract important concepts from clinical notes. The first study focuses on cancer-related information, and the second study evaluates shared-task data. The aims of these two studies are to automatically extract important entities from clinical notes. Next, I present a series of deep neural networks to encode hierarchical, longitudinal, and contextual information for modeling a series of clinical notes. I also evaluate the models by predicting clinical outcomes of interest, including mortality, length of stay, and phenotype predictions. Finally, I propose a novel representation learning architecture to develop a generalized and transferable language representation at the patient level. I also identify pre-training tasks appropriate for constructing a generalizable language representation. The main focus is to improve predictive performance of phenotypes with limited data, a challenging task due to a lack of data.
Overall, this dissertation addresses issues in natural language processing for medicine, including clinical text classification and modeling. These studies show major barriers to understanding large-scale clinical notes. It is believed that developing deep representation learning methods for distilling enormous amounts of heterogeneous data into patient-level language representations will improve evidence-based clinical understanding. The approach to solving these issues by learning representations could be used across clinical applications despite noisy data. I conclude that considering different linguistic components in natural language and sequential information between clinical events is important. Such results have implications beyond the immediate context of predictions and further suggest future directions for clinical machine learning research to improve clinical outcomes. This could be a starting point for future phenotyping methods based on natural language processing that construct patient-level language representations to improve clinical predictions. While significant progress has been made, many open questions remain, so I will highlight a few works to demonstrate promising directions
Informed Named Entity Recognition Decoding for Generative Language Models
Ever-larger language models with ever-increasing capabilities are by now
well-established text processing tools. Alas, information extraction tasks such
as named entity recognition are still largely unaffected by this progress as
they are primarily based on the previous generation of encoder-only transformer
models. Here, we propose a simple yet effective approach, Informed Named Entity
Recognition Decoding (iNERD), which treats named entity recognition as a
generative process. It leverages the language understanding capabilities of
recent generative models in a future-proof manner and employs an informed
decoding scheme incorporating the restricted nature of information extraction
into open-ended text generation, improving performance and eliminating any risk
of hallucinations. We coarse-tune our model on a merged named entity corpus to
strengthen its performance, evaluate five generative language models on eight
named entity recognition datasets, and achieve remarkable results, especially
in an environment with an unknown entity class set, demonstrating the
adaptability of the approach.Comment: 12 pages, 2 figures, 4 table
Text Classification
There is an abundance of text data in this world but most of it is raw. We need to extract information from this data to make use of it. One way to extract this information from raw text is to apply informative labels drawn from a pre-defined fixed set i.e. Text Classification. In this thesis, we focus on the general problem of text classification, and work towards solving challenges associated to binary/multi-class/multi-label classification. More specifically, we deal with the problem of (i) Zero-shot labels during testing; (ii) Active learning for text screening; (iii) Multi-label classification under low supervision; (iv) Structured label space; (v) Classifying pairs of words in raw text i.e. Relation Extraction. For (i), we use a zero-shot classification model that utilizes independently learned semantic embeddings. Regarding (ii), we propose a novel active learning algorithm that reduces problem of bias in naive active learning algorithms. For (iii), we propose neural candidate-selector architecture that starts from a set of high-recall candidate labels to obtain high-precision predictions. In the case of (iv), we proposed an attention based neural tree decoder that recursively decodes an abstract into the ontology tree. For (v), we propose using second-order relations that are derived by explicitly connecting pairs of words via context token(s) for improved relation extraction. We use a wide variety of both traditional and deep machine learning tools. More specifically, we used traditional machine learning models like multi-valued linear regression and logistic regression for (i, ii), deep convolutional neural networks for (iii), recurrent neural networks for (iv) and transformer networks for (v)
- …