Model Tuning or Prompt Tuning? A Study of Large Language Models for Clinical Concept and Relation Extraction
Objective To develop soft prompt-based learning algorithms for large language
models (LLMs) and to examine prompt shape, prompt-tuning with frozen and
unfrozen LLMs, transfer learning, and few-shot learning ability.
Methods We developed a soft prompt-based LLM model and compared 4 training
strategies including (1) fine-tuning without prompts; (2) hard-prompt with
unfrozen LLMs; (3) soft-prompt with unfrozen LLMs; and (4) soft-prompt with
frozen LLMs. We evaluated 7 pretrained LLMs using the 4 training strategies for
clinical concept and relation extraction on two benchmark datasets. We
evaluated the transfer learning ability of the prompt-based learning algorithms
in a cross-institution setting. We also assessed the few-shot learning ability.
Results and Conclusion When LLMs are unfrozen, GatorTron-3.9B with soft
prompting achieves the best strict F1-scores of 0.9118 and 0.8604 for concept
extraction, outperforming the traditional fine-tuning and hard prompt-based
models by 0.6~3.1% and 1.2~2.9%, respectively; GatorTron-345M with soft
prompting achieves the best F1-scores of 0.8332 and 0.7488 for end-to-end
relation extraction, outperforming the other two models by 0.2~2% and
0.6~11.7%, respectively. When LLMs are frozen, small LLMs (i.e., 345 million
parameters) fall well short of unfrozen models; scaling LLMs up to billions of
parameters makes frozen LLMs competitive with unfrozen LLMs. For
cross-institution evaluation, soft prompting with a frozen GatorTron-8.9B model
achieved the best performance. This study demonstrates that (1) machines can
learn soft prompts better than humans, (2) frozen LLMs have better few-shot
learning and transfer learning ability to facilitate multi-institution
applications, and (3) frozen LLMs require large models to be competitive.
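As a rough illustration of the soft-prompt strategies compared above, the sketch below (Python, PyTorch plus Hugging Face transformers) prepends trainable prompt embeddings to a frozen encoder, i.e., strategy (4); the base model, prompt length, and label count are illustrative assumptions, not the paper's configuration.

    import torch
    import torch.nn as nn
    from transformers import AutoModel

    class SoftPromptTagger(nn.Module):
        def __init__(self, base="bert-base-uncased", n_prompt=20, n_labels=9):
            super().__init__()
            self.encoder = AutoModel.from_pretrained(base)
            for p in self.encoder.parameters():
                p.requires_grad = False  # freeze the LLM; only prompt and head train
            hidden = self.encoder.config.hidden_size
            self.prompt = nn.Parameter(torch.randn(n_prompt, hidden) * 0.02)
            self.head = nn.Linear(hidden, n_labels)

        def forward(self, input_ids, attention_mask):
            tok = self.encoder.get_input_embeddings()(input_ids)  # (B, T, H)
            soft = self.prompt.unsqueeze(0).expand(tok.size(0), -1, -1)
            embeds = torch.cat([soft, tok], dim=1)                # (B, P+T, H)
            mask = torch.cat([attention_mask.new_ones(tok.size(0), soft.size(1)),
                              attention_mask], dim=1)
            out = self.encoder(inputs_embeds=embeds, attention_mask=mask)
            # drop the prompt positions before per-token classification
            return self.head(out.last_hidden_state[:, soft.size(1):])

Unfreezing the encoder parameters turns the same sketch into strategy (3).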
On the Impact of Cross-Domain Data on German Language Models
Traditionally, large language models have been trained either on general web
crawls or on domain-specific data. However, the recent successes of generative
large language models have shed light on the benefits of cross-domain datasets. To
examine the significance of prioritizing data diversity over quality, we
present a German dataset comprising texts from five domains, along with another
dataset aimed at containing high-quality data. Through training a series of
models ranging between 122M and 750M parameters on both datasets, we conduct a
comprehensive benchmark on multiple downstream tasks. Our findings demonstrate
that the models trained on the cross-domain dataset outperform those trained on
quality data alone, leading to improvements over the previous
state-of-the-art. The models are available at
https://huggingface.co/ikim-uk-essen
Comment: 13 pages, 1 figure, accepted at Findings of the Association for
Computational Linguistics: EMNLP 2023
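A hypothetical usage sketch for the released checkpoints follows; the repository id is a placeholder, not a real model name, so substitute one from the organization page above.

    from transformers import AutoModelForMaskedLM, AutoTokenizer

    model_id = "ikim-uk-essen/MODEL-NAME"  # placeholder id, not a real checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForMaskedLM.from_pretrained(model_id)
    inputs = tokenizer("Berlin ist die Hauptstadt von [MASK].", return_tensors="pt")
    logits = model(**inputs).logits  # mask-filling scores over the vocabulary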
Comprehensive Study on German Language Models for Clinical and Biomedical Text Understanding
Recent advances in natural language processing (NLP) can be largely
attributed to the advent of pre-trained language models such as BERT and
RoBERTa. While these models demonstrate remarkable performance on general
datasets, they can struggle in specialized domains such as medicine, where
unique domain-specific terminologies, domain-specific abbreviations, and
varying document structures are common. This paper explores strategies for
adapting these models to domain-specific requirements, primarily through
continuous pre-training on domain-specific data. We pre-trained several German
medical language models on 2.4B tokens derived from translated public English
medical data and 3B tokens of German clinical data. The resulting models were
evaluated on various German downstream tasks, including named entity
recognition (NER), multi-label classification, and extractive question
answering. Our results suggest that models augmented by clinical and
translation-based pre-training typically outperform general-domain models in
medical contexts. We conclude that continuous pre-training can match or even
exceed the performance of clinical models trained from scratch. Furthermore,
pre-training on clinical data and leveraging translated texts have proven to be
reliable methods for domain adaptation in medical NLP tasks.
Comment: Accepted at LREC-COLING 2024
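A minimal sketch of the continuous pre-training recipe described above, i.e., continuing masked language modeling on domain text with Hugging Face transformers and datasets; the base model, corpus file, and hyperparameters are assumptions for illustration.

    from datasets import load_dataset
    from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    base = "bert-base-german-cased"  # assumed general-domain starting point
    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForMaskedLM.from_pretrained(base)

    # clinical_corpus.txt is a placeholder: one de-identified document per line
    raw = load_dataset("text", data_files={"train": "clinical_corpus.txt"})
    tokenized = raw["train"].map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
        batched=True, remove_columns=["text"])

    collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
    args = TrainingArguments(output_dir="german-medical-bert",
                             per_device_train_batch_size=16,
                             num_train_epochs=1, learning_rate=5e-5)
    Trainer(model=model, args=args, train_dataset=tokenized,
            data_collator=collator).train()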
A Study of Generative Large Language Model for Medical Research and Healthcare
There is enormous enthusiasm, as well as concern, about using large language models
(LLMs) in healthcare, yet current assumptions are all based on general-purpose
LLMs such as ChatGPT. This study develops a clinical generative LLM,
GatorTronGPT, using 277 billion words of mixed clinical and English text with a
GPT-3 architecture of 20 billion parameters. GatorTronGPT improves biomedical
natural language processing for medical research. NLP models trained on
synthetic text generated by GatorTronGPT outperform NLP models trained on
real-world clinical text. A physician Turing test using a 1 (worst) to 9 (best)
scale shows that there is no significant difference in linguistic readability
(p = 0.22; 6.57 for GatorTronGPT compared with 6.93 for human) or clinical
relevance (p = 0.91; 7.0 for GatorTronGPT compared with 6.97 for human), and
that physicians cannot differentiate them (p < 0.001). This study provides
insights into the opportunities and challenges of LLMs for medical research and
healthcare.
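For intuition, a rank-based comparison like the one below could sit behind the reported readability and relevance p-values; the rating arrays and the choice of test are invented for illustration and are not the study's data or analysis plan.

    from scipy.stats import mannwhitneyu

    synthetic = [7, 6, 8, 6, 7, 7, 5, 8]  # hypothetical 1-9 readability ratings
    human = [7, 7, 8, 6, 6, 7, 7, 8]      # hypothetical ratings of real notes
    stat, p = mannwhitneyu(synthetic, human, alternative="two-sided")
    print(f"U = {stat:.1f}, p = {p:.3f}")  # large p -> no detectable difference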
Global, regional, and national burden of osteoarthritis, 1990–2020 and projections to 2050: a systematic analysis for the Global Burden of Disease Study 2021
Background
Osteoarthritis is the most common form of arthritis in adults, characterised by chronic pain and loss of mobility. Osteoarthritis most frequently occurs after age 40 years and prevalence increases steeply with age. WHO has designated 2021–30 the decade of healthy ageing, which highlights the need to address diseases such as osteoarthritis, which strongly affect functional ability and quality of life. Osteoarthritis can coexist with, and negatively affect, other chronic conditions. Here we estimate the burden of hand, hip, knee, and other sites of osteoarthritis across geographies, age, sex, and time, with forecasts of prevalence to 2050.
Methods
In this systematic analysis for the Global Burden of Disease Study, osteoarthritis prevalence in 204 countries and territories from 1990 to 2020 was estimated using data from population-based surveys from 26 countries for knee osteoarthritis, 23 countries for hip osteoarthritis, 42 countries for hand osteoarthritis, and US insurance claims for all of the osteoarthritis sites, including the other types of osteoarthritis category. The reference case definition was symptomatic, radiographically confirmed osteoarthritis. Studies using alternative definitions from the reference case definition (for example self-reported osteoarthritis) were adjusted to reference using regression models. Osteoarthritis severity distribution was obtained from a pooled meta-analysis of sources using the Western Ontario and McMaster Universities Arthritis Index. Final prevalence estimates were multiplied by disability weights to calculate years lived with disability (YLDs). Prevalence was forecast to 2050 using a mixed-effects model.
Findings
Globally, 595 million (95% uncertainty interval 535–656) people had osteoarthritis in 2020, equal to 7·6% (95% UI 6·8–8·4) of the global population, and an increase of 132·2% (130·3–134·1) in total cases since 1990. Compared with 2020, cases of osteoarthritis are projected to increase 74·9% (59·4–89·9) for knee, 48·6% (35·9–67·1) for hand, 78·6% (57·7–105·3) for hip, and 95·1% (68·1–135·0) for other types of osteoarthritis by 2050. The global age-standardised rate of YLDs for total osteoarthritis was 255·0 YLDs (119·7–557·2) per 100 000 in 2020, a 9·5% (8·6–10·1) increase from 1990 (233·0 YLDs per 100 000, 109·3–510·8). For adults aged 70 years and older, osteoarthritis was the seventh-ranked cause of YLDs. Age-standardised prevalence in 2020 was more than 5·5% in all world regions, ranging from 5677·4 (5029·8–6318·1) per 100 000 in southeast Asia to 8632·7 (7852·0–9469·1) per 100 000 in high-income Asia Pacific. Knee was the most common site for osteoarthritis. High BMI contributed to 20·4% (95% UI –1·7 to 36·6) of osteoarthritis. Potentially modifiable risk factors for osteoarthritis such as recreational injury and occupational hazards have not yet been explored in GBD modelling.
Interpretation
Age-standardised YLDs attributable to osteoarthritis are continuing to rise and will lead to substantial increases in case numbers because of population growth and ageing, and because there is no effective cure for osteoarthritis. The demand on health systems for care of patients with osteoarthritis, including joint replacements, which are highly effective for late-stage osteoarthritis in hips and knees, will rise in all regions, but might be out of reach and lead to further health inequity for individuals and countries unable to afford them. Much more can and should be done to prevent people from reaching that late stage.
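A toy version of the YLD calculation from the Methods (prevalence multiplied by a disability weight, expressed per 100 000); both numbers are illustrative, not GBD estimates.

    prevalence_per_100k = 7000  # hypothetical osteoarthritis prevalence
    disability_weight = 0.03    # hypothetical severity-weighted average
    ylds_per_100k = prevalence_per_100k * disability_weight
    print(f"{ylds_per_100k:.1f} YLDs per 100 000")  # 210.0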
Global, regional, and national burden of diabetes from 1990 to 2021, with projections of prevalence to 2050: a systematic analysis for the Global Burden of Disease Study 2021
Background
Diabetes is one of the leading causes of death and disability worldwide, and affects people regardless of country, age group, or sex. Using the most recent evidentiary and analytical framework from the Global Burden of Diseases, Injuries, and Risk Factors Study (GBD), we produced location-specific, age-specific, and sex-specific estimates of diabetes prevalence and burden from 1990 to 2021, the proportion of type 1 and type 2 diabetes in 2021, the proportion of the type 2 diabetes burden attributable to selected risk factors, and projections of diabetes prevalence through 2050.
Methods
Estimates of diabetes prevalence and burden were computed in 204 countries and territories, across 25 age groups, for males and females separately and combined; these estimates comprised lost years of healthy life, measured in disability-adjusted life-years (DALYs; defined as the sum of years of life lost [YLLs] and years lived with disability [YLDs]). We used the Cause of Death Ensemble model (CODEm) approach to estimate deaths due to diabetes, incorporating 25 666 location-years of data from vital registration and verbal autopsy reports in separate total (including both type 1 and type 2 diabetes) and type-specific models. Other forms of diabetes, including gestational and monogenic diabetes, were not explicitly modelled. Total and type 1 diabetes prevalence was estimated by use of a Bayesian meta-regression modelling tool, DisMod-MR 2.1, to analyse 1527 location-years of data from the scientific literature, survey microdata, and insurance claims; type 2 diabetes estimates were computed by subtracting type 1 diabetes from total estimates. Mortality and prevalence estimates, along with standard life expectancy and disability weights, were used to calculate YLLs, YLDs, and DALYs. When appropriate, we extrapolated estimates to a hypothetical population with a standardised age structure to allow comparison in populations with different age structures. We used the comparative risk assessment framework to estimate the risk-attributable type 2 diabetes burden for 16 risk factors falling under risk categories including environmental and occupational factors, tobacco use, high alcohol use, high body-mass index (BMI), dietary factors, and low physical activity. Using a regression framework, we forecast type 1 and type 2 diabetes prevalence through 2050 with Socio-demographic Index (SDI) and high BMI as predictors, respectively.
Findings
In 2021, there were 529 million (95% uncertainty interval [UI] 500–564) people living with diabetes worldwide, and the global age-standardised total diabetes prevalence was 6·1% (5·8–6·5). At the super-region level, the highest age-standardised rates were observed in north Africa and the Middle East (9·3% [8·7–9·9]) and, at the regional level, in Oceania (12·3% [11·5–13·0]). Nationally, Qatar had the world’s highest age-specific prevalence of diabetes, at 76·1% (73·1–79·5) in individuals aged 75–79 years. Total diabetes prevalence—especially among older adults—primarily reflects type 2 diabetes, which in 2021 accounted for 96·0% (95·1–96·8) of diabetes cases and 95·4% (94·9–95·9) of diabetes DALYs worldwide. In 2021, 52·2% (25·5–71·8) of global type 2 diabetes DALYs were attributable to high BMI. The contribution of high BMI to type 2 diabetes DALYs rose by 24·3% (18·5–30·4) worldwide between 1990 and 2021.
By 2050, more than 1·31 billion (1·22–1·39) people are projected to have diabetes, with expected age-standardised total diabetes prevalence rates greater than 10% in two super-regions: 16·8% (16·1–17·6) in north Africa and the Middle East and 11·3% (10·8–11·9) in Latin America and Caribbean. By 2050, 89 (43·6%) of 204 countries and territories will have an age-standardised rate greater than 10%.
Interpretation
Diabetes remains a substantial public health issue. Type 2 diabetes, which makes up the bulk of diabetes cases, is largely preventable and, in some cases, potentially reversible if identified and managed early in the disease course. However, all evidence indicates that diabetes prevalence is increasing worldwide, primarily due to a rise in obesity caused by multiple factors. Preventing and controlling type 2 diabetes remains an ongoing challenge. It is essential to better understand disparities in risk factor profiles and diabetes burden across populations, to inform strategies to successfully control diabetes risk factors within the context of multiple and complex drivers.
Funding
Bill & Melinda Gates Foundation
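A toy version of the DALY definition used in the Methods (DALYs are the sum of YLLs and YLDs); the values are made up for illustration.

    ylls_per_100k = 450.0  # hypothetical years of life lost
    ylds_per_100k = 300.0  # hypothetical years lived with disability
    dalys_per_100k = ylls_per_100k + ylds_per_100k
    print(f"{dalys_per_100k:.0f} DALYs per 100 000")  # 750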
Improving Generalizability of Extracting Social Determinants of Health Using Large Language Models through Prompt-tuning
The progress in natural language processing (NLP) using large language models
(LLMs) has greatly improved patient information extraction from clinical
narratives. However, most methods based on the fine-tuning strategy have
limited transfer learning ability for cross-domain applications. This study
proposed a novel approach that employs a soft prompt-based learning
architecture, which introduces trainable prompts to guide LLMs toward desired
outputs. We examined two types of LLM architectures, including encoder-only
GatorTron and decoder-only GatorTronGPT, and evaluated their performance for
the extraction of social determinants of health (SDoH) using a
cross-institution dataset from the 2022 n2c2 challenge and a cross-disease
dataset from the University of Florida (UF) Health. The results show that
decoder-only LLMs with prompt tuning achieved better performance in
cross-domain applications. GatorTronGPT achieved the best F1 scores for both
datasets, outperforming traditional fine-tuned GatorTron by 8.9% and 21.8% in a
cross-institution setting, and 5.5% and 14.5% in a cross-disease setting.
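A minimal sketch of prompt-tuning a decoder-only model with the Hugging Face peft library, the general technique described above; the base model (a small GPT-2 stand-in, not GatorTronGPT) and the prompt length are assumptions.

    from peft import PromptTuningConfig, TaskType, get_peft_model
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in decoder-only LLM
    config = PromptTuningConfig(task_type=TaskType.CAUSAL_LM,
                                num_virtual_tokens=20)    # trainable soft prompt
    peft_model = get_peft_model(model, config)
    peft_model.print_trainable_parameters()  # only prompt embeddings are trainable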
A large language model for electronic health records
There is increasing interest in developing artificial intelligence (AI) systems to process and interpret electronic health records (EHRs). Natural language processing (NLP) powered by pretrained language models is the key technology for medical AI systems utilizing clinical narratives. However, there are few clinical language models, and the largest one trained on clinical text is comparatively small at 110 million parameters (compared with billions of parameters in the general domain). It is not clear how large clinical language models with billions of parameters can help medical AI systems utilize unstructured EHRs. In this study, we develop from scratch a large clinical language model—GatorTron—using >90 billion words of text (including >82 billion words of de-identified clinical text) and systematically evaluate it on five clinical NLP tasks including clinical concept extraction, medical relation extraction, semantic textual similarity, natural language inference (NLI), and medical question answering (MQA). We examine how (1) scaling up the number of parameters and (2) scaling up the size of the training data could benefit these NLP tasks. GatorTron models scale up the clinical language model from 110 million to 8.9 billion parameters and improve five clinical NLP tasks (e.g., 9.6% and 9.5% improvement in accuracy for NLI and MQA); these models can be applied to medical AI systems to improve healthcare delivery. The GatorTron models are publicly available at: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/models/gatortron_og