28 research outputs found

    Considerations for health care institutions training large language models on electronic health records

    Large language models (LLMs) like ChatGPT have excited scientists across fields; in medicine, one source of excitement is the potential applications of LLMs trained on electronic health record (EHR) data. But there are tough questions we must first answer if health care institutions are interested in having LLMs trained on their own data: should they train an LLM from scratch or fine-tune it from an open-source model? For health care institutions with a predefined budget, what are the biggest LLMs they can afford? In this study, we take steps towards answering these questions with an analysis of dataset sizes, model sizes, and costs for LLM training using EHR data. This analysis provides a framework for thinking about these questions in terms of data scale, compute scale, and training budgets.

    Evaluation of ChatGPT Family of Models for Biomedical Reasoning and Classification

    Recent advances in large language models (LLMs) have shown impressive ability in biomedical question-answering, but have not been adequately investigated for more specific biomedical applications. This study investigates the performance of LLMs such as the ChatGPT family of models (GPT-3.5s, GPT-4) in biomedical tasks beyond question-answering. Because no patient data can be passed to the OpenAI API public interface, we evaluated model performance with over 10,000 samples as proxies for two fundamental tasks in the clinical domain: classification and reasoning. The first task is classifying whether statements of clinical and policy recommendations in scientific literature constitute health advice. The second task is causal relation detection from the biomedical literature. We compared LLMs with simpler models, such as bag-of-words (BoW) with logistic regression, and fine-tuned BioBERT models. Despite the excitement around viral ChatGPT, we found that fine-tuning for two fundamental NLP tasks remained the best strategy. The simple BoW model performed on par with the most complex LLM prompting. Prompt engineering required significant investment. Comment: 28 pages, 2 tables and 4 figures. Submitting for review

    Natural language processing to automatically extract the presence and severity of esophagitis in notes of patients undergoing radiotherapy

    Radiotherapy (RT) toxicities can impair survival and quality of life, yet remain under-studied. Real-world evidence holds potential to improve our understanding of toxicities, but toxicity information is often only in clinical notes. We developed natural language processing (NLP) models to identify the presence and severity of esophagitis from notes of patients treated with thoracic RT. We fine-tuned statistical and pre-trained BERT-based models for three esophagitis classification tasks: Task 1) presence of esophagitis, Task 2) severe esophagitis or not, and Task 3) no esophagitis vs. grade 1 vs. grade 2-3. Transferability was tested on 345 notes from patients with esophageal cancer undergoing RT. Fine-tuning PubmedBERT yielded the best performance. The best macro-F1 was 0.92, 0.82, and 0.74 for Task 1, 2, and 3, respectively. Selecting the most informative note sections during fine-tuning improved macro-F1 by over 2% for all tasks. Silver-labeled data improved the macro-F1 by over 3% across all tasks. For the esophageal cancer notes, the best macro-F1 was 0.73, 0.74, and 0.65 for Task 1, 2, and 3, respectively, without additional fine-tuning. To our knowledge, this is the first effort to automatically extract esophagitis toxicity severity according to CTCAE guidelines from clinic notes. The promising performance provides proof-of-concept for NLP-based automated detailed toxicity monitoring in expanded domains. Comment: 17 pages, 6 tables, 1 figure, submitting to JCO-CCI for review
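
    The abstract above reports macro-F1 for each task. As a reminder of what that metric measures, here is a minimal sketch in plain Python: macro-F1 is the unweighted mean of the per-class F1 scores, so rare classes count as much as common ones. The function name and the example labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical labels shaped like the three-way task (none / grade 1 / grade 2-3)
y_true = ["none", "none", "grade1", "grade1", "grade2-3", "grade2-3"]
y_pred = ["none", "grade1", "grade1", "grade1", "grade2-3", "none"]
score = macro_f1(y_true, y_pred)
print(round(score, 3))
```

    In this toy example the per-class F1 scores are 0.5, 0.8, and 2/3, so every class contributes one third of the final value regardless of how many notes carry that label.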

    Large Language Models to Identify Social Determinants of Health in Electronic Health Records

    Social determinants of health (SDoH) have an important impact on patient outcomes but are incompletely collected in electronic health records (EHRs). This study researched the ability of large language models to extract SDoH from free text in EHRs, where they are most commonly documented, and explored the role of synthetic clinical text for improving the extraction of these scarcely documented, yet extremely valuable, clinical data. 800 patient notes were annotated for SDoH categories, and several transformer-based models were evaluated. The study also experimented with synthetic data generation and assessed for algorithmic bias. Our best-performing models were fine-tuned Flan-T5 XL (macro-F1 0.71) for any SDoH, and Flan-T5 XXL (macro-F1 0.70). The benefit of augmenting fine-tuning with synthetic data varied across model architecture and size, with smaller Flan-T5 models (base and large) showing the greatest improvements in performance (delta F1 +0.12 to +0.23). Model performance was similar on the in-hospital system dataset but worse on the MIMIC-III dataset. Our best-performing fine-tuned models outperformed zero- and few-shot performance of ChatGPT-family models for both tasks. These fine-tuned models were less likely than ChatGPT to change their prediction when race/ethnicity and gender descriptors were added to the text, suggesting less algorithmic bias (p<0.05). At the patient level, our models identified 93.8% of patients with adverse SDoH, while ICD-10 codes captured 2.0%. Our method can effectively extract SDoH information from clinic notes, performing better compared to GPT zero- and few-shot settings. These models could enhance real-world evidence on SDoH and aid in identifying patients needing social support. Comment: 38 pages, 5 figures, 5 tables in main, submitted for review

    The impact of responding to patient messages with large language model assistance

    Documentation burden is a major contributor to clinician burnout, which is rising nationally and is an urgent threat to our ability to care for patients. Artificial intelligence (AI) chatbots, such as ChatGPT, could reduce clinician burden by assisting with documentation. Although many hospitals are actively integrating such systems into electronic medical record systems, AI chatbots' utility and impact on clinical decision-making have not been studied for this intended use. We are the first to examine the utility of large language models in assisting clinicians in drafting responses to patient questions. In our two-stage cross-sectional study, 6 oncologists responded to 100 realistic synthetic cancer patient scenarios and portal messages developed to reflect common medical situations, first manually, then with AI assistance. We find AI-assisted responses were longer and less readable, but provided acceptable drafts without edits 58% of the time. AI assistance improved efficiency 77% of the time, with low harm risk (82% safe). However, 7.7% of unedited AI responses could cause severe harm. In 31% of cases, physicians thought AI drafts were human-written. AI assistance led to more patient education recommendations and fewer clinical actions than manual responses. Results show promise for AI to improve clinician efficiency and patient care through assisting documentation, if used judiciously. Monitoring model outputs and human-AI interaction remains crucial for safe implementation. Comment: 4 figures and tables in main, submitted for review

    Clinical outcomes of radiation therapy for transgender and gender-expansive people with cancer

    Introduction: Approximately 1.6 million people in the US identify as transgender, many of whom undergo gender-affirming medical or surgical therapies. While transgender individuals are diagnosed with cancer at similar rates as those who are cisgender, the impacts of radiation therapy on outcomes of gender-affirming care in transgender, nonbinary, and gender-expansive people with cancer are understudied. We report on the experiences and outcomes of transgender and gender-expansive patients receiving radiation therapy for cancer treatment. Methods: This study is a multi-institutional retrospective review of patients evaluated from 2005-2019 identified as transgender or gender-expansive in the medical record and treated with radiation therapy. Results: We identified 23 patients who received radiation to 32 sites, including 12 (38%) to the brain, head, or neck, 8 (25%) to the thorax, and 7 (22%) to the pelvis. Seventeen patients (74%) received gender-affirming hormone therapy and 13 patients (57%) underwent gender-affirming surgery. Four patients had pelvic radiation before or after gender-affirming pelvic surgery, including two trans women who had pelvic radiation after vaginoplasty. Four patients had radiation to the chest or thorax and gender-affirming chest or breast surgery, including two trans men with breast cancer. Two pediatric patients developed hypopituitarism and hypogonadism secondary to radiation therapy and, as adults, changed their hormone replacement therapy to affirm their transgender identities. Discussion: Transgender people with cancer undergo radiation therapy for a wide range of cancers. Understanding their prior gender-affirming medical or surgical treatments and future gender affirmation goals may identify important considerations for their oncologic care.

    Measuring Pointwise V-Usable Information In-Context-ly


    Seasonality of foliar respiration in two dominant plant species from the Arctic tundra: Response to long-term warming and short-term temperature variability

    Direct measurements of foliar carbon exchange through the growing season in Arctic species are limited, despite the need for accurate estimates of photosynthesis and respiration to characterise carbon cycling in the tundra. We examined seasonal variatio

    Respiratory alternative oxidase responds to both low- and high-temperature stress in Quercus rubra leaves along an urban-rural gradient in New York

    1. Urban-rural transects can be utilized as natural gradients of temperature and also as a tool to predict how plant ecology and physiology might respond to expected global change variables such as elevated temperatures, CO2 and inorganic nitrogen deposition. 2. We investigated differences in respiration (R) and the balance of electron partitioning through the cytochrome (CP) and alternative (AP) pathways in leaves of mature Quercus rubra L. trees along a transect from New York City to the Catskill Mountains over the course of one growing season. In addition, we investigated the effects of elevated temperature on Q. rubra seedlings in a controlled environment study. 3. In the field study, we found that urban-grown leaves often respired at greater rates than leaves grown at other sites and that this was likely due to higher leaf nitrogen. At each site, R at the prevailing growth temperature declined steadily throughout the growing season despite higher temperatures at the end of the summer. Differences in R were associated with changes in the relative abundances of cytochrome and alternative oxidase proteins. Oxygen isotope discrimination (D), which reflects relative changes in AP and CP partitioning, was negatively correlated with daily minimum temperature in trees grown at the colder rural sites, but not at the warmer urban sites. 4. In the growth cabinet study, we found that R acclimated to elevated temperatures and that this was accompanied by a steady increase in D. 5. These findings that AP partitioning increases with both high and low temperatures show that the AP may play an important role in plant responses to environmental conditions that elicit stress, and not simply to specific conditions such as low temperature.

    Growth in eligibility criteria content and failure to accrue among National Cancer Institute (NCI)-affiliated clinical trials

    BACKGROUND: Cancer trial accrual is a national priority, yet up to 20% of trials fail to accrue. Trial eligibility criteria growth may be associated with accrual failure. We sought to quantify eligibility criteria growth within National Cancer Institute (NCI)-affiliated trials and determine impact on accrual. METHODS: Utilizing the Aggregated Analysis of ClinicalTrials.gov, we analyzed phase II/III interventional NCI-affiliated trials initiated between 2008 and 2018. Eligibility criteria growth was assessed via number of unique content words within combined inclusion and exclusion criteria. Association between unique word count and accrual failure was evaluated with multivariable logistic regression, adjusting for known predictors of failure. Medical terms associated with accrual failure were identified via natural language processing and categorized. RESULTS: Of 1197 trials, 231 (19.3%) failed due to low accrual. Accrual failure rate increased with eligibility criteria growth, from 11.8% in the lowest decile (12-112 words) to 29.4% in the highest decile (445-750 words). Median eligibility criteria increased over time, from 214 (IQR [23, 282]) unique content words in 2008 to 417 (IQR [289, 514]) in 2018 (r² = 0.73, P < 0.001). Eligibility criteria growth was independently associated with accrual failure (OR: 1.09 per decile, 95% CI [1.03-1.15], p = 0.004). Eighteen exclusion criteria categories were significantly associated with accrual failure, including renal, pulmonary, and diabetic, among others (Bonferroni-corrected p < 0.001). CONCLUSIONS: Eligibility criteria content growth is increasing dramatically among NCI-affiliated trials and is strongly associated with accrual failure. These findings support national initiatives to simplify eligibility criteria and suggest that further efforts are warranted to improve cancer trial accrual.
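
    The abstract above measures eligibility criteria growth as the number of unique content words in the combined inclusion and exclusion criteria. One plausible reading of that measure is sketched below in plain Python. The tokenizer, the tiny stop-word list, and the example criteria are all assumptions made for illustration; the abstract does not say how the study defined "content words".

```python
import re

# Hypothetical, deliberately small stop-word list (an assumption, not the study's list)
STOP_WORDS = {
    "a", "an", "and", "are", "be", "for", "in", "is", "must",
    "no", "not", "of", "or", "the", "to", "with",
}


def unique_content_words(inclusion, exclusion):
    """Return the set of distinct non-stop-word tokens in the combined criteria."""
    text = f"{inclusion} {exclusion}".lower()
    tokens = re.findall(r"[a-z0-9]+", text)  # simple alphanumeric tokenizer
    return {token for token in tokens if token not in STOP_WORDS}


# Hypothetical eligibility text, not drawn from any real trial
inclusion = "Age 18 or older with histologically confirmed carcinoma"
exclusion = "No prior renal failure; no uncontrolled diabetes or renal disease"
words = unique_content_words(inclusion, exclusion)
count = len(words)
print(count)
```

    Because the result is a set, a term repeated across criteria (here "renal") is counted once, which is what distinguishes a unique-word count from raw criteria length.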