22 research outputs found

    Evaluation of ChatGPT Family of Models for Biomedical Reasoning and Classification

    Recent advances in large language models (LLMs) have shown impressive ability in biomedical question-answering, but have not been adequately investigated for more specific biomedical applications. This study investigates the performance of LLMs such as the ChatGPT family of models (GPT-3.5s, GPT-4) in biomedical tasks beyond question-answering. Because no patient data can be passed to the OpenAI API public interface, we evaluated model performance with over 10,000 samples as proxies for two fundamental tasks in the clinical domain: classification and reasoning. The first task is classifying whether statements of clinical and policy recommendations in scientific literature constitute health advice. The second task is causal relation detection from the biomedical literature. We compared LLMs with simpler models, such as bag-of-words (BoW) with logistic regression, and fine-tuned BioBERT models. Despite the excitement around viral ChatGPT, we found that fine-tuning for two fundamental NLP tasks remained the best strategy. The simple BoW model performed on par with the most complex LLM prompting. Prompt engineering required significant investment. Comment: 28 pages, 2 tables and 4 figures. Submitting for review.
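The bag-of-words plus logistic regression baseline mentioned in this abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the toy sentences, labels, tokenizer, learning rate, and epoch count are all assumptions chosen only to show the technique (word-count features fed to a logistic model trained by stochastic gradient descent).

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase alphabetic tokens only; digits and punctuation are dropped.
    return re.findall(r"[a-z]+", text.lower())

def build_vocab(texts):
    # Map every distinct training token to a feature index.
    vocab = sorted({tok for t in texts for tok in tokenize(t)})
    return {tok: i for i, tok in enumerate(vocab)}

def vectorize(text, vocab):
    # Bag-of-words: one count per vocabulary word; unseen words are ignored.
    vec = [0.0] * len(vocab)
    for tok, n in Counter(tokenize(text)).items():
        if tok in vocab:
            vec[vocab[tok]] = float(n)
    return vec

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=200):
    # Plain stochastic gradient descent on the logistic loss.
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(text, vocab, w, b):
    x = vectorize(text, vocab)
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)

# Invented toy data: 1 = health advice, 0 = not advice.
texts = [
    "clinicians should screen patients for depression",
    "we recommend reducing sodium intake",
    "patients should receive the vaccine",
    "the cohort included 500 adults",
    "blood pressure was measured at baseline",
    "the mean age was 54 years",
]
labels = [1, 1, 1, 0, 0, 0]

vocab = build_vocab(texts)
w, b = train_logreg([vectorize(t, vocab) for t in texts], labels)
print(predict("clinicians should recommend the vaccine", vocab, w, b))
```

In practice a library implementation with regularization and a held-out test set would be the usual choice; the sketch only makes the feature representation and the training loop explicit.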

    Natural language processing to automatically extract the presence and severity of esophagitis in notes of patients undergoing radiotherapy

    Radiotherapy (RT) toxicities can impair survival and quality-of-life, yet remain under-studied. Real-world evidence holds potential to improve our understanding of toxicities, but toxicity information is often only in clinical notes. We developed natural language processing (NLP) models to identify the presence and severity of esophagitis from notes of patients treated with thoracic RT. We fine-tuned statistical and pre-trained BERT-based models for three esophagitis classification tasks: Task 1) presence of esophagitis, Task 2) severe esophagitis or not, and Task 3) no esophagitis vs. grade 1 vs. grade 2-3. Transferability was tested on 345 notes from patients with esophageal cancer undergoing RT. Fine-tuning PubmedBERT yielded the best performance. The best macro-F1 was 0.92, 0.82, and 0.74 for Task 1, 2, and 3, respectively. Selecting the most informative note sections during fine-tuning improved macro-F1 by over 2% for all tasks. Silver-labeled data improved the macro-F1 by over 3% across all tasks. For the esophageal cancer notes, the best macro-F1 was 0.73, 0.74, and 0.65 for Task 1, 2, and 3, respectively, without additional fine-tuning. To our knowledge, this is the first effort to automatically extract esophagitis toxicity severity according to CTCAE guidelines from clinic notes. The promising performance provides proof-of-concept for NLP-based automated detailed toxicity monitoring in expanded domains. Comment: 17 pages, 6 tables, 1 figure, submitting to JCO-CCI for review.
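Macro-F1, the metric reported throughout this abstract, is the unweighted mean of the per-class F1 scores, so a rare class such as grade 2-3 esophagitis counts as much as the common "no esophagitis" class. A minimal sketch follows; the toy labels are invented for illustration and are not data from the study.

```python
from collections import Counter

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN)
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Invented toy labels: 0 = no esophagitis, 1 = grade 1, 2 = grade 2-3.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7389
```

Here the per-class F1 scores are 0.75, 0.80, and about 0.67, which average to roughly 0.74.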

    Large Language Models to Identify Social Determinants of Health in Electronic Health Records

    Social determinants of health (SDoH) have an important impact on patient outcomes but are incompletely collected from the electronic health records (EHR). This study researched the ability of large language models to extract SDoH from free text in EHRs, where they are most commonly documented, and explored the role of synthetic clinical text for improving the extraction of these scarcely documented, yet extremely valuable, clinical data. 800 patient notes were annotated for SDoH categories, and several transformer-based models were evaluated. The study also experimented with synthetic data generation and assessed for algorithmic bias. Our best-performing models were fine-tuned Flan-T5 XL (macro-F1 0.71) for any SDoH, and Flan-T5 XXL (macro-F1 0.70). The benefit of augmenting fine-tuning with synthetic data varied across model architecture and size, with smaller Flan-T5 models (base and large) showing the greatest improvements in performance (delta F1 +0.12 to +0.23). Model performance was similar on the in-hospital system dataset but worse on the MIMIC-III dataset. Our best-performing fine-tuned models outperformed zero- and few-shot performance of ChatGPT-family models for both tasks. These fine-tuned models were less likely than ChatGPT to change their prediction when race/ethnicity and gender descriptors were added to the text, suggesting less algorithmic bias (p<0.05). At the patient level, our models identified 93.8% of patients with adverse SDoH, while ICD-10 codes captured 2.0%. Our method can effectively extract SDoH information from clinic notes, performing better compared to GPT zero- and few-shot settings. These models could enhance real-world evidence on SDoH and aid in identifying patients needing social support. Comment: 38 pages, 5 figures, 5 tables in main, submitted for review.

    The impact of responding to patient messages with large language model assistance

    Documentation burden is a major contributor to clinician burnout, which is rising nationally and is an urgent threat to our ability to care for patients. Artificial intelligence (AI) chatbots, such as ChatGPT, could reduce clinician burden by assisting with documentation. Although many hospitals are actively integrating such systems into electronic medical record systems, AI chatbots' utility and impact on clinical decision-making have not been studied for this intended use. We are the first to examine the utility of large language models in assisting clinicians to draft responses to patient questions. In our two-stage cross-sectional study, 6 oncologists responded to 100 realistic synthetic cancer patient scenarios and portal messages developed to reflect common medical situations, first manually, then with AI assistance. We find AI-assisted responses were longer and less readable, but provided acceptable drafts without edits 58% of the time. AI assistance improved efficiency 77% of the time, with low harm risk (82% safe). However, 7.7% of unedited AI responses could cause severe harm. In 31% of cases, physicians thought AI drafts were human-written. AI assistance led to more patient education recommendations and fewer clinical actions than manual responses. Results show promise for AI to improve clinician efficiency and patient care through assisting documentation, if used judiciously. Monitoring model outputs and human-AI interaction remains crucial for safe implementation. Comment: 4 figures and tables in main, submitted for review.

    Clinical outcomes of radiation therapy for transgender and gender-expansive people with cancer

    Introduction: Approximately 1.6 million people in the US identify as transgender, many of whom undergo gender-affirming medical or surgical therapies. While transgender individuals are diagnosed with cancer at similar rates as those who are cisgender, the impacts of radiation therapy on outcomes of gender-affirming care in transgender, nonbinary, and gender-expansive people with cancer are understudied. We report on the experiences and outcomes of transgender and gender-expansive patients receiving radiation therapy for cancer treatment. Methods: This study is a multi-institutional retrospective review of patients evaluated from 2005-2019 identified as transgender or gender-expansive in the medical record and treated with radiation therapy. Results: We identified 23 patients who received radiation to 32 sites, including 12 (38%) to the brain, head, or neck, 8 (25%) to the thorax, and 7 (22%) to the pelvis. Seventeen patients (74%) received gender-affirming hormone therapy and 13 patients (57%) underwent gender-affirming surgery. Four patients had pelvic radiation before or after gender-affirming pelvic surgery, including two trans women who had pelvic radiation after vaginoplasty. Four patients had radiation to the chest or thorax and gender-affirming chest or breast surgery, including two trans men with breast cancer. Two pediatric patients developed hypopituitarism and hypogonadism secondary to radiation therapy and, as adults, changed their hormone replacement therapy to affirm their transgender identities. Discussion: Transgender people with cancer undergo radiation therapy for a wide range of cancers. Understanding their prior gender-affirming medical or surgical treatments and future gender affirmation goals may identify important considerations for their oncologic care.

    Seasonality of foliar respiration in two dominant plant species from the Arctic tundra: Response to long-term warming and short-term temperature variability

    Direct measurements of foliar carbon exchange through the growing season in Arctic species are limited, despite the need for accurate estimates of photosynthesis and respiration to characterise carbon cycling in the tundra. We examined seasonal variatio

    Respiratory alternative oxidase responds to both low- and high-temperature stress in Quercus rubra leaves along an urban-rural gradient in New York

    1. Urban-rural transects can be utilized as natural gradients of temperature and also as a tool to predict how plant ecology and physiology might respond to expected global change variables such as elevated temperatures, CO2 and inorganic nitrogen deposition. 2. We investigated differences in respiration (R) and the balance of electron partitioning through the cytochrome (CP) and alternative (AP) pathways in leaves of mature Quercus rubra L. trees along a transect from New York City to the Catskill Mountains over the course of one growing season. In addition, we investigated the effects of elevated temperature on Q. rubra seedlings in a controlled environment study. 3. In the field study, we found that urban-grown leaves often respired at greater rates than leaves grown at other sites and that this was likely due to higher leaf nitrogen. At each site, R at the prevailing growth temperature declined steadily throughout the growing season despite higher temperatures at the end of the summer. Differences in R were associated with changes in the relative abundances of cytochrome and alternative oxidase proteins. Oxygen isotope discrimination (D), which reflects relative changes in AP and CP partitioning, was negatively correlated with daily minimum temperature in trees grown at the colder rural sites, but not at the warmer urban sites. 4. In the growth cabinet study, we found that R acclimated to elevated temperatures and that this was accompanied by a steady increase in D. 5. These findings that AP partitioning increases with both high and low temperatures show that the AP may play an important role in plant responses to environmental conditions that elicit stress, and not simply to specific conditions such as low temperature.

    Growth in eligibility criteria content and failure to accrue among National Cancer Institute (NCI)-affiliated clinical trials

    BACKGROUND: Cancer trial accrual is a national priority, yet up to 20% of trials fail to accrue. Trial eligibility criteria growth may be associated with accrual failure. We sought to quantify eligibility criteria growth within National Cancer Institute (NCI)-affiliated trials and determine impact on accrual. METHODS: Utilizing the Aggregated Analysis of ClinicalTrials.gov, we analyzed phase II/III interventional NCI-affiliated trials initiated between 2008 and 2018. Eligibility criteria growth was assessed via number of unique content words within combined inclusion and exclusion criteria. Association between unique word count and accrual failure was evaluated with multivariable logistic regression, adjusting for known predictors of failure. Medical terms associated with accrual failure were identified via natural language processing and categorized. RESULTS: Of 1197 trials, 231 (19.3%) failed due to low accrual. Accrual failure rate increased with eligibility criteria growth, from 11.8% in the lowest decile (12-112 words) to 29.4% in the highest decile (445-750 words). Median eligibility criteria increased over time, from 214 (IQR [23, 282]) unique content words in 2008 to 417 (IQR [289, 514]) in 2018 (r² = 0.73, P < 0.001). Eligibility criteria growth was independently associated with accrual failure (OR: 1.09 per decile, 95% CI [1.03-1.15], p = 0.004). Eighteen exclusion criteria categories were significantly associated with accrual failure, including renal, pulmonary, and diabetic, among others (Bonferroni-corrected p < 0.001). CONCLUSIONS: Eligibility criteria content growth is increasing dramatically among NCI-affiliated trials and is strongly associated with accrual failure. These findings support national initiatives to simplify eligibility criteria and suggest that further efforts are warranted to improve cancer trial accrual.
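The "unique content words" measure described in the methods can be sketched roughly as follows. The abstract does not define a content word, so the tokenizer and the small stop-word list here are assumptions, and the eligibility text is invented. The second helper shows how a per-decile odds ratio compounds across several deciles, assuming the effect is constant on the log-odds scale, as a logistic model with decile entered as a linear term implies.

```python
import re

# Assumption: a tiny illustrative stop-word list, not the study's definition.
STOP_WORDS = {
    "a", "an", "and", "or", "the", "of", "to", "in", "with", "for",
    "no", "not", "is", "be", "must", "have", "at", "by",
}

def unique_content_words(inclusion, exclusion):
    # Combine both criteria blocks, lowercase, and keep alphabetic tokens
    # (hyphenated words stay together; numbers are dropped).
    text = f"{inclusion} {exclusion}".lower()
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text)
    return {t for t in tokens if t not in STOP_WORDS}

def odds_ratio_across_deciles(or_per_decile, steps):
    # Under a log-linear effect, odds ratios multiply across decile steps.
    return or_per_decile ** steps

# Invented example criteria, for illustration only.
inclusion = "Age 18 years or older with histologically confirmed rectal cancer"
exclusion = "No prior pelvic radiation; no uncontrolled diabetes or renal failure"

print(len(unique_content_words(inclusion, exclusion)))  # 14
print(round(odds_ratio_across_deciles(1.09, 9), 2))     # 2.17
```

Under that assumption, the reported OR of 1.09 per decile corresponds to roughly 2.17-fold higher adjusted odds of accrual failure across the nine steps from the lowest to the highest decile. This is an arithmetic illustration, not a figure reported in the abstract.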

    Predictors of complete response and disease recurrence following chemoradiation for rectal cancer

    Objective: Approximately 10-40% of rectal cancer patients have a complete response (CR) to neoadjuvant chemoradiation (CRT), and these patients have improved survival. Thus, non-operative management (watch-and-wait approach) may be an option for select patients. We aimed to identify clinical predictors of complete response following CRT. Methods: Patients treated with definitive chemoradiation for T3-T4, locally unresectable T1-T2, low-lying T2, and/or node-positive rectal cancer from August 2004 to February 2015 were retrospectively reviewed. Most patients were treated with 50.4 Gy radiation and concurrent 5-fluorouracil or capecitabine. Patients were considered to have a CR if surgical pathology revealed ypT0N0M0 (operative management), or if they had no evidence of residual disease on clinical and radiographic assessment (non-operative management). Statistical analysis was carried out to determine predictors of CR and long-term outcomes. Results: Complete records were available on 138 patients. The median follow-up was 24.5 months. 36 patients (26.3%) achieved a CR; 30/123 operatively managed patients (24.5%) and 6/15 (40%) non-operatively managed patients. None of the 10 patients with mucinous adenocarcinoma achieved a CR. CEA ≥5 μg/L at diagnosis (OR 0.190, 95% CI 0.037-0.971, p=0.046), tumor size ≥3 cm (OR 0.123, 95% CI 0.020-0.745, p=0.023), distance of tumor from the anal verge ≥3 cm (OR 0.091, 95% CI 0.013-0.613, p=0.014), clinically node positive disease at diagnosis (OR 0.201, 95% CI 0.045-0.895, p=0.035), and interval from CRT to surgery ≥8 weeks (OR 5.267, 95% CI 1.068-25.961, p=0.041) were independent predictors of CR. The CR group had longer 3-year distant metastasis-free survival (DMFS) (93.7% vs. 63.7%, p=0.016) and 3-year disease-free survival (DFS) (91.1% vs. 67.8%, p=0.038). 3-year locoregional control (LRC) (96.6% vs. 81.3%, p=0.103) and overall survival (OS) (97.2% vs. 87.5%, p=0.125) were higher in the CR group but this did not achieve statistical significance. CR was not an independent predictor of LRC, DMFS, or DFS. Conclusions: CEA at diagnosis, tumor size, tumor distance from the anal verge, node positivity at diagnosis, and interval from CRT to surgery were predictors of CR. These clinical variables may offer insight into patient selection and timing of treatment response evaluation in the watch-and-wait approach.

    Prostate-specific antigen nadir and testosterone level at prostate-specific antigen failure following radiation and androgen suppression therapy for unfavorable-risk prostate cancer and the risk of all-cause and prostate cancer–specific mortality

    BACKGROUND: Although both PSA nadir (PSAn) and testosterone levels at PSA failure are known prognostic factors in men undergoing radiation therapy (RT) and androgen deprivation therapy (ADT) for unfavorable-risk prostate cancer (PC), it is unclear whether their prognostic significance is independent or overlapping. METHODS: Seventy-five men treated with RT with or without 6 months of ADT for unfavorable-risk nonmetastatic PC enrolled in 2 prospective clinical trials between 1986 and 2001 formed the study cohort. Competing risks and Cox multivariable regression were used to assess whether low versus normal serum testosterone at the time of PSA failure and higher PSAn after initial therapy were independently associated with the risk of PC-specific (PCSM) and all-cause mortality (ACM) adjusting for PC prognostic factors. RESULTS: After a median follow-up of 15.34 years (interquartile range, 6.66-16.88 years), there were 53 deaths (73.3%): 30 (56.6%) were from PC. Low testosterone at PSA failure was significantly associated with an increased risk of PCSM (adjusted HR [AHR], 7.77; 95% CI, 1.14-52.99; P = .04) and ACM (AHR, 3.01; 95% CI, 1.01-8.96; P = .05), as was higher PSAn (PCSM AHR, 1.03; 95% CI, 1.01-1.05; P < .01; ACM AHR, 1.04; 95% CI, 1.02-1.07; P < .01), although the prognostic significance of PSAn was only noted in men with a normal testosterone at PSA failure. CONCLUSIONS: Low testosterone level at PSA failure in high-risk patients with PC treated with RT is associated with increased PCSM and ACM risk. In men with normal testosterone levels at the time of PSA failure, an elevated PSAn was associated with worse PCSM and ACM risk.
    LAY SUMMARY: This study investigates whether the prostate-specific antigen (PSA) nadir and normal versus low testosterone at the time of PSA failure provide mutually exclusive or overlapping prognostic information following treatment with radiation and androgen deprivation therapy for unfavorable-risk patients with prostate cancer using data from 2 prospective clinical trials. It was found that both provided prognostic information; however, higher PSA nadir was only found to be of prognostic significance in men with normal testosterone levels at PSA failure.