
    Can LLMs Grade Short-answer Reading Comprehension Questions: Foundational Literacy Assessment in LMICs

    This paper presents emerging evidence of using generative large language models (i.e., GPT-4) to reliably evaluate short-answer reading comprehension questions. Specifically, we explore how various configurations of generative LLMs are able to evaluate student responses from a new dataset, drawn from a battery of reading assessments conducted with over 150 students in Ghana. As this dataset is novel and hence not used in training runs of GPT, it offers an opportunity to test for domain shift and evaluate the generalizability of generative LLMs, which are predominantly designed and trained on data from high-income North American countries. We found that GPT-4, with minimal prompt engineering, performed extremely well on evaluating the novel dataset (Quadratic Weighted Kappa 0.923, F1 0.88), substantially outperforming transfer-learning-based approaches, and even exceeding expert human raters (Quadratic Weighted Kappa 0.915, F1 0.87). To the best of our knowledge, our work is the first to empirically evaluate the performance of generative LLMs on short-answer reading comprehension questions, using real student data, and suggests that generative LLMs have the potential to reliably evaluate foundational literacy. Currently, the assessment of formative literacy and numeracy is infrequent in many low and middle-income countries (LMICs) due to the cost and operational complexities of conducting them at scale. Automating the grading process for reading assessment could enable wider usage, and in turn improve decision-making regarding curricula, school management, and teaching practice at the classroom level. Importantly, in contrast to transfer-learning-based approaches, generative LLMs generalize well and the technical barriers to their use are low, making them more feasible to implement and scale in lower resource educational contexts.
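
The Quadratic Weighted Kappa cited in this abstract measures agreement between two raters on ordinal scores, penalizing disagreements by the squared distance between the scores. The following is a minimal sketch of the standard formula, not the authors' evaluation code; the function name, the integer score encoding (0 to `num_categories - 1`), and the sample data are assumptions made for illustration.

```python
def quadratic_weighted_kappa(rater_a, rater_b, num_categories):
    """Agreement between two lists of integer scores in [0, num_categories)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must supply the same non-zero number of scores")
    if num_categories < 2:
        raise ValueError("need at least two score categories")

    k = num_categories
    n = len(rater_a)

    # Observed confusion matrix: observed[i][j] counts items scored i by A and j by B.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1

    # Marginal score histograms for each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2     # quadratic disagreement penalty
            expected = hist_a[i] * hist_b[j] / n      # count expected by chance
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    # If both raters used a single identical category, chance disagreement is zero.
    if weighted_expected == 0:
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical scores (0 = incorrect, 1 = partially correct, 2 = correct).
human = [0, 1, 2, 2, 1, 0]
model = [0, 1, 2, 1, 1, 0]
kappa = quadratic_weighted_kappa(human, model, num_categories=3)
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance.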

    Using State-of-the-Art Speech Models to Evaluate Oral Reading Fluency in Ghana

    This paper reports on a set of three recent experiments utilizing large-scale speech models to evaluate the oral reading fluency (ORF) of students in Ghana. While ORF is a well-established measure of foundational literacy, assessing it typically requires one-on-one sessions between a student and a trained evaluator, a process that is time-consuming and costly. Automating the evaluation of ORF could support better literacy instruction, particularly in education contexts where formative assessment is uncommon due to large class sizes and limited resources. To our knowledge, this research is among the first to examine the use of the most recent versions of large-scale speech models (Whisper V2, wav2vec2.0) for ORF assessment in the Global South. We find that Whisper V2 produces transcriptions of Ghanaian students reading aloud with a Word Error Rate (WER) of 13.5. This is close to the model's average WER on adult speech (12.8) and would have been considered state-of-the-art for children's speech transcription only a few years ago. We also find that when these transcriptions are used to produce fully automated ORF scores, they closely align with scores generated by expert human graders, with a correlation coefficient of 0.96. Importantly, these results were achieved on a representative dataset (i.e., students with regional accents, recordings taken in actual classrooms), using a free and publicly available speech model out of the box (i.e., no fine-tuning). This suggests that using large-scale speech models to assess ORF may be feasible to implement and scale in lower-resource, linguistically diverse educational contexts.
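
Word Error Rate, the transcription metric reported in this abstract, is conventionally the word-level edit distance between a reference text and a transcript, divided by the number of reference words. The sketch below illustrates that general definition only; it is not the authors' pipeline, and the function name, the lowercase-and-split normalization, and the sample sentences are assumptions. The abstract's figures (13.5, 12.8) appear to be percentages, whereas this function returns a fraction.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion (reference word missed)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr

    return prev[-1] / len(ref_words)


# Hypothetical passage and transcript: one of six reference words is dropped.
passage = "the cat sat on the mat"
transcript = "the cat sat on mat"
wer = word_error_rate(passage, transcript)
```

Because insertions count as errors, WER can exceed 1 when the transcript is much longer than the reference.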

    Characteristics Associated With Facebook Use and Interest in Digital Disease Support Among Older Adults With Atrial Fibrillation: Cross-Sectional Analysis of Baseline Data From the Systematic Assessment of Geriatric Elements in Atrial Fibrillation (SAGE-AF) Cohort

    BACKGROUND: Online support groups for atrial fibrillation (AF) and apps to detect and manage AF exist, but the scientific literature does not describe which patients are interested in digital disease support. OBJECTIVE: The objective of this study was to describe characteristics associated with Facebook use and interest in digital disease support among older patients with AF who used the internet. METHODS: We used baseline data from the Systematic Assessment of Geriatric Elements in Atrial Fibrillation (SAGE-AF), a prospective cohort of older adults (≥65 years) with AF at high stroke risk. Participants self-reported demographics, clinical characteristics, and Facebook and technology use. Online patients (internet use in the past 4 weeks) were asked whether they would be interested in participating in an online support AF community. Mobile users (owns smartphone and/or tablet) were asked about interest in communicating with their health care team about their AF-related health using a secure app. Logistic regression models identified crude and multivariable predictors of Facebook use and interest in digital disease support. RESULTS: Online patients (N=816) were aged 74.2 (SD 6.6) years, 47.8% (390/816) were female, and 91.1% (743/816) were non-Hispanic white. Roughly half (52.5%; 428/816) used Facebook. Facebook use was more common among women (adjusted odds ratio [aOR] 2.21, 95% CI 1.66-2.95) and patients with mild to severe depressive symptoms (aOR 1.50, 95% CI 1.08-2.10) and less common among patients aged ≥85 years (aOR 0.27, 95% CI 0.15-0.48). Forty percent (40.4%; 330/816) reported interest in an online AF patient community. Interest in an online AF patient community was more common among online patients with some college/trade school or Bachelors/graduate school (aOR 1.70, 95% CI 1.10-2.61 and aOR 1.82, 95% CI 1.13-2.92, respectively), obesity (aOR 1.65, 95% CI 1.08-2.52), online health information seeking at most weekly or multiple times per week (aOR 1.84, 95% CI 1.32-2.56 and aOR 2.78, 95% CI 1.86-4.16, respectively), and daily Facebook use (aOR 1.76, 95% CI 1.26-2.46). Among mobile users, 51.8% (324/626) reported interest in communicating with their health care team via a mobile app. Interest in app-mediated communication was less likely among women (aOR 0.48, 95% CI 0.34-0.68) and more common among online patients who had completed trade school/some college versus high school/General Educational Development (aOR 1.95, 95% CI 1.17-3.22), sought online health information at most weekly or multiple times per week (aOR 1.86, 95% CI 1.27-2.74 and aOR 2.24, 95% CI 1.39-3.62, respectively), and had health-related apps (aOR 3.92, 95% CI 2.62-5.86). CONCLUSIONS: Among older adults with AF who use the internet, technology use and demographics are associated with interest in digital disease support. Clinics and health care providers may wish to encourage patients to join an existing online support community for AF and explore opportunities for app-mediated patient-provider communication.

    Effect of angiotensin-converting enzyme inhibitor and angiotensin receptor blocker initiation on organ support-free days in patients hospitalized with COVID-19

    IMPORTANCE Overactivation of the renin-angiotensin system (RAS) may contribute to poor clinical outcomes in patients with COVID-19. OBJECTIVE To determine whether angiotensin-converting enzyme (ACE) inhibitor or angiotensin receptor blocker (ARB) initiation improves outcomes in patients hospitalized for COVID-19. DESIGN, SETTING, AND PARTICIPANTS In an ongoing, adaptive platform randomized clinical trial, 721 critically ill and 58 non–critically ill hospitalized adults were randomized to receive an RAS inhibitor or control between March 16, 2021, and February 25, 2022, at 69 sites in 7 countries (final follow-up on June 1, 2022). INTERVENTIONS Patients were randomized to receive open-label initiation of an ACE inhibitor (n = 257), ARB (n = 248), ARB in combination with DMX-200 (a chemokine receptor-2 inhibitor; n = 10), or no RAS inhibitor (control; n = 264) for up to 10 days. MAIN OUTCOMES AND MEASURES The primary outcome was organ support–free days, a composite of hospital survival and days alive without cardiovascular or respiratory organ support through 21 days. The primary analysis was a bayesian cumulative logistic model. Odds ratios (ORs) greater than 1 represent improved outcomes. RESULTS On February 25, 2022, enrollment was discontinued due to safety concerns. Among 679 critically ill patients with available primary outcome data, the median age was 56 years and 239 participants (35.2%) were women. Median (IQR) organ support–free days among critically ill patients was 10 (–1 to 16) in the ACE inhibitor group (n = 231), 8 (–1 to 17) in the ARB group (n = 217), and 12 (0 to 17) in the control group (n = 231) (median adjusted odds ratios of 0.77 [95% bayesian credible interval, 0.58-1.06] for improvement for ACE inhibitor and 0.76 [95% credible interval, 0.56-1.05] for ARB compared with control). The posterior probabilities that ACE inhibitors and ARBs worsened organ support–free days compared with control were 94.9% and 95.4%, respectively. Hospital survival occurred in 166 of 231 critically ill participants (71.9%) in the ACE inhibitor group, 152 of 217 (70.0%) in the ARB group, and 182 of 231 (78.8%) in the control group (posterior probabilities that ACE inhibitor and ARB worsened hospital survival compared with control were 95.3% and 98.1%, respectively). CONCLUSIONS AND RELEVANCE In this trial, among critically ill adults with COVID-19, initiation of an ACE inhibitor or ARB did not improve, and likely worsened, clinical outcomes. TRIAL REGISTRATION ClinicalTrials.gov Identifier: NCT0273570

    How AI works: reconfiguring lifelong learning


    EdTech for Ugandan girls: Affordances of different technologies for girls' secondary education during the Covid-19 pandemic.

    MOTIVATION: This article discusses the use of educational technology (EdTech) in girls' education at PEAS (Promoting Education in African Schools) schools in rural Uganda during the Covid-19-related school closures. PURPOSE: This article addresses a research gap surrounding the potential use of EdTech to support girls' education, focusing on the barriers to girls' EdTech use and how technology might be used to enhance girls' education in disadvantaged rural areas, specifically their academic learning and their social and emotional learning. METHODS AND APPROACH: A sequential, explanatory mixed-methods case-study approach was used. Quantitative exploration of a dataset of 483 Ugandan students, from 28 PEAS schools, was first conducted, followed by interviews with PEAS staff to elucidate the reasons and context behind the findings. FINDINGS: Findings show that female students are less likely than male students to have access to their caregivers' phones for learning. The form of EdTech that appeared to be most beneficial for girls' academic learning was radio; girls also had significantly more interest in tuning into radio broadcasts than boys did. Also, poorer boys were more likely to be influenced by SMS messages than wealthier boys. Apart from gender-based differences, students with more highly educated parents found SMS messages more helpful, and phone calls from teachers appeared to help boost younger students' self-confidence. POLICY IMPLICATIONS: The findings suggest that policy-makers need to: carefully consider provision of education through multiple modes of EdTech in order to ensure that it reaches all students; ensure that caregivers are involved in the strategies developed for girls' education; make EdTech interventions interactive; and consider language in EdTech interventions. Given the gender differences which emerged, the findings are of relevance both to supporting the continuation of educational provision during periods of school closure, and also in terms of finding additional ways to support girls' education alongside formal schooling.