
    Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine

    Generalist foundation models such as GPT-4 have displayed surprising capabilities in a wide variety of domains and tasks. Yet, there is a prevalent assumption that they cannot match the specialist capabilities of fine-tuned models. For example, most explorations to date on medical competency benchmarks have leveraged domain-specific training, as exemplified by efforts on BioGPT and Med-PaLM. We build on a prior study of GPT-4's capabilities on medical challenge benchmarks in the absence of special training. Rather than using simple prompting to highlight the model's out-of-the-box capabilities, we perform a systematic exploration of prompt engineering. We find that prompting innovation can unlock deeper specialist capabilities and show that GPT-4 easily tops prior leading results on medical benchmarks. The prompting methods we explore are general purpose and make no specific use of domain expertise, removing the need for expert-curated content. Our experimental design carefully controls for overfitting during the prompt engineering process. We introduce Medprompt, based on a composition of several prompting strategies. With Medprompt, GPT-4 achieves state-of-the-art results on all nine of the benchmark datasets in the MultiMedQA suite. The method outperforms leading specialist models such as Med-PaLM 2 by a significant margin while making an order of magnitude fewer calls to the model. Steering GPT-4 with Medprompt achieves a 27% reduction in error rate on the MedQA dataset over the best methods to date achieved with specialist models, and surpasses a score of 90% for the first time. Beyond medical problems, we show the power of Medprompt to generalize to other domains and provide evidence for the broad applicability of the approach via studies of the strategy on exams in electrical engineering, machine learning, philosophy, accounting, law, nursing, and clinical psychology.
    Comment: 21 pages, 7 figures
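    The abstract describes Medprompt as a composition of prompting strategies; the published method combines dynamic few-shot selection, chain-of-thought exemplars, and choice-shuffle ensembling. Below is a minimal sketch of how such a composition might be wired together. `embed` and `call_model` are hypothetical stand-ins for an embedding service and a GPT-4-style completion endpoint, and the prompt format is illustrative, not the paper's exact template.

```python
# Sketch of a Medprompt-style pipeline: dynamic few-shot selection,
# chain-of-thought exemplars, and choice-shuffle ensembling.
import random
from collections import Counter

def embed(text: str) -> list[float]:
    # Hypothetical embedding; swap in a real embedding model.
    return [float(ord(c)) for c in text[:8]]

def call_model(prompt: str) -> str:
    # Hypothetical LLM call; swap in a real chat-completion request.
    return "A"

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv or 1.0)

def select_few_shot(question: str, train_set: list[dict], k: int = 5) -> list[dict]:
    # Dynamic few-shot: pick the k training items most similar to the query.
    q = embed(question)
    ranked = sorted(train_set, key=lambda ex: cosine(q, embed(ex["question"])), reverse=True)
    return ranked[:k]

def answer(question: str, choices: list[str], train_set: list[dict], ensembles: int = 5) -> str:
    votes: Counter = Counter()
    for _ in range(ensembles):
        shuffled = random.sample(choices, len(choices))  # choice shuffling
        shots = select_few_shot(question, train_set)
        prompt = "\n\n".join(
            f"Q: {ex['question']}\nReasoning: {ex['cot']}\nAnswer: {ex['answer']}"
            for ex in shots  # exemplars carry chain-of-thought rationales
        )
        labels = [f"{chr(65 + i)}. {c}" for i, c in enumerate(shuffled)]
        prompt += f"\n\nQ: {question}\n" + "\n".join(labels) + "\nThink step by step, then answer with a letter."
        idx = ord(call_model(prompt).strip()[:1]) - 65
        if 0 <= idx < len(shuffled):
            votes[shuffled[idx]] += 1  # map the letter back to the unshuffled choice text
    return votes.most_common(1)[0][0]  # majority vote across shuffled runs
```

    Shuffling the answer choices on each ensemble run and voting on the underlying choice text (rather than the letter) is what counteracts positional bias in the model's answers.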

    Not all green space is created equal: biodiversity predicts psychological restorative benefits from urban green space

    Contemporary epidemiological methods testing the associations between green space and psychological well-being treat all vegetation cover as equal. However, there is good reason to expect that variations in ecological "quality" (number of species, integrity of ecological processes) may influence the link between access to green space and benefits to human health and well-being. We test the relationship between green space quality and restorative benefit in an inner-city urban population in Bradford, UK. We selected 12 urban parks for study, where we carried out botanical and faunal surveys to quantify biodiversity and assessed the site facilities of the green space (cleanliness, provision of amenities). We also conducted 128 surveys with park users to quantify psychological restoration based on four self-reported measures: general restoration, attention-grabbing distractions, being away from everyday life, and site preference. We present three key results. First, there is a positive association between site facilities and biodiversity. Second, restorative benefit is predicted by biodiversity, which explained 43% of the variance in restorative benefit across the parks, with minimal input from other variables. Third, the benefits accrued through access to green space were unrelated to age, gender, and ethnic background. The results add to a small but growing body of evidence that emphasises the role of nature in contributing to the well-being of urban populations and, hence, the need to consider biodiversity in the design of landscapes that enhance multiple ecosystem services.

    Robust and Efficient Medical Imaging with Self-Supervision

    Recent progress in Medical Artificial Intelligence (AI) has delivered systems that can reach clinical expert level performance. However, such systems tend to demonstrate sub-optimal "out-of-distribution" performance when evaluated in clinical settings different from the training environment. A common mitigation strategy is to develop separate systems for each clinical setting using site-specific data [1]. However, this quickly becomes impractical, as medical data is time-consuming to acquire and expensive to annotate [2]. Thus, the problem of "data-efficient generalization" presents an ongoing difficulty for Medical AI development. Although progress in representation learning shows promise, its benefits have not been rigorously studied, particularly in out-of-distribution settings. To meet these challenges, we present REMEDIS, a unified representation learning strategy to improve the robustness and data-efficiency of medical imaging AI. REMEDIS uses a generic combination of large-scale supervised transfer learning with self-supervised learning and requires little task-specific customization. We study a diverse range of medical imaging tasks and simulate three realistic application scenarios using retrospective data. REMEDIS exhibits significantly improved in-distribution performance, with up to 11.5% relative improvement in diagnostic accuracy over a strong supervised baseline. More importantly, our strategy leads to strong data-efficient generalization of medical imaging AI, matching strong supervised baselines while using between 1% and 33% of the retraining data across tasks. These results suggest that REMEDIS can significantly accelerate the life-cycle of medical imaging AI development, thereby presenting an important step forward for medical imaging AI to deliver broad impact.
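    The abstract describes REMEDIS as combining large-scale supervised transfer learning with self-supervised learning. Below is a minimal sketch of that two-stage recipe, assuming a SimCLR-style contrastive objective and PyTorch/torchvision; the backbone choice, loss details, and stand-in data here are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of a REMEDIS-style recipe: start from a supervised backbone
# pretrained on natural images, continue with self-supervised contrastive
# pretraining on unlabeled medical images, then fine-tune on the target task.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class ContrastiveModel(nn.Module):
    def __init__(self, proj_dim: int = 128):
        super().__init__()
        # Stage 1: supervised transfer learning (ImageNet weights; downloads on first use).
        backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
        backbone.fc = nn.Identity()  # keep the 2048-d pooled features
        self.backbone = backbone
        self.proj = nn.Sequential(nn.Linear(2048, 512), nn.ReLU(), nn.Linear(512, proj_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.proj(self.backbone(x)), dim=1)

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    # SimCLR-style loss: two augmented views of the same image are positives.
    z = torch.cat([z1, z2], dim=0)                       # (2N, D), rows unit-normalized
    sim = z @ z.t() / temperature                        # pairwise cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))           # exclude self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)                 # each view must pick its partner

# Stage 2: self-supervised pretraining on unlabeled medical images.
model = ContrastiveModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
# Stand-ins for two augmented views of a batch of unlabeled medical images.
view1, view2 = torch.randn(8, 3, 224, 224), torch.randn(8, 3, 224, 224)
loss = nt_xent(model(view1), model(view2))
loss.backward()
opt.step()
# Stage 3 (not shown): replace the projection head with a task head and
# fine-tune on labeled target data, optionally with only a small fraction of it.
```

    The data-efficiency claim in the abstract corresponds to stage 3: because stages 1 and 2 already yield robust representations, the final supervised fine-tuning can reportedly match strong baselines with a fraction of the labeled data.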

    A Tutorial for Using Twitter Data in the Social Sciences: Data Collection, Preparation, and Analysis
