
    Parsel: A (De-)compositional Framework for Algorithmic Reasoning with Language Models

    Full text link
    Despite recent success in large language model (LLM) reasoning, LLMs struggle with hierarchical multi-step reasoning tasks like generating complex programs. For these tasks, humans often start with a high-level algorithmic design and implement each part gradually. We introduce Parsel, a framework enabling automatic implementation and validation of complex algorithms with code LLMs, taking hierarchical function descriptions in natural language as input. We show that Parsel can be used across domains requiring hierarchical reasoning, including program synthesis, robotic planning, and theorem proving. We show that LLMs generating Parsel solve more competition-level problems in the APPS dataset, resulting in pass rates over 75% higher than prior results from directly sampling AlphaCode and Codex, while often using a smaller sample budget. We also find that LLM-generated robotic plans using Parsel as an intermediate language are more than twice as likely to be considered accurate as directly generated plans. Lastly, we explore how Parsel addresses LLM limitations and discuss how Parsel may be useful for human programmers.
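
    To make the decompose-then-validate idea concrete, here is a minimal Python sketch of the workflow the abstract describes: each node carries a natural-language description and tests, children are synthesized before their parent, and candidate implementations are kept only if they pass the node's tests. The `sample_implementations` call is a hypothetical stand-in for a code-LLM query, not part of the paper's API, and the greedy per-function search is a simplification; Parsel itself searches over combinations of candidate implementations.

```python
# Hedged sketch of Parsel-style synthesis, not the paper's actual implementation.
from dataclasses import dataclass, field

@dataclass
class FunctionSpec:
    name: str                              # function name
    description: str                       # natural-language specification
    tests: list                            # (args_tuple, expected_output) pairs
    children: list = field(default_factory=list)

def sample_implementations(spec, n=8):
    """Placeholder for a code-LLM call returning n candidate Python sources,
    each defining `spec.name`. Assumed for illustration only."""
    raise NotImplementedError

def synthesize(spec, namespace):
    """Implement children first so parent candidates can call them, then keep
    the first candidate whose implementation passes all of the spec's tests."""
    for child in spec.children:
        synthesize(child, namespace)
    for source in sample_implementations(spec):
        exec(source, namespace)            # defines spec.name in namespace
        fn = namespace[spec.name]
        if all(fn(*args) == expected for args, expected in spec.tests):
            return fn
    raise RuntimeError(f"no candidate for {spec.name} passed its tests")
```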

    Hypothesis Search: Inductive Reasoning with Language Models

    Full text link
    Inductive reasoning is a core problem-solving capacity: humans can identify underlying principles from a few examples, which can then be robustly generalized to novel scenarios. Recent work has evaluated large language models (LLMs) on inductive reasoning tasks by directly prompting them, i.e., "in-context learning." This can work well for straightforward inductive tasks, but performs very poorly on more complex tasks such as the Abstraction and Reasoning Corpus (ARC). In this work, we propose to improve the inductive reasoning ability of LLMs by generating explicit hypotheses at multiple levels of abstraction: we prompt the LLM to propose multiple abstract hypotheses about the problem in natural language, then implement the natural language hypotheses as concrete Python programs. These programs can be directly verified by running them on the observed examples and generalized to novel inputs. Because of the prohibitive cost of generation with state-of-the-art LLMs, we consider a middle step to filter the set of hypotheses that will be implemented into programs: we either ask the LLM to summarize them into a smaller set of hypotheses, or ask human annotators to select a subset of the hypotheses. We verify our pipeline's effectiveness on the ARC visual inductive reasoning benchmark, its variant 1D-ARC, and the string transformation dataset SyGuS. On a random 40-problem subset of ARC, our automated pipeline using LLM summaries achieves 27.5% accuracy, significantly outperforming the direct prompting baseline (accuracy of 12.5%). With the minimal human input of selecting from LLM-generated candidates, performance is boosted to 37.5% (and we argue this is a lower bound on the performance of our approach without filtering). Our ablation studies show that abstract hypothesis generation and concrete program representations are both beneficial for LLMs performing inductive reasoning tasks.
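
    The propose-filter-implement-verify loop described above can be sketched in a few lines of Python. In this hedged sketch, `propose`, `summarize`, and `implement` are hypothetical LLM wrappers assumed for illustration; they are not names from the paper.

```python
# Hedged sketch of the hypothesis-search pipeline described in the abstract.
def propose(train_pairs, n):
    """Placeholder: ask an LLM for n natural-language hypotheses (list[str])."""
    raise NotImplementedError

def summarize(hypotheses, n):
    """Placeholder: ask the LLM to condense the hypotheses into n candidates."""
    raise NotImplementedError

def implement(hypothesis):
    """Placeholder: ask the LLM for Python source defining transform(x)."""
    raise NotImplementedError

def hypothesis_search(train_pairs, test_inputs, n_hypotheses=64, n_keep=8):
    # 1. Propose many abstract hypotheses, then filter to cut generation cost
    #    (the paper alternatively uses human selection at this step).
    candidates = summarize(propose(train_pairs, n_hypotheses), n_keep)
    # 2. Implement each candidate as a concrete program; a hypothesis is kept
    #    only if its program reproduces every observed training example.
    for hypothesis in candidates:
        namespace = {}
        exec(implement(hypothesis), namespace)   # defines transform(x)
        transform = namespace["transform"]
        if all(transform(x) == y for x, y in train_pairs):
            return [transform(x) for x in test_inputs]
    return None  # no hypothesis survived verification
```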

    There is a large disparity between what people see in social media about health research and the underlying strength of evidence

    Get PDF
    Our social media feeds are full of articles shared by friends and family that make claims about how something can prevent a particular health condition. But how robust is the scientific evidence base underpinning these claims? Noah Haber, Alexander Breskin, Ellen Moscoe and Emily R. Smith, on behalf of the CLAIMS team, report on a systematic review of the state of causal inference in media articles and academic studies at the point of consumption on social media. There is a large disparity between what people see in social media about health research and the underlying strength of evidence, both in the studies themselves and in the media articles describing their findings. The studies tend to imply stronger causal inference than their methods merit, while the media articles reporting on them were found to be further overstated and inaccurate.

    List randomization for eliciting HIV status and sexual behaviors in rural KwaZulu-Natal, South Africa: a randomized experiment using known true values for validation

    Get PDF
    Background: List randomization (LR), a survey method intended to mitigate biases related to sensitive true/false questions, has received recent attention from researchers. However, tests of its validity are limited, with no study comparing LR-elicited results with individually known truths. We conducted a test of LR for HIV-related responses in a high HIV prevalence setting in KwaZulu-Natal. By using researcher-known HIV serostatus and HIV test refusal data, we were able to assess how LR and direct questionnaires perform against individual known truth. Methods: Participants were recruited from the participant list of the 2016 round of the Africa Health Research Institute demographic surveillance system, oversampling individuals who were HIV positive. Participants were randomized to two study arms. In Arm A, participants were presented five true/false statements, one of which was the sensitive item and the others non-sensitive. Participants were then asked how many of the five statements they believed were true. In Arm B, participants were asked about each statement individually. LR estimates used data from both arms, while direct estimates were generated from Arm B alone. We compared elicited responses to HIV testing and serostatus data collected through the demographic surveillance system. Results: We enrolled 483 participants: 262 (54%) were randomly assigned to Arm A, and 221 (46%) to Arm B. LR estimated 56% (95% CI: 40 to 72%) of the population to be HIV-negative, compared to 47% (95% CI: 39 to 54%) using direct estimates; the population estimate of the true value was 32% (95% CI: 28 to 36%). LR estimates yielded HIV test refusal percentages of 55% (95% CI: 37 to 73%), compared to 13% (95% CI: 8 to 17%) by direct estimation and 15% (95% CI: 12 to 18%) based on observed past behavior. Conclusions: In this context, LR performed poorly and did not improve estimates over direct questioning when compared with known truth. These results may reflect difficulties in implementation or comprehension of the LR approach, which is inherently complex. Adjustments to delivery procedures may improve LR’s usefulness. Further investigation of the cognitive processes of participants in answering LR surveys is warranted.
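
    The LR estimator contrasted with direct questioning above is the standard item-count difference-in-means: the mean count reported by Arm A (four non-sensitive items plus the sensitive item) minus the mean number of non-sensitive items endorsed in Arm B. A minimal sketch, using illustrative array inputs rather than the study's data:

```python
# Hedged sketch of the standard list-randomization (item-count) estimator.
import numpy as np

def lr_estimate(arm_a_counts, arm_b_nonsensitive):
    """Prevalence of the sensitive item = mean Arm A count (0..5 per person)
    minus mean number of non-sensitive items endorsed in Arm B (0..4)."""
    arm_a_counts = np.asarray(arm_a_counts, dtype=float)
    arm_b_totals = np.asarray(arm_b_nonsensitive, dtype=float).sum(axis=1)
    est = arm_a_counts.mean() - arm_b_totals.mean()
    # Standard error of a difference in independent means, normal-approx CI.
    se = np.sqrt(arm_a_counts.var(ddof=1) / len(arm_a_counts)
                 + arm_b_totals.var(ddof=1) / len(arm_b_totals))
    return est, (est - 1.96 * se, est + 1.96 * se)

def direct_estimate(arm_b_sensitive):
    """Direct estimate: proportion answering 'true' to the sensitive item."""
    responses = np.asarray(arm_b_sensitive, dtype=float)
    p = responses.mean()
    se = np.sqrt(p * (1 - p) / len(responses))
    return p, (p - 1.96 * se, p + 1.96 * se)
```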

    The worldwide clinical trial research response to the COVID-19 pandemic - the first 100 days

    Get PDF
    Background: Never before have clinical trials drawn as much public attention as those testing interventions for COVID-19. We aimed to describe the worldwide COVID-19 clinical research response and its evolution over the first 100 days of the pandemic. Methods: Descriptive analysis of trials that were planned, ongoing, or completed by April 9, 2020, testing any intervention to treat or prevent COVID-19, systematically identified in trial registries, preprint servers, and literature databases. A survey of all trials was conducted to assess their recruitment status up to July 6, 2020. Results: Most of the 689 trials (overall target sample size 396,366) were small (median sample size 120; interquartile range [IQR] 60-300) but randomized (75.8%; n=522), and were often conducted in China (51.1%; n=352) or the USA (11%; n=76). 525 trials (76.2%) planned to include 155,571 hospitalized patients, and 25 (3.6%) planned to include 96,821 health-care workers. Treatments were evaluated in 607 trials (88.1%), frequently antivirals (n=144) or antimalarials (n=112); 78 trials (11.3%) focused on prevention, including 14 vaccine trials. No trial investigated social distancing. Interventions tested in 11 trials with >5,000 participants were also tested in 169 smaller trials (median sample size 273; IQR 90-700). Hydroxychloroquine alone was investigated in 110 trials. While 414 trials (60.0%) expected completion in 2020, only 35 trials (4.1%; 3,071 participants) were completed by July 6. Of 112 trials with detailed recruitment information, 55 had recruited <20% of the targeted sample, 27 between 20% and 50%, and 30 over 50% (median 14.8% [IQR 2.0-62.0%]). Conclusions: The size and speed of the COVID-19 clinical trial agenda are unprecedented. However, most trials were small and investigated only a small fraction of treatment options. The feasibility of this research agenda is questionable, and many trials may end in futility, wasting research resources. Much better coordination is needed to respond to global health threats.

    Association between convalescent plasma treatment and mortality in COVID-19: a collaborative systematic review and meta-analysis of randomized clinical trials.

    Get PDF
    Funder: Laura and John Arnold Foundation. BACKGROUND: Convalescent plasma has been widely used to treat COVID-19 and is under investigation in numerous randomized clinical trials, but results are publicly available only for a small number of trials. The objective of this study was to assess the effect of convalescent plasma treatment, compared to placebo or no treatment, on all-cause mortality in patients with COVID-19, using data from all available randomized clinical trials, including unpublished and ongoing trials (Open Science Framework, https://doi.org/10.17605/OSF.IO/GEHFX). METHODS: In this collaborative systematic review and meta-analysis, clinical trial registries (ClinicalTrials.gov, WHO International Clinical Trials Registry Platform), the Cochrane COVID-19 register, the LOVE database, and PubMed were searched until April 8, 2021. Investigators of trials registered by March 1, 2021, without published results were contacted via email. Eligible were ongoing, discontinued, and completed randomized clinical trials that compared convalescent plasma with placebo or no treatment in COVID-19 patients, regardless of setting or treatment schedule. Aggregated mortality data were extracted from publications or provided by investigators of unpublished trials and combined using the Hartung-Knapp-Sidik-Jonkman random effects model. We investigated the contribution of unpublished trials to the overall evidence. RESULTS: A total of 16,477 patients were included in 33 trials (20 unpublished with 3,190 patients, 13 published with 13,287 patients). 32 trials enrolled only hospitalized patients (including 3 with only intensive care unit patients). Risk of bias was low for 29/33 trials. Of 8,495 patients who received convalescent plasma, 1,997 died (23%), and of 7,982 control patients, 1,952 died (24%). The combined risk ratio for all-cause mortality was 0.97 (95% confidence interval: 0.92 to 1.02), with between-study heterogeneity not beyond chance (I² = 0%). The RECOVERY trial carried 69.8% of the weight in the meta-analysis and the unpublished evidence 25.3%. CONCLUSIONS: Convalescent plasma treatment of patients with COVID-19 did not reduce all-cause mortality. These results provide strong evidence that convalescent plasma treatment for patients with COVID-19 should not be used outside of randomized trials. Evidence synthesis from collaborations among trial investigators can inform both evidence generation and evidence application in patient care.
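
    For readers unfamiliar with the pooling method named in the Methods, here is a minimal Python sketch of Hartung-Knapp-Sidik-Jonkman random-effects meta-analysis of risk ratios: DerSimonian-Laird between-study variance, random-effects weights, and the Hartung-Knapp variance adjustment with a t-based interval. The function and its inputs are illustrative; this is not the authors' code or the trial data.

```python
# Hedged sketch of HKSJ random-effects pooling of study-level risk ratios.
import numpy as np
from scipy import stats

def hksj_pooled_rr(events_t, n_t, events_c, n_c):
    """Pool log risk ratios across k studies; returns (RR, 95% CI).
    Zero-event arms would need a continuity correction, omitted here."""
    events_t, n_t = np.asarray(events_t, float), np.asarray(n_t, float)
    events_c, n_c = np.asarray(events_c, float), np.asarray(n_c, float)
    y = np.log((events_t / n_t) / (events_c / n_c))     # log risk ratios
    v = 1/events_t - 1/n_t + 1/events_c - 1/n_c         # large-sample variances
    k = len(y)
    # DerSimonian-Laird between-study variance tau^2 from fixed-effect weights.
    w = 1 / v
    mu_fixed = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - mu_fixed) ** 2)
    tau2 = max(0.0, (q - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
    # Random-effects pooled estimate.
    w_star = 1 / (v + tau2)
    mu = np.sum(w_star * y) / np.sum(w_star)
    # Hartung-Knapp variance adjustment with a t interval on k-1 df.
    var_hk = np.sum(w_star * (y - mu) ** 2) / ((k - 1) * np.sum(w_star))
    half = stats.t.ppf(0.975, k - 1) * np.sqrt(var_hk)
    return np.exp(mu), (np.exp(mu - half), np.exp(mu + half))
```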