190 research outputs found

    PMH40: ACCESS TO NEW MEDICATIONS TO TREAT SCHIZOPHRENIA

    WPE5: ANALYSES OF OUTCOME DOMAINS IN SCHIZOPHRENIA: METHODOLOGIES AND RESULTS FROM THE SCHIZOPHRENIA CARE AND ASSESSMENT PROGRAM (SCAP)

    Debate Helps Supervise Unreliable Experts

    As AI systems are used to answer more difficult questions and potentially help create new knowledge, judging the truthfulness of their outputs becomes more difficult and more important. How can we supervise unreliable experts, which have access to the truth but may not accurately report it, to give answers that are systematically true and don't just superficially seem true, when the supervisor can't tell the difference between the two on their own? In this work, we show that debate between two unreliable experts can help a non-expert judge more reliably identify the truth. We collect a dataset of human-written debates on hard reading comprehension questions where the judge has not read the source passage, only ever seeing expert arguments and short quotes selectively revealed by 'expert' debaters who have access to the passage. In our debates, one expert argues for the correct answer, and the other for an incorrect answer. Comparing debate to a baseline we call consultancy, where a single expert argues for only one answer which is correct half of the time, we find that debate performs significantly better, with 84% judge accuracy compared to consultancy's 74%. Debates are also more efficient, being 68% of the length of consultancies. By comparing human to AI debaters, we find evidence that with more skilled (in this case, human) debaters, the performance of debate goes up but the performance of consultancy goes down. Our error analysis also supports this trend, with 46% of errors in human debate attributable to mistakes by the honest debater (which should go away with increased skill); whereas 52% of errors in human consultancy are due to debaters obfuscating the relevant evidence from the judge (which should become worse with increased skill). 
Overall, these results show that debate is a promising approach for supervising increasingly capable but potentially unreliable AI systems.
    Comment: 84 pages, 13 footnotes, 5 figures, 4 tables, 28 debate transcripts; data and code at https://github.com/julianmichael/debate/tree/2023-nyu-experiment

    GPQA: A Graduate-Level Google-Proof Q&A Benchmark

    We present GPQA, a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry. We ensure that the questions are high-quality and extremely difficult: experts who have or are pursuing PhDs in the corresponding domains reach 65% accuracy (74% when discounting clear mistakes the experts identified in retrospect), while highly skilled non-expert validators only reach 34% accuracy, despite spending on average over 30 minutes with unrestricted access to the web (i.e., the questions are "Google-proof"). The questions are also difficult for state-of-the-art AI systems, with our strongest GPT-4 based baseline achieving 39% accuracy. If we are to use future AI systems to help us answer very hard questions, for example, when developing new scientific knowledge, we need to develop scalable oversight methods that enable humans to supervise their outputs, which may be difficult even if the supervisors are themselves skilled and knowledgeable. The difficulty of GPQA both for skilled non-experts and frontier AI systems should enable realistic scalable oversight experiments, which we hope can help devise ways for human experts to reliably get truthful information from AI systems that surpass human capabilities.
    Comment: 28 pages, 5 figures, 7 tables

    Predicting hospital admission and discharge with symptom or function scores in patients with schizophrenia: pooled analysis of a clinical trial extension

    Background: The purpose of this analysis was to evaluate relationships between hospital admission or discharge and symptom or functioning scores in patients with schizophrenia.
    Methods: Data were from three 52-week open-label extensions of the double-blind pivotal trials of paliperidone extended-release (ER). Symptoms and patient function were measured every 4 weeks using the Personal and Social Performance (PSP) scale and the Positive and Negative Syndrome Scale (PANSS). The intent-to-treat analysis set was defined as open-label patients who had at least one post-baseline PSP and PANSS measurement. Time until first hospitalization was evaluated using the Cox proportional hazard model with categorical time-dependent measures for the PSP (1 to 30, 31 to 70, 71 to 100) or PANSS (< 75, ≥ 75 to < 95, ≥ 95), as well as age, gender, schizophrenia duration, and country. Similar analyses were performed for time to discharge.
    Results: Of the 1,077 enrolled patients, 1,028 (95.5%) met study criteria; of these, 382 (37.2%) were hospitalized at open-label baseline. Compared with the PSP ≥ 71 group, the hazard for new hospitalization was 8.351 times greater (P = 0.0001) for patients with the poorest functioning (PSP 1 to 30) and 1.977 times greater (P = 0.0295) for patients with PSP of 31 to 70. The hazard for new hospitalization was 5.457 times greater (P < 0.0001) for patients with PANSS ≥ 95 and 2.316 times greater (P = 0.0027) for the ≥ 75 to < 95 group compared with the < 75 group. For patients hospitalized at baseline, the PANSS ≥ 95 patients had a discharge hazard that was 0.456 times lower than for the < 75 patients (P < 0.0001). The hazard for discharge was 0.646 times lower (P = 0.0012) for the PANSS ≥ 75 to < 95 group compared with the < 75 group. A patient's country was a significant predictor variable, with US patients being admitted and discharged faster.
    Conclusions: Better functioning or being less symptomatic is associated with reduced risk for hospitalization and greater chance for early discharge. Treatments or programs that reduce symptoms or improve function decrease the risk of hospitalization in community patients or increase the chance of discharge for hospitalized patients.

    Cold atmospheric plasma decontamination of SARS-CoV-2 bioaerosols

    Bioaerosols (aerosolized particles of biological origin) are strongly suspected to play a significant role in the transmission of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), especially in closed indoor environments. Thus, control technologies capable of effectively inactivating bioaerosols are urgently needed. In this regard, cold atmospheric pressure plasma (CAP) can represent a suitable option, thanks to its ability to produce reactive species, which can exert antimicrobial action. In this study, results on the total inactivation of SARS-CoV-2 contained in bioaerosols treated using CAP generated in air are reported, demonstrating the possible use of CAP systems for the control of SARS-CoV-2 diffusion through bioaerosols.

    Diagnosing Clostridioides difficile infections with molecular diagnostics: multicenter evaluation of revogene C. difficile assay

    Clostridioides difficile infections are a significant threat to our healthcare system, and rapid and accurate diagnostics are crucial to implement the necessary infection prevention and control measures. Nucleic acid amplification tests are reliable diagnostic tools for the detection of toxigenic Clostridioides difficile strains directly from stool specimens. In this multicenter evaluation, we determined the performance of the revogene C. difficile assay. The analysis was conducted on prospective stool specimens collected from six different sites in Europe. The performance of the revogene C. difficile assay was compared to the different routine diagnostic methods and, for a subset of the specimens, against toxigenic culture. In total, 2621 valid stool specimens were tested, and the revogene C. difficile assay displayed a sensitivity/specificity of 97.1% [93.3-99.0] and 98.9% [98.5-99.3] for identification of Clostridioides difficile infection. Discrepancy analysis using additional methods improved this performance to 98.8% [95.8-99.9] and 99.6% [99.2-99.8], respectively. In comparison to toxigenic culture, the revogene C. difficile assay displayed a sensitivity/specificity of 93.0% [86.1-97.1] and 99.5% [98.7-99.9], respectively. These results indicate that the revogene C. difficile assay is a robust and reliable aid in the diagnosis of Clostridioides difficile infections. This study was supported by grants from GenePOC, now part of Meridian Biosciences.

    Global variations and time trends in the prevalence of childhood myopia, a systematic review and quantitative meta-analysis: implications for aetiology and early prevention.

    The aim of this review was to quantify the global variation in childhood myopia prevalence over time, taking account of demographic and study design factors. A systematic review identified population-based surveys with estimates of childhood myopia prevalence published by February 2015. Multilevel binomial logistic regression of log odds of myopia was used to examine the association with age, gender, urban versus rural setting and survey year, among populations of different ethnic origins, adjusting for study design factors. 143 published articles (42 countries, 374 349 subjects aged 1-18 years, 74 847 myopia cases) were included. Increase in myopia prevalence with age varied by ethnicity. East Asians showed the highest prevalence, reaching 69% (95% credible intervals (CrI) 61% to 77%) at 15 years of age (86% among Singaporean-Chinese). Blacks in Africa had the lowest prevalence: 5.5% at 15 years (95% CrI 3% to 9%). Time trends in myopia prevalence over the last decade were small in whites, increased by 23% in East Asians, with a weaker increase among South Asians. Children from urban environments have 2.6 times the odds of myopia compared with those from rural environments. In whites and East Asians sex differences emerge at about 9 years of age; by late adolescence girls are twice as likely as boys to be myopic. Marked ethnic differences in age-specific prevalence of myopia exist. Rapid increases in myopia prevalence over time, particularly in East Asians, combined with a universally higher risk of myopia in urban settings, suggest that environmental factors play an important role in myopia development, which may offer scope for prevention.