33 research outputs found

    Introducing v0.5 of the AI Safety Benchmark from MLCommons

    Get PDF
    This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-purpose assistant in English), and a limited set of personas (i.e., typical users, malicious users, and vulnerable users). We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark. We plan to release version 1.0 of the AI Safety Benchmark by the end of 2024. The v1.0 benchmark will provide meaningful insights into the safety of AI systems. However, the v0.5 benchmark should not be used to assess the safety of AI systems. We have sought to fully document the limitations, flaws, and challenges of v0.5. This release of v0.5 of the AI Safety Benchmark includes (1) a principled approach to specifying and constructing the benchmark, which comprises use cases, types of systems under test (SUTs), language and context, personas, tests, and test items; (2) a taxonomy of 13 hazard categories with definitions and subcategories; (3) tests for seven of the hazard categories, each comprising a unique set of test items, i.e., prompts. There are 43,090 test items in total, which we created with templates; (4) a grading system for AI systems against the benchmark; (5) an openly available platform, and downloadable tool, called ModelBench that can be used to evaluate the safety of AI systems on the benchmark; (6) an example evaluation report which benchmarks the performance of over a dozen openly available chat-tuned language models; (7) a test specification for the benchmark

    Introducing v0.5 of the AI Safety Benchmark from MLCommons

    Get PDF
    This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-purpose assistant in English), and a limited set of personas (i.e., typical users, malicious users, and vulnerable users). We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark. We plan to release version 1.0 of the AI Safety Benchmark by the end of 2024. The v1.0 benchmark will provide meaningful insights into the safety of AI systems. However, the v0.5 benchmark should not be used to assess the safety of AI systems. We have sought to fully document the limitations, flaws, and challenges of v0.5. This release of v0.5 of the AI Safety Benchmark includes (1) a principled approach to specifying and constructing the benchmark, which comprises use cases, types of systems under test (SUTs), language and context, personas, tests, and test items; (2) a taxonomy of 13 hazard categories with definitions and subcategories; (3) tests for seven of the hazard categories, each comprising a unique set of test items, i.e., prompts. There are 43,090 test items in total, which we created with templates; (4) a grading system for AI systems against the benchmark; (5) an openly available platform, and downloadable tool, called ModelBench that can be used to evaluate the safety of AI systems on the benchmark; (6) an example evaluation report which benchmarks the performance of over a dozen openly available chat-tuned language models; (7) a test specification for the benchmark

    Adding 6 months of androgen deprivation therapy to postoperative radiotherapy for prostate cancer: a comparison of short-course versus no androgen deprivation therapy in the RADICALS-HD randomised controlled trial

    Get PDF
    Background Previous evidence indicates that adjuvant, short-course androgen deprivation therapy (ADT) improves metastasis-free survival when given with primary radiotherapy for intermediate-risk and high-risk localised prostate cancer. However, the value of ADT with postoperative radiotherapy after radical prostatectomy is unclear. Methods RADICALS-HD was an international randomised controlled trial to test the efficacy of ADT used in combination with postoperative radiotherapy for prostate cancer. Key eligibility criteria were indication for radiotherapy after radical prostatectomy for prostate cancer, prostate-specific antigen less than 5 ng/mL, absence of metastatic disease, and written consent. Participants were randomly assigned (1:1) to radiotherapy alone (no ADT) or radiotherapy with 6 months of ADT (short-course ADT), using monthly subcutaneous gonadotropin-releasing hormone analogue injections, daily oral bicalutamide monotherapy 150 mg, or monthly subcutaneous degarelix. Randomisation was done centrally through minimisation with a random element, stratified by Gleason score, positive margins, radiotherapy timing, planned radiotherapy schedule, and planned type of ADT, in a computerised system. The allocated treatment was not masked. The primary outcome measure was metastasis-free survival, defined as distant metastasis arising from prostate cancer or death from any cause. Standard survival analysis methods were used, accounting for randomisation stratification factors. The trial had 80% power with two-sided α of 5% to detect an absolute increase in 10-year metastasis-free survival from 80% to 86% (hazard ratio [HR] 0·67). Analyses followed the intention-to-treat principle. The trial is registered with the ISRCTN registry, ISRCTN40814031, and ClinicalTrials.gov, NCT00541047. Findings Between Nov 22, 2007, and June 29, 2015, 1480 patients (median age 66 years [IQR 61–69]) were randomly assigned to receive no ADT (n=737) or short-course ADT (n=743) in addition to postoperative radiotherapy at 121 centres in Canada, Denmark, Ireland, and the UK. With a median follow-up of 9·0 years (IQR 7·1–10·1), metastasis-free survival events were reported for 268 participants (142 in the no ADT group and 126 in the short-course ADT group; HR 0·886 [95% CI 0·688–1·140], p=0·35). 10-year metastasis-free survival was 79·2% (95% CI 75·4–82·5) in the no ADT group and 80·4% (76·6–83·6) in the short-course ADT group. Toxicity of grade 3 or higher was reported for 121 (17%) of 737 participants in the no ADT group and 100 (14%) of 743 in the short-course ADT group (p=0·15), with no treatment-related deaths. Interpretation Metastatic disease is uncommon following postoperative bed radiotherapy after radical prostatectomy. Adding 6 months of ADT to this radiotherapy did not improve metastasis-free survival compared with no ADT. These findings do not support the use of short-course ADT with postoperative radiotherapy in this patient population

    Duration of androgen deprivation therapy with postoperative radiotherapy for prostate cancer: a comparison of long-course versus short-course androgen deprivation therapy in the RADICALS-HD randomised trial

    Get PDF
    Background Previous evidence supports androgen deprivation therapy (ADT) with primary radiotherapy as initial treatment for intermediate-risk and high-risk localised prostate cancer. However, the use and optimal duration of ADT with postoperative radiotherapy after radical prostatectomy remains uncertain. Methods RADICALS-HD was a randomised controlled trial of ADT duration within the RADICALS protocol. Here, we report on the comparison of short-course versus long-course ADT. Key eligibility criteria were indication for radiotherapy after previous radical prostatectomy for prostate cancer, prostate-specific antigen less than 5 ng/mL, absence of metastatic disease, and written consent. Participants were randomly assigned (1:1) to add 6 months of ADT (short-course ADT) or 24 months of ADT (long-course ADT) to radiotherapy, using subcutaneous gonadotrophin-releasing hormone analogue (monthly in the short-course ADT group and 3-monthly in the long-course ADT group), daily oral bicalutamide monotherapy 150 mg, or monthly subcutaneous degarelix. Randomisation was done centrally through minimisation with a random element, stratified by Gleason score, positive margins, radiotherapy timing, planned radiotherapy schedule, and planned type of ADT, in a computerised system. The allocated treatment was not masked. The primary outcome measure was metastasis-free survival, defined as metastasis arising from prostate cancer or death from any cause. The comparison had more than 80% power with two-sided α of 5% to detect an absolute increase in 10-year metastasis-free survival from 75% to 81% (hazard ratio [HR] 0·72). Standard time-to-event analyses were used. Analyses followed intention-to-treat principle. The trial is registered with the ISRCTN registry, ISRCTN40814031, and ClinicalTrials.gov , NCT00541047 . Findings Between Jan 30, 2008, and July 7, 2015, 1523 patients (median age 65 years, IQR 60–69) were randomly assigned to receive short-course ADT (n=761) or long-course ADT (n=762) in addition to postoperative radiotherapy at 138 centres in Canada, Denmark, Ireland, and the UK. With a median follow-up of 8·9 years (7·0–10·0), 313 metastasis-free survival events were reported overall (174 in the short-course ADT group and 139 in the long-course ADT group; HR 0·773 [95% CI 0·612–0·975]; p=0·029). 10-year metastasis-free survival was 71·9% (95% CI 67·6–75·7) in the short-course ADT group and 78·1% (74·2–81·5) in the long-course ADT group. Toxicity of grade 3 or higher was reported for 105 (14%) of 753 participants in the short-course ADT group and 142 (19%) of 757 participants in the long-course ADT group (p=0·025), with no treatment-related deaths. Interpretation Compared with adding 6 months of ADT, adding 24 months of ADT improved metastasis-free survival in people receiving postoperative radiotherapy. For individuals who can accept the additional duration of adverse effects, long-course ADT should be offered with postoperative radiotherapy. Funding Cancer Research UK, UK Research and Innovation (formerly Medical Research Council), and Canadian Cancer Society

    Correlation Between the Spirit Bike Maximal Power Output and Other Lower Extremity Power Output Tests

    Get PDF
    The assessment of a patient’s lower extremity function is important for physical therapists to make clinical judgements about the subject’s mobility and physical capabilities. For physical therapists to accurately assess a patient’s lower extremity function, clinicians must utilize the most appropriate tests, evaluation techniques, and/or tools. It is not clear that single leg hop tests will provide the most accurate assessment of lower extremity function for patients with hip, knee, ankle, and or foot biomechanical dysfunctions, as in some severe cases, these tests may even be contraindicated

    Cyclooctatetraene: a bioactive cubane paradigm complement

    No full text
    Cubane was recently validated as a phenyl ring (bio)isostere, but highly strained caged carbocyclic systems lack π character, which is often critical for mediating key biological interactions. This electronic property restriction associated with cubane has been addressed herein with cyclooctatetraene (COT), using known pharmaceutical and agrochemical compounds as templates. COT either outperformed or matched cubane in multiple cases suggesting that versatile complementarity exists between the two systems for enhanced bioactive molecule discovery

    Introducing v0.5 of the AI Safety Benchmark from MLCommons

    Get PDF
    This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-purpose assistant in English), and a limited set of personas (i.e., typical users, malicious users, and vulnerable users). We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark. We plan to release version 1.0 of the AI Safety Benchmark by the end of 2024. The v1.0 benchmark will provide meaningful insights into the safety of AI systems. However, the v0.5 benchmark should not be used to assess the safety of AI systems. We have sought to fully document the limitations, flaws, and challenges of v0.5. This release of v0.5 of the AI Safety Benchmark includes (1) a principled approach to specifying and constructing the benchmark, which comprises use cases, types of systems under test (SUTs), language and context, personas, tests, and test items; (2) a taxonomy of 13 hazard categories with definitions and subcategories; (3) tests for seven of the hazard categories, each comprising a unique set of test items, i.e., prompts. There are 43,090 test items in total, which we created with templates; (4) a grading system for AI systems against the benchmark; (5) an openly available platform, and downloadable tool, called ModelBench that can be used to evaluate the safety of AI systems on the benchmark; (6) an example evaluation report which benchmarks the performance of over a dozen openly available chat-tuned language models; (7) a test specification for the benchmark
    corecore