10 research outputs found

    Introducing v0.5 of the AI Safety Benchmark from MLCommons

    This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-purpose assistant in English), and a limited set of personas (i.e., typical users, malicious users, and vulnerable users). We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark. We plan to release version 1.0 of the AI Safety Benchmark by the end of 2024. The v1.0 benchmark will provide meaningful insights into the safety of AI systems. However, the v0.5 benchmark should not be used to assess the safety of AI systems. We have sought to fully document the limitations, flaws, and challenges of v0.5. This release of v0.5 of the AI Safety Benchmark includes (1) a principled approach to specifying and constructing the benchmark, which comprises use cases, types of systems under test (SUTs), language and context, personas, tests, and test items; (2) a taxonomy of 13 hazard categories with definitions and subcategories; (3) tests for seven of the hazard categories, each comprising a unique set of test items, i.e., prompts. There are 43,090 test items in total, which we created with templates; (4) a grading system for AI systems against the benchmark; (5) an openly available platform, and downloadable tool, called ModelBench that can be used to evaluate the safety of AI systems on the benchmark; (6) an example evaluation report which benchmarks the performance of over a dozen openly available chat-tuned language models; (7) a test specification for the benchmark

    ILC Reference Design Report Volume 1 - Executive Summary

    The International Linear Collider (ILC) is a 200-500 GeV center-of-mass high-luminosity linear electron-positron collider, based on 1.3 GHz superconducting radio-frequency (SCRF) accelerating cavities. The ILC has a total footprint of about 31 km and is designed for a peak luminosity of 2x10^34 cm^-2 s^-1. This report is the Executive Summary (Volume I) of the four-volume Reference Design Report. It gives an overview of the physics at the ILC, the accelerator design and value estimate, the detector concepts, and the next steps towards project realization.

    ILC Reference Design Report Volume 4 - Detectors

    This report, Volume IV of the International Linear Collider Reference Design Report, describes the detectors which will record and measure the charged and neutral particles produced in the ILC's high energy e+e- collisions. The physics of the ILC, and the environment of the machine-detector interface, pose new challenges for detector design. Several conceptual designs for the detector promise the needed performance, and ongoing detector R&D is addressing the outstanding technological issues. Two such detectors, operating in push-pull mode, perfectly instrument the ILC interaction region, and access the full potential of ILC physics.

    ILC Reference Design Report Volume 3 - Accelerator

    The International Linear Collider (ILC) is a 200-500 GeV center-of-mass high-luminosity linear electron-positron collider, based on 1.3 GHz superconducting radio-frequency (SCRF) accelerating cavities. The ILC has a total footprint of about 31 km and is designed for a peak luminosity of 2x10^34 cm^-2 s^-1. The complex includes a polarized electron source, an undulator-based positron source, two 6.7 km circumference damping rings, two-stage bunch compressors, two 11 km long main linacs and a 4.5 km long beam delivery system. This report is Volume III (Accelerator) of the four-volume Reference Design Report, which describes the design and cost of the ILC.

    International Linear Collider Reference Design Report Volume 2: PHYSICS AT THE ILC

    This article reviews the physics case for the ILC. Baseline running at 500 GeV as well as possible upgrades and options are discussed. The opportunities on Standard Model physics, Higgs physics, Supersymmetry and alternative theories beyond the Standard Model are described.

    Search for Scalar Diphoton Resonances in the Mass Range 65–600 GeV with the ATLAS Detector in $pp$ Collision Data at $\sqrt{s} = 8$ TeV

    A search for scalar particles decaying via narrow resonances into two photons in the mass range 65–600 GeV is performed using $20.3~\mathrm{fb}^{-1}$ of $\sqrt{s} = 8$ TeV $pp$ collision data collected with the ATLAS detector at the Large Hadron Collider. The recently discovered Higgs boson is treated as a background. No significant evidence for an additional signal is observed. The results are presented as limits at the 95% confidence level on the production cross section of a scalar boson times branching ratio into two photons, in a fiducial volume where the reconstruction efficiency is approximately independent of the event topology. The upper limits set extend over a considerably wider mass range than previous searches.