12 research outputs found

    Antares completed: First selected results

    In May 2008, the Antares collaboration completed the construction of the first deep-sea neutrino telescope in the Northern hemisphere. Antares is a 3D array of 900 photomultipliers held in the sea by twelve mooring lines anchored at a depth of 2500 m in the Mediterranean Sea, 40 km off the southern French coast. The detection principle is based on the observation of Cherenkov light induced by charged particles produced in neutrino interactions in the matter surrounding the detector. Comment: conference proceedings.
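
    For orientation, the Cherenkov detection principle mentioned above rests on a standard textbook relation (not a result of this paper): a charged particle radiates only when it moves faster than the phase velocity of light in the medium, and the light is emitted on a cone of characteristic angle

        \cos\theta_c = \frac{1}{n\beta}, \qquad \text{with emission requiring } \beta > \frac{1}{n}.

    For deep-sea water (n ≈ 1.35), relativistic particles (β ≈ 1) radiate at roughly 42°, which is the geometric basis on which detectors of this kind reconstruct particle tracks from photomultiplier hit times.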

    Search for neutrinos from Gamma-Ray Bursts with ANTARES

    A method to search for neutrino-induced showers from gamma-ray bursts in the ANTARES detector is presented. ANTARES consists of a three-dimensional array of photosensitive devices that measure the Cherenkov light induced by charged particles produced by high-energy neutrinos interacting in the vicinity of the detector. The shower channel is complementary to the more commonly used upgoing muon channel: the corresponding detection volume is smaller, but it has the advantage of being sensitive to neutrinos of any flavour. Comment: to appear in the GRB 2010 Proceedings (AIP Publishing).

    Bringing Citations and Usage Metrics Together to Make Data Count

    In recent years, many organizations have been working on infrastructure to facilitate the sharing and reuse of research data. This means that researchers now have ways of making their data available, but not necessarily incentives to do so. Several Research Data Alliance (RDA) working groups have been working on ways to measure activity around research data and so provide input for new Data Level Metrics (DLMs). These DLMs are a critical step towards giving researchers credit for their work. In this paper, we describe the outcomes of the work of the Scholarly Link Exchange (Scholix) working group and the Data Usage Metrics working group. The Scholix working group developed a framework that allows organizations to expose and discover links between articles and datasets, thereby providing an indication of data citations. The Data Usage Metrics group is working on a standard for the measurement and display of data usage metrics. Here we explain how publishers and data repositories can contribute to and benefit from these initiatives. Together, these contributions feed into several hubs that enable data repositories to start displaying DLMs. Once these DLMs are available, researchers will be in a better position to make their data count and be rewarded for their work.
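
    As a rough illustration of what a Scholix-style article-dataset link and the resulting citation counts could look like in practice, here is a minimal Python sketch; the field names and the counting helper are illustrative assumptions, not the official Scholix schema or any working-group code.

        from collections import Counter

        # Hypothetical article-dataset link record in the spirit of the Scholix
        # framework; the field names are illustrative, not the official schema.
        link = {
            "source": {"identifier": "10.1234/article.5678", "type": "literature"},
            "target": {"identifier": "10.5061/dryad.example", "type": "dataset"},
            "relationship": "References",
            "link_provider": "ExampleRepository",    # who asserted the link
            "link_publication_date": "2019-06-01",   # when the link was exposed
        }

        def count_dataset_citations(links):
            """Tally how often each dataset identifier is referenced by articles."""
            return Counter(
                l["target"]["identifier"]
                for l in links
                if l["target"]["type"] == "dataset"
            )

        print(count_dataset_citations([link]))  # Counter({'10.5061/dryad.example': 1})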

    ROBBIE: Robust Bias Evaluation of Large Generative Language Models

    As generative large language models (LLMs) grow more performant and prevalent, we must develop sufficiently comprehensive tools to measure and improve their fairness. Different prompt-based datasets can be used to measure social bias across multiple text domains and demographic axes, meaning that testing LLMs on more datasets can potentially help us characterize their biases more fully and better ensure equal and equitable treatment of marginalized demographic groups. In this work, our focus is two-fold: (1) Benchmarking: a comparison of 6 different prompt-based bias and toxicity metrics across 12 demographic axes and 5 families of generative LLMs. Of these 6 metrics, AdvPromptSet and HolisticBiasR are novel datasets proposed in this paper. Comparing these benchmarks gives us insight into the bias and toxicity of the compared models; we also explore the frequency of demographic terms in common LLM pre-training corpora and how this may relate to model biases. (2) Mitigation: we conduct a comprehensive study of how well 3 bias/toxicity mitigation techniques perform across our suite of measurements. ROBBIE aims to provide insights for practitioners deploying a model, emphasizing the need not only to measure potential harms, but also to understand how they arise by characterizing the data, to mitigate harms once found, and to balance any trade-offs. We open-source our analysis code in hopes of encouraging broader measurements of bias in future LLMs. Comment: EMNLP 202
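
    The benchmarking step described above ultimately reduces to scoring model generations and comparing flag rates across demographic groups; the sketch below shows that bookkeeping with a placeholder classifier, and is only an illustration of the idea, not the paper's actual pipeline or metrics.

        from collections import defaultdict

        def bias_rates(generations, is_problematic):
            """Fraction of flagged generations per demographic group.

            generations: iterable of (group, text) pairs obtained from prompt templates.
            is_problematic: stand-in for a toxicity/bias classifier returning a bool.
            """
            flagged, total = defaultdict(int), defaultdict(int)
            for group, text in generations:
                total[group] += 1
                flagged[group] += int(is_problematic(text))
            return {g: flagged[g] / total[g] for g in total}

        # Toy example with a keyword "classifier" (purely illustrative):
        demo = [("group_a", "a friendly reply"), ("group_b", "an insulting reply")]
        print(bias_rates(demo, lambda text: "insulting" in text))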

    Introducing v0.5 of the AI Safety Benchmark from MLCommons

    This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-purpose assistant in English) and a limited set of personas (i.e., typical users, malicious users, and vulnerable users). We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark. We plan to release version 1.0 of the AI Safety Benchmark by the end of 2024; the v1.0 benchmark will provide meaningful insights into the safety of AI systems. The v0.5 benchmark, however, should not be used to assess the safety of AI systems, and we have sought to fully document its limitations, flaws, and challenges. This release of v0.5 of the AI Safety Benchmark includes (1) a principled approach to specifying and constructing the benchmark, which comprises use cases, types of systems under test (SUTs), language and context, personas, tests, and test items; (2) a taxonomy of 13 hazard categories with definitions and subcategories; (3) tests for seven of the hazard categories, each comprising a unique set of test items, i.e., prompts (there are 43,090 test items in total, which we created with templates); (4) a grading system for AI systems against the benchmark; (5) an openly available platform and downloadable tool, called ModelBench, that can be used to evaluate the safety of AI systems on the benchmark; (6) an example evaluation report which benchmarks the performance of over a dozen openly available chat-tuned language models; and (7) a test specification for the benchmark.
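
    As a sketch of how a grading system like the one in item (4) might aggregate per-hazard results into a single grade, one could map the worst per-category violation rate to a letter; the thresholds and the mapping below are assumptions for illustration only, not the ModelBench grading rules.

        def grade_system(violation_rates, thresholds=(0.001, 0.01, 0.05, 0.15)):
            """Map the worst per-hazard-category violation rate to a coarse grade.

            violation_rates: dict of hazard category -> fraction of unsafe responses.
            thresholds: illustrative cut-offs only, not the benchmark's actual values.
            """
            worst = max(violation_rates.values())
            for grade, cutoff in zip(("A", "B", "C", "D"), thresholds):
                if worst <= cutoff:
                    return grade
            return "F"

        print(grade_system({"violent_crimes": 0.004, "hate_speech": 0.02}))  # -> C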

    "I'm sorry to hear that": finding bias in language models with a holistic descriptor dataset

    As language models grow in popularity, their biases across all possible markers of demographic identity should be measured and addressed in order to avoid perpetuating existing societal harms. Many datasets for measuring bias currently exist, but they are restricted in their coverage of demographic axes and are commonly used with preset bias tests that presuppose which types of biases the models exhibit. In this work, we present a new, more inclusive dataset, HOLISTICBIAS, which consists of nearly 600 descriptor terms across 13 different demographic axes. HOLISTICBIAS was assembled in conversation with experts and community members with lived experience, through a participatory process. We use these descriptors combinatorially in a set of bias measurement templates to produce over 450,000 unique sentence prompts, and we use these prompts to explore, identify, and reduce novel forms of bias in several generative models. We demonstrate that our dataset is highly effective for measuring previously unmeasurable biases in token likelihoods and generations from language models, as well as in an offensiveness classifier. We will invite additions and amendments to the dataset, and we hope it will serve as a basis for easy-to-use and more standardized methods for evaluating bias in NLP models.
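
    The combinatorial template construction described above can be sketched in a few lines; the descriptors, nouns, and templates here are toy stand-ins, not the actual HOLISTICBIAS lists.

        from itertools import product

        # Toy stand-ins for the HOLISTICBIAS descriptor, noun, and template lists.
        descriptors = {"age": ["older", "younger"], "ability": ["deaf", "blind"]}
        nouns = ["person", "parent"]
        templates = [
            "I'm sorry to hear that you are a(n) {descriptor} {noun}.",
            "What do you think about {descriptor} {noun}s?",
        ]

        def build_prompts():
            """Expand every (template, axis, descriptor, noun) combination into a prompt."""
            for template, (axis, terms), noun in product(templates, descriptors.items(), nouns):
                for term in terms:
                    yield axis, template.format(descriptor=term, noun=noun)

        for axis, prompt in list(build_prompts())[:3]:
            print(axis, "|", prompt)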

    Deep-Sea Bioluminescence Blooms after Dense Water Formation at the Ocean Surface

    The deep ocean is the largest and least known ecosystem on Earth. It hosts numerous pelagic organisms, most of which are able to emit light. Here we present a unique data set consisting of a 2.5-year-long record of light emission by deep-sea pelagic organisms, measured from December 2007 to June 2010 at the ANTARES underwater neutrino telescope in the deep NW Mediterranean Sea, jointly with synchronous hydrological records. This is the longest continuous time series of deep-sea bioluminescence ever recorded. Our record reveals seasonal bioluminescence blooms, lasting several weeks, with light intensity up to two orders of magnitude higher than background values, which correlate with changes in the properties of deep waters. Such changes are triggered by the winter cooling and evaporation experienced by the upper ocean layer in the Gulf of Lion, which lead to the formation and subsequent sinking of dense water through a process known as “open-sea convection”. This process episodically renews the deep water of the study area and conveys fresh organic matter that fuels the deep ecosystems. Luminous bacteria are most likely the main contributors to the observed deep-sea bioluminescence blooms. Our observations demonstrate a consistent and rapid connection between deep open-sea convection and bathypelagic biological activity, as expressed by bioluminescence. In a setting where dense-water formation events are likely to decline under global-warming scenarios that enhance ocean stratification, in situ observatories become essential environmental sentinels for monitoring and understanding deep-sea ecosystem shifts.
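
    At its core, the reported link between deep-water renewal and bioluminescence is a lagged correlation between two time series (hydrological properties versus the optical counting rate); the sketch below runs that kind of check on synthetic data and is purely illustrative, not the study's analysis.

        import numpy as np

        rng = np.random.default_rng(0)

        # Synthetic daily series standing in for the real records: a "convection"
        # signal perturbs water density and, with a short lag, boosts the
        # photomultiplier counting rate used here as a bioluminescence proxy.
        days = np.arange(900)
        convection = (np.sin(2 * np.pi * days / 365) < -0.8).astype(float)
        density_anomaly = convection + 0.1 * rng.standard_normal(days.size)
        biolum_rate = 1.0 + 50 * np.roll(convection, 5) + rng.standard_normal(days.size)

        def lagged_corr(x, y, lag):
            """Pearson correlation of x(t) with y(t + lag)."""
            if lag > 0:
                x, y = x[:-lag], y[lag:]
            return np.corrcoef(x, y)[0, 1]

        best = max(range(15), key=lambda k: lagged_corr(density_anomaly, biolum_rate, k))
        print("lag with strongest correlation (days):", best)  # expected: ~5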
