Protecting Society from AI Misuse: When are Restrictions on Capabilities Warranted?
Artificial intelligence (AI) systems will increasingly be used to cause harm
as they grow more capable. In fact, AI systems are already starting to be used
to automate fraudulent activities, violate human rights, create harmful fake
images, and identify dangerous toxins. To prevent some misuses of AI, we argue
that targeted interventions on certain capabilities will be warranted. These
restrictions may include controlling who can access certain types of AI models,
what they can be used for, whether outputs are filtered or can be traced back
to their user, and the resources needed to develop them. We also contend that
some restrictions on non-AI capabilities needed to cause harm will be required.
Though capability restrictions risk reducing use more than misuse (facing an
unfavorable Misuse-Use Tradeoff), we argue that interventions on capabilities
are warranted when other interventions are insufficient, the potential harm
from misuse is high, and there are targeted ways to intervene on capabilities.
We provide a taxonomy of interventions that can reduce AI misuse, focusing on
the specific steps required for a misuse to cause harm (the Misuse Chain), and
a framework to determine if an intervention is warranted. We apply this
reasoning to three examples: predicting novel toxins, creating harmful images,
and automating spear phishing campaigns.
Comment: 14 pages, 1 figure.
Social and Governance Implications of Improved Data Efficiency
Many researchers work on improving the data efficiency of machine learning.
What would happen if they succeed? This paper explores the social-economic
impact of increased data efficiency. Specifically, we examine the intuition
that data efficiency will erode the barriers to entry protecting incumbent
data-rich AI firms, exposing them to more competition from data-poor firms. We
find that this intuition is only partially correct: data efficiency makes it
easier to create ML applications, but large AI firms may have more to gain from
higher-performing AI systems. Further, we find that the effects on privacy, data
markets, robustness, and misuse are complex. For example, while it seems
intuitive that misuse risk would increase along with data efficiency -- as more
actors gain access to any level of capability -- the net effect crucially
depends on how much defensive measures are improved. More investigation into
data efficiency, as well as research into the "AI production function", will be
key to understanding the development of the AI industry and its societal
impacts.
Comment: 7 pages, 2 figures, accepted to Artificial Intelligence Ethics and
Society 2020.
Towards Publicly Accountable Frontier LLMs: Building an External Scrutiny Ecosystem under the ASPIRE Framework
With the increasing integration of frontier large language models (LLMs) into
society and the economy, decisions related to their training, deployment, and
use have far-reaching implications. These decisions should not be left solely
in the hands of frontier LLM developers. LLM users, civil society, and
policymakers need trustworthy sources of information to steer such decisions
for the better. Involving outside actors in the evaluation of these systems -
what we term 'external scrutiny' - via red-teaming, auditing, and external
researcher access offers a solution. Though there are encouraging signs of
increasing external scrutiny of frontier LLMs, its success is not assured. In
this paper, we survey six requirements for effective external scrutiny of
frontier AI systems and organize them under the ASPIRE framework: Access,
Searching attitude, Proportionality to the risks, Independence, Resources, and
Expertise. We then illustrate how external scrutiny might function throughout
the AI lifecycle and offer recommendations to policymakers.
Comment: Accepted to Workshop on Socially Responsible Language Modelling
Research (SoLaR) at the 2023 Conference on Neural Information Processing
Systems (NeurIPS 2023).
Model evaluation for extreme risks
Current approaches to building general-purpose AI systems tend to produce
systems with both beneficial and harmful capabilities. Further progress in AI
development could lead to capabilities that pose extreme risks, such as
offensive cyber capabilities or strong manipulation skills. We explain why
model evaluation is critical for addressing extreme risks. Developers must be
able to identify dangerous capabilities (through "dangerous capability
evaluations") and the propensity of models to apply their capabilities for harm
(through "alignment evaluations"). These evaluations will become critical for
keeping policymakers and other stakeholders informed, and for making
responsible decisions about model training, deployment, and security.
Frontier AI Regulation: Managing Emerging Risks to Public Safety
Advanced AI models hold the promise of tremendous benefits for humanity, but
society needs to proactively manage the accompanying risks. In this paper, we
focus on what we term "frontier AI" models: highly capable foundation models
that could possess dangerous capabilities sufficient to pose severe risks to
public safety. Frontier AI models pose a distinct regulatory challenge:
dangerous capabilities can arise unexpectedly; it is difficult to robustly
prevent a deployed model from being misused; and, it is difficult to stop a
model's capabilities from proliferating broadly. To address these challenges,
at least three building blocks for the regulation of frontier models are
needed: (1) standard-setting processes to identify appropriate requirements for
frontier AI developers, (2) registration and reporting requirements to provide
regulators with visibility into frontier AI development processes, and (3)
mechanisms to ensure compliance with safety standards for the development and
deployment of frontier AI models. Industry self-regulation is an important
first step. However, wider societal discussions and government intervention
will be needed to create standards and to ensure compliance with them. We
consider several options to this end, including granting enforcement powers to
supervisory authorities and licensure regimes for frontier AI models. Finally,
we propose an initial set of safety standards. These include conducting
pre-deployment risk assessments; external scrutiny of model behavior; using
risk assessments to inform deployment decisions; and monitoring and responding
to new information about model capabilities and uses post-deployment. We hope
this discussion contributes to the broader conversation on how to balance
public safety risks and innovation benefits from advances at the frontier of AI
development.
Comment: Update July 11th: added missing footnote back in; adjusted author
order (mistakenly non-alphabetical among the first six authors) and adjusted
affiliations (Jess Whittlestone's affiliation was mistagged, and Gillian
Hadfield had SRI added to her affiliations). Update September 4th: various
typo fixes.
Filling gaps in trustworthy development of AI
Incident sharing, auditing, and other concrete mechanisms could help verify the trustworthiness of actors.
Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims
With the recent wave of progress in artificial intelligence (AI) has come a
growing awareness of the large-scale impacts of AI systems, and recognition
that existing regulations and norms in industry and academia are insufficient
to ensure responsible AI development. In order for AI developers to earn trust
from system users, customers, civil society, governments, and other
stakeholders that they are building AI responsibly, they will need to make
verifiable claims to which they can be held accountable. Those outside of a
given organization also need effective means of scrutinizing such claims. This
report suggests various steps that different stakeholders can take to improve
the verifiability of claims made about AI systems and their associated
development processes, with a focus on providing evidence about the safety,
security, fairness, and privacy protection of AI systems. We analyze ten
mechanisms for this purpose--spanning institutions, software, and hardware--and
make recommendations aimed at implementing, exploring, or improving those
mechanisms.