Protecting Society from AI Misuse: When are Restrictions on Capabilities Warranted?
Artificial intelligence (AI) systems will increasingly be used to cause harm
as they grow more capable. In fact, AI systems are already starting to be used
to automate fraudulent activities, violate human rights, create harmful fake
images, and identify dangerous toxins. To prevent some misuses of AI, we argue
that targeted interventions on certain capabilities will be warranted. These
restrictions may include controlling who can access certain types of AI models,
what they can be used for, whether outputs are filtered or can be traced back
to their user, and the resources needed to develop them. We also contend that
some restrictions on non-AI capabilities needed to cause harm will be required.
Though capability restrictions risk reducing use more than misuse (facing an
unfavorable Misuse-Use Tradeoff), we argue that interventions on capabilities
are warranted when other interventions are insufficient, the potential harm
from misuse is high, and there are targeted ways to intervene on capabilities.
We provide a taxonomy of interventions that can reduce AI misuse, focusing on
the specific steps required for a misuse to cause harm (the Misuse Chain), and
a framework to determine if an intervention is warranted. We apply this
reasoning to three examples: predicting novel toxins, creating harmful images,
and automating spear phishing campaigns.
Comment: 14 pages, 1 figure.
Social and Governance Implications of Improved Data Efficiency
Many researchers work on improving the data efficiency of machine learning.
What would happen if they succeed? This paper explores the social-economic
impact of increased data efficiency. Specifically, we examine the intuition
that data efficiency will erode the barriers to entry protecting incumbent
data-rich AI firms, exposing them to more competition from data-poor firms. We
find that this intuition is only partially correct: data efficiency makes it
easier to create ML applications, but large AI firms may have more to gain from
higher-performing AI systems. Further, we find that the effects on privacy, data
markets, robustness, and misuse are complex. For example, while it seems
intuitive that misuse risk would increase along with data efficiency -- as more
actors gain access to any level of capability -- the net effect crucially
depends on how much defensive measures are improved. More investigation into
data efficiency, as well as research into the "AI production function", will be
key to understanding the development of the AI industry and its societal
impacts.
Comment: 7 pages, 2 figures, accepted to Artificial Intelligence Ethics and
Society 2020.
Towards Publicly Accountable Frontier LLMs: Building an External Scrutiny Ecosystem under the ASPIRE Framework
With the increasing integration of frontier large language models (LLMs) into
society and the economy, decisions related to their training, deployment, and
use have far-reaching implications. These decisions should not be left solely
in the hands of frontier LLM developers. LLM users, civil society, and
policymakers need trustworthy sources of information to steer such decisions
for the better. Involving outside actors in the evaluation of these systems -
what we term 'external scrutiny' - via red-teaming, auditing, and external
researcher access offers a solution. Though there are encouraging signs of
increasing external scrutiny of frontier LLMs, its success is not assured. In
this paper, we survey six requirements for effective external scrutiny of
frontier AI systems and organize them under the ASPIRE framework: Access,
Searching attitude, Proportionality to the risks, Independence, Resources, and
Expertise. We then illustrate how external scrutiny might function throughout
the AI lifecycle and offer recommendations to policymakers.
Comment: Accepted to Workshop on Socially Responsible Language Modelling
Research (SoLaR) at the 2023 Conference on Neural Information Processing
Systems (NeurIPS 2023).
Model evaluation for extreme risks
Current approaches to building general-purpose AI systems tend to produce
systems with both beneficial and harmful capabilities. Further progress in AI
development could lead to capabilities that pose extreme risks, such as
offensive cyber capabilities or strong manipulation skills. We explain why
model evaluation is critical for addressing extreme risks. Developers must be
able to identify dangerous capabilities (through "dangerous capability
evaluations") and the propensity of models to apply their capabilities for harm
(through "alignment evaluations"). These evaluations will become critical for
keeping policymakers and other stakeholders informed, and for making
responsible decisions about model training, deployment, and security.
Frontier AI Regulation: Managing Emerging Risks to Public Safety
Advanced AI models hold the promise of tremendous benefits for humanity, but
society needs to proactively manage the accompanying risks. In this paper, we
focus on what we term "frontier AI" models: highly capable foundation models
that could possess dangerous capabilities sufficient to pose severe risks to
public safety. Frontier AI models pose a distinct regulatory challenge:
dangerous capabilities can arise unexpectedly; it is difficult to robustly
prevent a deployed model from being misused; and, it is difficult to stop a
model's capabilities from proliferating broadly. To address these challenges,
at least three building blocks for the regulation of frontier models are
needed: (1) standard-setting processes to identify appropriate requirements for
frontier AI developers, (2) registration and reporting requirements to provide
regulators with visibility into frontier AI development processes, and (3)
mechanisms to ensure compliance with safety standards for the development and
deployment of frontier AI models. Industry self-regulation is an important
first step. However, wider societal discussions and government intervention
will be needed to create standards and to ensure compliance with them. We
consider several options to this end, including granting enforcement powers to
supervisory authorities and licensure regimes for frontier AI models. Finally,
we propose an initial set of safety standards. These include conducting
pre-deployment risk assessments; external scrutiny of model behavior; using
risk assessments to inform deployment decisions; and monitoring and responding
to new information about model capabilities and uses post-deployment. We hope
this discussion contributes to the broader conversation on how to balance
public safety risks and innovation benefits from advances at the frontier of AI
development.
Comment: Update July 11th: added missing footnote back in; adjusted author
order (mistakenly non-alphabetical among the first six authors) and adjusted
affiliations (Jess Whittlestone's affiliation was mistagged, and Gillian
Hadfield had SRI added to her affiliations). Update September 4th: various
typo fixes.
Filling gaps in trustworthy development of AI
Incident sharing, auditing, and other concrete mechanisms could help verify the trustworthiness of actors.
Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims
With the recent wave of progress in artificial intelligence (AI) has come a
growing awareness of the large-scale impacts of AI systems, and recognition
that existing regulations and norms in industry and academia are insufficient
to ensure responsible AI development. In order for AI developers to earn trust
from system users, customers, civil society, governments, and other
stakeholders that they are building AI responsibly, they will need to make
verifiable claims to which they can be held accountable. Those outside of a
given organization also need effective means of scrutinizing such claims. This
report suggests various steps that different stakeholders can take to improve
the verifiability of claims made about AI systems and their associated
development processes, with a focus on providing evidence about the safety,
security, fairness, and privacy protection of AI systems. We analyze ten
mechanisms for this purpose--spanning institutions, software, and hardware--and
make recommendations aimed at implementing, exploring, or improving those
mechanisms.