Intent-aligned AI systems deplete human agency: the need for agency foundations research in AI safety
The rapid advancement of artificial intelligence (AI) systems suggests that
artificial general intelligence (AGI) systems may soon arrive. Many researchers
are concerned that AIs and AGIs will harm humans via intentional misuse
(AI-misuse) or through accidents (AI-accidents). In respect of AI-accidents,
there is an increasing effort focused on developing algorithms and paradigms
that ensure AI systems are aligned to what humans intend, e.g. AI systems that
yield actions or recommendations that humans might judge as consistent with
their intentions and goals. Here we argue that alignment to human intent is
insufficient for safe AI systems and that preservation of long-term agency of
humans may be a more robust standard, and one that needs to be separated
explicitly and a priori during optimization. We argue that AI systems can
reshape human intention and discuss the lack of biological and psychological
mechanisms that protect humans from loss of agency. We provide the first formal
definition of agency-preserving AI-human interactions which focuses on
forward-looking agency evaluations and argue that AI systems - not humans -
must be increasingly tasked with making these evaluations. We show how agency
loss can occur in simple environments containing embedded agents that use
temporal-difference learning to make action recommendations. Finally, we
propose a new area of research called "agency foundations" and pose four
initial topics designed to improve our understanding of agency in AI-human
interactions: benevolent game theory, algorithmic foundations of human rights,
mechanistic interpretability of agency representation in neural networks, and
reinforcement learning from internal states.
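As a point of reference for the temporal-difference learning mentioned in the abstract above, the following is a minimal sketch of tabular TD(0) used to rank action recommendations. It is not the paper's environment or method: the random-walk chain, the function names `td0_chain` and `recommend`, and all parameter values are assumptions chosen purely for illustration.

```python
import random


def td0_chain(n_states=5, episodes=2000, alpha=0.05, gamma=1.0, seed=0):
    """Estimate state values with TD(0) on a hypothetical random-walk chain.

    States 0..n_states-1 are non-terminal. Stepping left of state 0 ends the
    episode with reward 0; stepping right of the last state ends it with
    reward 1. The behaviour policy moves left or right uniformly at random.
    """
    rng = random.Random(seed)
    values = [0.5] * n_states  # arbitrary initial estimates
    for _ in range(episodes):
        state = n_states // 2
        while True:
            nxt = state + rng.choice((-1, 1))
            if nxt < 0:
                reward, v_next, done = 0.0, 0.0, True
            elif nxt >= n_states:
                reward, v_next, done = 1.0, 0.0, True
            else:
                reward, v_next, done = 0.0, values[nxt], False
            # TD(0) update: move V(s) toward the bootstrapped target.
            td_error = reward + gamma * v_next - values[state]
            values[state] += alpha * td_error
            if done:
                break
            state = nxt
    return values


def recommend(values, state):
    """Recommend the move whose successor has the higher estimated return."""
    n = len(values)
    left = values[state - 1] if state - 1 >= 0 else 0.0   # left exit pays 0
    right = values[state + 1] if state + 1 < n else 1.0   # right exit pays 1
    return "right" if right > left else "left"


if __name__ == "__main__":
    v = td0_chain()
    print([round(x, 2) for x in v])
    print([recommend(v, s) for s in range(len(v))])
```

In this toy setting the recommender only ranks successor states by estimated return. Nothing in the update accounts for the options that remain open to the person being advised, which is the kind of omission the abstract argues an agency-preserving criterion would need to address.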
The Shutdown Problem: Three Theorems
I explain the shutdown problem: the problem of designing artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I prove three theorems that make the difficulty precise. These theorems show that a small number of innocuous-seeming conditions together preclude shutdownability. Agents with preferences satisfying these conditions will try to prevent or cause the pressing of the shutdown button even in cases where it’s costly to do so. And patience trades off against shutdownability: the more patient an agent, the greater the costs that agent is willing to incur to manipulate the shutdown button. I end by noting that these theorems can guide our search for solutions.
Law Informs Code: A Legal Informatics Approach to Aligning Artificial Intelligence with Humans
We are currently unable to specify human goals and societal values in a way
that reliably directs AI behavior. Law-making and legal interpretation form a
computational engine that converts opaque human values into legible directives.
"Law Informs Code" is the research agenda embedding legal knowledge and
reasoning in AI. Similar to how parties to a legal contract cannot foresee
every potential contingency of their future relationship, and legislators
cannot predict all the circumstances under which their proposed bills will be
applied, we cannot ex ante specify rules that provably direct good AI behavior.
Legal theory and practice have developed arrays of tools to address these
specification problems. For instance, legal standards allow humans to develop
shared understandings and adapt them to novel situations. In contrast to more
prosaic uses of the law (e.g., as a deterrent of bad behavior through the
threat of sanction), leveraged as an expression of how humans communicate their
goals, and what society values, Law Informs Code.
We describe how data generated by legal processes (methods of law-making,
statutory interpretation, contract drafting, applications of legal standards,
legal reasoning, etc.) can facilitate the robust specification of inherently
vague human goals. This increases human-AI alignment and the local usefulness
of AI. Toward society-AI alignment, we present a framework for understanding
law as the applied philosophy of multi-agent alignment. Although law is partly
a reflection of historically contingent political power - and thus not a
perfect aggregation of citizen preferences - if properly parsed, its
distillation offers the most legitimate computational comprehension of societal
values available. If law eventually informs powerful AI, engaging in the
deliberative political process to improve law takes on even more meaning.
Comment: Forthcoming in Northwestern Journal of Technology and Intellectual
Property, Volume 2