Literal or Pedagogic Human? Analyzing Human Model Misspecification in Objective Learning
It is incredibly easy for a system designer to misspecify the objective for
an autonomous system ("robot"), thus motivating the desire to have the robot
learn the objective from human behavior instead. Recent work has suggested that
people have an interest in the robot performing well, and will thus behave
pedagogically, choosing actions that are informative to the robot. In turn,
robots benefit from interpreting the behavior by accounting for this pedagogy.
In this work, we focus on misspecification: we argue that robots might not know
whether people are being pedagogic or literal and that it is important to ask
which assumption is safer to make. We cast objective learning into the more
general form of a common-payoff game between the robot and human, and prove
that in any such game literal interpretation is more robust to
misspecification. Experiments with human data support our theoretical results
and point to the sensitivity of the pedagogic assumption.
Comment: Published at UAI 201
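To make the literal-versus-pedagogic distinction concrete, here is a minimal Boltzmann-rationality sketch of the two interpretation models. It is an illustration only, not the paper's implementation: the two candidate objectives, the reward table, and the rationality constant `beta` are assumed values chosen for the example.

```python
import numpy as np

# Toy objective-learning setting: two candidate objectives and three
# demonstrations the human could give. All numbers are assumed for illustration.
objectives = ["theta_A", "theta_B"]
actions = ["demo_1", "demo_2", "demo_3"]
# reward[i, j] = value of action j under objective i
reward = np.array([
    [1.0, 0.8, 0.1],   # theta_A
    [0.2, 0.9, 1.0],   # theta_B
])
beta = 5.0  # assumed Boltzmann rationality constant

def literal_human(reward, beta):
    """P(action | theta) for a noisily-optimal ('literal') human."""
    logits = beta * reward
    return np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

def pedagogic_human(reward, beta):
    """A pedagogic human picks actions in proportion to how strongly a
    literal interpreter would infer the true objective from them."""
    lit = literal_human(reward, beta)
    # Literal posterior over objectives for each action (uniform prior).
    post = lit / lit.sum(axis=0, keepdims=True)
    # Renormalize over actions to get the pedagogic action distribution.
    return post / post.sum(axis=1, keepdims=True)

def robot_posterior(action_idx, human_model):
    """Robot's posterior over objectives after one observation, under the
    assumed human model and a uniform prior."""
    likelihood = human_model[:, action_idx]
    return likelihood / likelihood.sum()

lit = literal_human(reward, beta)
ped = pedagogic_human(reward, beta)

# Compare the robot's beliefs about the objective under each assumption
# after observing the same demonstration.
obs = 0
print("literal interpretation:  ", robot_posterior(obs, lit))
print("pedagogic interpretation:", robot_posterior(obs, ped))
```

Swapping which model actually generated the behavior and which model the robot assumes reproduces, in miniature, the misspecification comparison described above; the paper's robustness claim itself rests on the common-payoff game analysis and human-subject experiments, not on a toy example like this one.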
Choice Set Misspecification in Reward Inference
Specifying reward functions for robots that operate in environments without a
natural reward signal can be challenging, and incorrectly specified rewards can
incentivise degenerate or dangerous behavior. A promising alternative to
manually specifying reward functions is to enable robots to infer them from
human feedback, like demonstrations or corrections. To interpret this feedback,
robots treat the person's choice as approximately optimal within a choice set,
such as the set of trajectories they could have demonstrated or the corrections
they could have made. In this work, we introduce the idea
that the choice set itself might be difficult to specify, and analyze choice
set misspecification: what happens as the robot makes incorrect assumptions
about the set of choices from which the human selects their feedback. We
propose a classification of different kinds of choice set misspecification, and
show that these different classes lead to meaningful differences in the
inferred reward and resulting performance. While we would normally expect
misspecification to hurt, we find that certain kinds of misspecification are
neither helpful nor harmful (in expectation). However, in other situations,
misspecification can be extremely harmful, leading the robot to believe the
opposite of what it should believe. We hope our results will allow for better
prediction and response to the effects of misspecification in real-world reward
inference.
Comment: Presented at the IJCAI-PRICAI 2020 Workshop on Artificial Intelligence Safety
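The choice-set model of reward inference described above can be written down in a few lines. The snippet below is an illustrative sketch under assumed values, not code from the paper: the trajectory feature vectors, the two linear reward hypotheses, and the rationality constant `beta` are invented, and the misspecified choice set simply omits one of the trajectories the human actually considered.

```python
import numpy as np

# Each trajectory is summarized by a feature vector; reward is linear in
# features. All numbers and sets here are assumptions made for the sketch.
features = {
    "traj_a": np.array([1.0, 0.0]),
    "traj_b": np.array([0.6, 0.6]),
    "traj_c": np.array([0.0, 1.0]),
}
candidate_ws = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # reward hypotheses
beta = 4.0  # assumed rationality constant

def posterior_over_w(chosen, choice_set, candidate_ws, beta):
    """Posterior over reward hypotheses, treating the human's choice as
    Boltzmann-optimal within the given choice set (uniform prior)."""
    probs = []
    for w in candidate_ws:
        utilities = np.array([w @ features[t] for t in choice_set])
        p_choice = np.exp(beta * utilities) / np.exp(beta * utilities).sum()
        probs.append(p_choice[choice_set.index(chosen)])
    probs = np.array(probs)
    return probs / probs.sum()

# The human actually chooses among all three trajectories and demonstrates
# traj_a, the best trajectory under the first reward hypothesis.
true_set = ["traj_a", "traj_b", "traj_c"]
chosen = "traj_a"

# Correctly specified choice set vs. a misspecified one that omits traj_c.
print("correct set:     ", posterior_over_w(chosen, true_set, candidate_ws, beta))
print("misspecified set:", posterior_over_w(chosen, ["traj_a", "traj_b"], candidate_ws, beta))
```

Comparing the two printed posteriors shows how the inferred reward can shift purely because the robot assumed the wrong set of alternatives, which is the kind of effect the classification in the abstract is meant to organize.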
AI Research Considerations for Human Existential Safety (ARCHES)
Framed in positive terms, this report examines how technical AI research
might be steered in a manner that is more attentive to humanity's long-term
prospects for survival as a species. In negative terms, we ask what existential
risks humanity might face from AI development in the next century, and by what
principles contemporary technical research might be directed to address those
risks.
A key property of hypothetical AI technologies is introduced, called
prepotence, which is useful for delineating a variety of potential
existential risks from artificial intelligence, even as AI paradigms might
shift. A set of contemporary research directions is then
examined for their potential benefit to existential safety. Each research
direction is explained with a scenario-driven motivation, and examples of
existing work from which to build. The research directions present their own
risks and benefits to society that could occur at various scales of impact, and
in particular are not guaranteed to benefit existential safety if major
developments in them are deployed without adequate forethought and oversight.
As such, each direction is accompanied by a consideration of potentially
negative side effects.