3 research outputs found
Negative Human Rights as a Basis for Long-term AI Safety and Regulation
If autonomous AI systems are to be reliably safe in novel situations, they
will need to incorporate general principles guiding them to recognize and avoid
harmful behaviours. Such principles may need to be supported by a binding
system of regulation, which would require the underlying principles to be widely
accepted. The principles should also be specific enough for technical implementation.
Drawing inspiration from law, this article explains how negative human rights
could fulfil the role of such principles and serve as a foundation both for an
international regulatory system and for building technical safety constraints
for future AI systems.
Pessimistic Bayesianism for conservative optimization and imitation
Subject to several assumptions, sufficiently advanced reinforcement learners would likely face both an incentive and an ability to intervene in the provision of their reward, with catastrophic consequences. In this thesis, I develop a theory of pessimism and show how it can produce safe advanced artificial agents. Not only do I demonstrate that the assumptions mentioned above can be avoided; I prove theorems which demonstrate safety. First, I develop an idealized pessimistic reinforcement learner. For any given novel event that a mentor would never cause, a sufficiently pessimistic reinforcement learner trained with the help of that mentor would probably avoid causing it. This result is without precedent in the literature. Next, on similar principles, I develop an idealized pessimistic imitation learner. If the probability of an event when the demonstrator acts can be bounded above, then the probability can be bounded above when the imitator acts instead; this kind of result is unprecedented when the imitator learns online and the environment never resets. To my knowledge, no previous work has even demonstrated the existence of an imitation learner for an environment that never resets. Finally, both of the agents above demand more efficient algorithms for high-quality uncertainty quantification, so I have developed a new kernel for Gaussian process modelling that allows for log-linear time complexity and linear space complexity, instead of a naïve cubic time complexity and quadratic space complexity. This is not the first Gaussian process with this time complexity—inducing point methods have linear complexity—but we outperform such methods significantly on regression benchmarks, as one might expect given the much higher dimensionality of our kernel. This thesis shows the viability of pessimism with respect to well-quantified epistemic uncertainty as a path to safe artificial agency.
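The core idea of pessimism with respect to epistemic uncertainty can be illustrated with a simplified, hypothetical sketch (not the thesis's actual construction): an agent holds an ensemble of plausible world models and scores each action by its worst-case value across the ensemble, rather than its mean. An action that is attractive on average but catastrophic under some plausible model — a "novel" action the mentor never takes — is then avoided.

```python
import numpy as np

# Hypothetical two-action bandit. Each row is one plausible model of the
# reward; the spread across rows represents epistemic uncertainty.
familiar = np.array([0.9, 1.0, 1.1, 1.0, 0.95])   # well-understood action: tight agreement
novel    = np.array([3.0, 2.5, -4.0, 2.8, 3.2])   # novel action: one model predicts disaster
ensemble = np.stack([familiar, novel], axis=1)     # shape (5 models, 2 actions)

mean_scores = ensemble.mean(axis=0)        # what a mean-maximizing learner sees
pessimistic_scores = ensemble.min(axis=0)  # worst case over plausible models

greedy_action = int(np.argmax(mean_scores))            # picks the novel action (higher mean)
pessimistic_action = int(np.argmax(pessimistic_scores))  # picks the familiar action
```

The pessimist declines the novel action because at least one model it cannot yet rule out predicts a large loss; as uncertainty shrinks with more mentor data, the worst-case and mean estimates converge.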