
    Multiparty Dynamics and Failure Modes for Machine Learning and Artificial Intelligence

    An important challenge for safety in machine learning and artificial intelligence systems is a set of related failures involving specification gaming, reward hacking, fragility to distributional shifts, and Goodhart's or Campbell's law. This paper presents additional, closely related failure modes for interactions within multi-agent systems. These multi-agent failure modes are more complex, more problematic, and less well understood than the single-agent case, and they are also already occurring, largely unnoticed. After motivating the discussion with examples from poker-playing artificial intelligence (AI), the paper explains why these failure modes are in some senses unavoidable. Following this, the paper categorizes failure modes, provides definitions, and cites examples for each of the modes: accidental steering, coordination failures, adversarial misalignment, input spoofing and filtering, and goal co-option or direct hacking. The paper then discusses how the extant literature on multi-agent AI fails to address these failure modes, and identifies work which may be useful for their mitigation. Comment: 12 pages. This version re-submitted to Big Data and Cognitive Computing, Special Issue "Artificial Superintelligence: Coordination & Strategy".

    The Fragile World Hypothesis: Complexity, Fragility, and Systemic Existential Risk

    The possibility of social and technological collapse has been the focus of science fiction tropes for decades, but more recent attention has turned to specific sources of existential and global catastrophic risk. Because these scenarios are simple to understand and envision, they receive more attention than risks arising from a complex interplay of failures, or risks that cannot be clearly specified. In this paper, we discuss the possibility that complexity of a certain type leads to fragility which can function as a source of catastrophic or even existential risk. The paper first reviews a hypothesis by Bostrom about inevitable technological risks, the vulnerable world hypothesis. It next hypothesizes that fragility may not only be a possible risk, but could be inevitable, and would therefore be a subclass or example of Bostrom's vulnerable worlds. After introducing the titular fragile world hypothesis, the paper details the conditions under which it would be correct, and presents arguments for why those conditions may in fact apply. Finally, the assumptions and potential mitigations of the new hypothesis are contrasted with those Bostrom suggests.

    Building Less Flawed Metrics: Dodging Goodhart and Campbell's Laws

    Metrics are useful for measuring systems and motivating behaviors. Unfortunately, naive application of metrics to a system can distort the system in ways that undermine the original goal. The problem was noted independently, first by Campbell and then by Goodhart, and in some forms it is not only common, but unavoidable due to the nature of metrics. There are two distinct but interrelated problems that must be overcome in building better metrics: first, specifying metrics more closely related to the true goals, and second, preventing the recipients from gaming the difference between the reward system and the true goal. This paper describes several approaches to designing metrics, beginning with design considerations and processes, then discussing specific strategies including secrecy, randomization, diversification, and post-hoc specification. The discussion then addresses important desiderata and the trade-offs involved in each approach, with examples of how they differ and how the issues can be addressed. Finally, the paper outlines a metric-design process for practitioners, which can also serve as a basis for further elaboration in specific domains.
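    The anti-gaming strategies this abstract names (randomization, diversification) can be illustrated with a toy simulation. The proxy names, the min-of-efforts "true goal", and the effort-budget model below are all illustrative assumptions, not the paper's own formalism; the sketch only shows why an agent that cannot predict which proxy will be scored has an incentive to spread effort rather than game one measure.

```python
import random

# Hypothetical sketch of randomization/diversification as anti-gaming tactics.
# An agent that knows which single proxy will be scored dumps all effort on it;
# if the scored proxy is drawn at random, the best response is to spread effort,
# which here stands in for staying closer to the true goal.

PROXIES = ["citations", "downloads", "reviews"]  # assumed proxy metrics

def true_goal(effort):
    # Assumed true goal: rewards balanced effort across proxies (min, not sum),
    # so gaming one proxy scores zero.
    return min(effort.values())

def gamed_effort(known_proxy, budget=9):
    # Gaming agent: whole budget on the one proxy it knows will be scored.
    effort = {p: 0 for p in PROXIES}
    effort[known_proxy] = budget
    return effort

def hedged_effort(budget=9):
    # Under randomized scoring the agent cannot target one proxy,
    # so it spreads effort evenly.
    return {p: budget // len(PROXIES) for p in PROXIES}

random.seed(0)
scored = random.choice(PROXIES)               # metric kept secret until scoring
print(true_goal(gamed_effort("citations")))   # → 0 (gamed metric, true goal lost)
print(true_goal(hedged_effort()))             # → 3 (effort spread, goal served)
```

The same structure also illustrates diversification: scoring the average of several proxies instead of one random proxy pushes the agent toward the same balanced allocation.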

    Building less-flawed metrics: Understanding and creating better measurement and incentive systems

    Metrics are useful for measuring systems and motivating behaviors in academia as well as in public policy, medicine, business, and other systems. Unfortunately, naive application of metrics to a system can distort the system and even undermine the original goal. There are two interrelated problems to overcome in building better metrics in academia and elsewhere. The first, specifying evaluable metrics that correspond to the goals, is well recognized but still often ignored. The second, minimizing perverse effects that undermine the metric or that enable people to game the rewards, is less recognized but is critical. This perspective discusses designing metrics, beginning with design considerations and processes; presenting specific strategies for mitigating perverse impacts, including secrecy, randomization, diversification, and post hoc specification; and continuing with important desiderata and the trade-offs involved, with examples of how the strategies can complement each other or differ. Finally, this perspective presents a comprehensive process integrating these ideas.

    Introduction


    Building Less Flawed Metrics

    Metrics are useful for measuring systems and motivating behaviors. Unfortunately, naive application of metrics to a system can distort the system in ways that undermine the original goal. The problem was noted independently by Campbell and Goodhart, and in some forms it is not only common, but unavoidable due to the nature of metrics. There are two distinct but interrelated problems that must be overcome in building better metrics: first, specifying metrics more closely related to the true goals, and second, preventing the recipients from gaming the difference between the reward system and the true goal. This paper describes several approaches to designing metrics, beginning with design considerations and processes, then discussing specific strategies including secrecy, randomization, diversification, and post-hoc specification. Finally, it discusses important desiderata and the trade-offs involved in each approach.

    Testing, tracing and isolation in compartmental models

    Existing compartmental mathematical modelling methods for epidemics, such as SEIR models, cannot accurately represent the effects of contact tracing. This makes them inappropriate for evaluating testing and contact-tracing strategies to contain an outbreak. An alternative used in practice is the application of agent- or individual-based models (ABMs). However, ABMs are complex, less well understood, and much more computationally expensive. This paper presents a new method for accurately including the effects of Testing, contact-Tracing and Isolation (TTI) strategies in standard compartmental models. We derive our method using a careful probabilistic argument to show how contact tracing at the individual level is reflected in aggregate on the population level. We show that the resultant SEIR-TTI model accurately approximates the behaviour of a mechanistic agent-based model at far less computational cost. The computational efficiency is such that it can be easily and cheaply used for exploratory modelling to quantify the required levels of testing and tracing, alone and in combination with other interventions, to assist adaptive planning for managing disease outbreaks.
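    For readers unfamiliar with the compartmental framework the SEIR-TTI model extends, here is a minimal plain-SEIR sketch using forward-Euler integration. The parameter values and step size are illustrative assumptions, not the paper's, and the TTI extension itself is not reproduced; the sketch only shows the mechanical cheapness of compartmental models relative to agent-based simulation.

```python
# Minimal plain-SEIR sketch (forward-Euler time stepping). Illustrative
# parameters only: beta = transmission rate, sigma = 1/latent period,
# gamma = recovery rate. Each step moves mass S -> E -> I -> R, so the
# total population is conserved.

def seir_step(s, e, i, r, beta=0.3, sigma=0.2, gamma=0.1, dt=1.0):
    n = s + e + i + r
    new_exposed = beta * s * i / n      # S -> E via contact with infectious
    new_infectious = sigma * e          # E -> I after the latent period
    new_recovered = gamma * i           # I -> R
    return (s - dt * new_exposed,
            e + dt * (new_exposed - new_infectious),
            i + dt * (new_infectious - new_recovered),
            r + dt * new_recovered)

state = (990.0, 0.0, 10.0, 0.0)         # 10 infectious seeds in 1000 people
for _ in range(100):
    state = seir_step(*state)
print(round(sum(state)))                # → 1000: population is conserved
```

A TTI extension of the kind the paper derives would add further compartments and flows (e.g. traced and isolated individuals) to the same update structure, which is why the approach stays far cheaper than simulating individuals.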

    What is the upper limit of value?

    How much value can our decisions create? We argue that unless our current understanding of physics is wrong in fairly fundamental ways, there exists an upper limit of value relevant to our decisions. First, due to the speed of light and the definition and conception of economic growth, the limit to economic growth is a restrictive one. Additionally, a related far larger but still finite limit exists for value in a much broader sense, due to the physics of information and the ability of physical beings to place value on outcomes. We discuss how this argument can handle lexicographic preferences and probabilities, and its implications for infinite ethics and ethical uncertainty.
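    The speed-of-light limb of this argument can be made concrete with a back-of-envelope calculation: resources reachable at light speed grow at most cubically (the volume of an expanding light sphere), so any fixed exponential growth rate must eventually outrun them. The 2% growth rate and the arbitrary-units cubic frontier below are illustrative assumptions, not the paper's own figures.

```python
# Back-of-envelope sketch: a fixed exponential growth rate eventually
# exceeds any cubically-growing resource frontier, so physically grounded
# economic growth cannot stay exponential forever. Units are arbitrary;
# only the asymptotic comparison matters.

def years_until_growth_exceeds_lightcone(rate=0.02):
    # Find the first year t (starting past the trivial t = 1 crossing)
    # where compound growth (1 + rate)**t exceeds the cubic frontier t**3.
    t = 2
    while (1 + rate) ** t <= t ** 3:
        t += 1
    return t

print(years_until_growth_exceeds_lightcone(0.02))  # → 1055
```

On these toy assumptions, about a millennium of 2% compound growth already exceeds what a cubically expanding resource base can supply; higher growth rates hit the frontier sooner.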