
    Probabilistic Guarantees for Safe Deep Reinforcement Learning

    Deep reinforcement learning has been successfully applied to many control tasks, but the application of such agents in safety-critical scenarios has been limited due to safety concerns. Rigorous testing of these controllers is challenging, particularly when they operate in probabilistic environments due to, for example, hardware faults or noisy sensors. We propose MOSAIC, an algorithm for measuring the safety of deep reinforcement learning agents in stochastic settings. Our approach is based on the iterative construction of a formal abstraction of a controller's execution in an environment, and leverages probabilistic model checking of Markov decision processes to produce probabilistic guarantees on safe behaviour over a finite time horizon. It produces bounds on the probability of safe operation of the controller for different initial configurations and identifies regions where correct behaviour can be guaranteed. We implement and evaluate our approach on agents trained for several benchmark control problems.
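The core computation the abstract describes can be sketched with a minimal example: backward induction over an MDP abstraction, yielding lower and upper bounds on the probability of remaining safe over a finite horizon. The toy MDP and all names below are illustrative assumptions, not MOSAIC's actual abstraction or API.

```python
def safety_bounds(states, P, safe, horizon):
    """P[s] maps each action available in state s to a list of
    (next_state, probability) pairs; `safe` is the set of safe states.
    Returns dicts (lo, hi) giving per-state lower and upper bounds on
    the probability of staying inside `safe` for `horizon` steps."""
    lo = {s: (1.0 if s in safe else 0.0) for s in states}
    hi = dict(lo)
    for _ in range(horizon):
        # Lower bound: worst action at every step; upper bound: best action.
        lo = {s: (min(sum(p * lo[t] for t, p in dist) for dist in P[s].values())
                  if s in safe else 0.0) for s in states}
        hi = {s: (max(sum(p * hi[t] for t, p in dist) for dist in P[s].values())
                  if s in safe else 0.0) for s in states}
    return lo, hi

# Toy abstraction: near an obstacle, braking is safer than coasting.
P = {
    "near":  {"brake": [("far", 0.9), ("crash", 0.1)],
              "coast": [("near", 0.5), ("crash", 0.5)]},
    "far":   {"stay":  [("far", 1.0)]},
    "crash": {"stay":  [("crash", 1.0)]},
}
lo, hi = safety_bounds(["near", "far", "crash"], P, {"near", "far"}, horizon=2)
# lo["near"] == 0.25 (coasting twice), hi["near"] == 0.9 (braking once)
```

This is the standard finite-horizon reachability recursion from probabilistic model checking; the paper's contribution lies in building the abstraction from the trained controller, which is not shown here.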

    From flowers to palms: 40 years of policy for online learning

    This year sees the 40th anniversary of the first policy paper regarding the use of computers in higher education in the United Kingdom. The publication of this paper represented the beginning of the field of learning technology research and practice in higher education. In the past 40 years, policy has at various points drawn from different communities and provided the roots for a diverse field of learning technology researchers and practitioners. This paper presents a review of learning technology-related policy over the past 40 years. The purpose of the review is to make sense of the current position in which the field finds itself, and to highlight lessons that can be learned from the implementation of previous policies. Conclusions drawn from the review of 40 years of learning technology policy suggest that few of today's challenges have not been faced before, and point to a potential return to individual innovation.

    The Effects of Social Power Bases Within Varying Organizational Cultures

    This study focuses on social power in the context of organizational culture and how this relationship impacts outcomes of follower compliance and trust. Power is the ability to direct or influence the behavior of others or a course of events (Handgraaf et al., 2008). There are six different types of social power: informational, referent, legitimate, coercive, reward, and expert (Fontaine & Beerman, 1977). Each type of social power may lead to varying psychological outcomes, such as compliance, satisfaction, and agreement. To date, the empirical literature has not fully addressed the issue of whether one type of power is more effective than the others in different organizational cultural contexts. This study examined the effectiveness of four types of social power in varying organizational cultural contexts for eliciting follower compliance and trust (Tharp, 2009). The methodology employed videos which manipulated the types of power and culture to examine their impact on followers. Followers were asked to what extent they would comply with the leader and how much they trusted the leader. None of the findings for the MANOVA, ANOVA, and t-tests were statistically significant. Coercive power in a hierarchical culture demonstrated higher compliance and trust outcome means, while reward power within an adhocracy culture demonstrated lower compliance and trust outcome means. Results are discussed in terms of potential confounders, possible attributional influences, and the implications for organizational outcomes of compliance and trust.

    Boundedly Rational Decision Emergence - A General Perspective and Some Selective Illustrations

    A general framework is described specifying how boundedly rational decision makers generate their choices. Starting from a "Master Module", which keeps an inventory of previously successful and unsuccessful routines, several submodules can be called forth which either allow one to adjust behavior (via the "Learning Module" and "Adaptation Procedure") or to generate new decision routines (by applying the "New Problem Solver"). Our admittedly bold attempt is loosely related to some stylized experimental results.
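The control flow the abstract outlines — reuse a known routine, adapt it on failure, generate a fresh one for new problems — can be sketched as follows. Every class, method, and placeholder rule here is a hypothetical illustration of that dispatch structure, not the authors' actual framework.

```python
class MasterModule:
    """Toy inventory of decision routines, keyed by problem type."""

    def __init__(self):
        self.inventory = {}  # problem_type -> (routine, success_count)

    def decide(self, problem_type, situation):
        # Reuse a stored routine if one exists, else generate a new one.
        if problem_type not in self.inventory:
            self.inventory[problem_type] = (self.new_problem_solver(), 0)
        routine, _ = self.inventory[problem_type]
        return routine(situation)

    def record_outcome(self, problem_type, success):
        # On failure, the "learning" step swaps in an adapted routine.
        routine, wins = self.inventory[problem_type]
        if success:
            self.inventory[problem_type] = (routine, wins + 1)
        else:
            self.inventory[problem_type] = (self.adapt(routine), wins)

    def new_problem_solver(self):
        return lambda situation: min(situation)  # placeholder default rule

    def adapt(self, routine):
        return lambda situation: max(situation)  # placeholder adjustment

m = MasterModule()
first = m.decide("pick", [3, 1, 2])   # new routine generated -> 1
m.record_outcome("pick", success=False)
second = m.decide("pick", [3, 1, 2])  # adapted routine reused -> 3
```

The point of the sketch is only the bookkeeping: routines persist across decisions, and failure triggers adaptation rather than regeneration.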