
    Hindsight policy gradients

    A reinforcement learning agent that needs to pursue different goals across episodes requires a goal-conditional policy. In addition to their potential to generalize desirable behavior to unseen goals, such policies may also enable higher-level planning based on subgoals. In sparse-reward environments, the capacity to exploit information about the degree to which an arbitrary goal has been achieved while another goal was intended appears crucial to enable sample-efficient learning. However, reinforcement learning agents have only recently been endowed with such capacity for hindsight. In this paper, we demonstrate how hindsight can be introduced to policy gradient methods, generalizing this idea to a broad class of successful algorithms. Our experiments on a diverse selection of sparse-reward environments show that hindsight leads to a remarkable increase in sample efficiency. Comment: Accepted to ICLR 201

    Reinforcement Learning in Sparse-Reward Environments with Hindsight Policy Gradients

    A reinforcement learning agent that needs to pursue different goals across episodes requires a goal-conditional policy. In addition to their potential to generalize desirable behavior to unseen goals, such policies may also enable higher-level planning based on subgoals. In sparse-reward environments, the capacity to exploit information about the degree to which an arbitrary goal has been achieved while another goal was intended appears crucial to enabling sample-efficient learning. However, reinforcement learning agents have only recently been endowed with such capacity for hindsight. In this letter, we demonstrate how hindsight can be introduced to policy gradient methods, generalizing this idea to a broad class of successful algorithms. Our experiments on a diverse selection of sparse-reward environments show that hindsight leads to a remarkable increase in sample efficiency.

    A Model-Predictive Motion Planner for the IARA Autonomous Car

    We present the Model-Predictive Motion Planner (MPMP) of the Intelligent Autonomous Robotic Automobile (IARA). IARA is a fully autonomous car that uses a path planner to compute a path from its current position to the desired destination. Using this path, the current position, a goal in the path and a map, IARA's MPMP is able to compute smooth trajectories from its current position to the goal in less than 50 ms. MPMP computes the poses of these trajectories so that they follow the path closely and, at the same time, keep a safe distance from any obstacles. Our experiments have shown that MPMP is able to compute trajectories that precisely follow a path produced by a human driver (distance of 0.15 m on average) while smoothly driving IARA at speeds of up to 32.4 km/h (9 m/s). Comment: This is a preprint. Accepted by 2017 IEEE International Conference on Robotics and Automation (ICRA)

    ‘It would be okay if they came through the proper channels’: community perceptions and attitudes toward asylum seekers in Australia

    Australia's humanitarian programme contributes to UNHCR's global resettlement programme and enhances Australia's international humanitarian reputation. However, as the recent tragedy on Christmas Island has shown, the arrival of asylum seekers by boat continues to stimulate debate, discussion and reaction from the Australian public and the Australian media. In this study, we used a mixed methods community survey to understand community perceptions and attitudes relating to asylum seekers. We found that while personal contact with asylum seekers was important when forming opinions about this group of immigrants, for the majority of respondents, attitudes and opinions towards asylum seekers were more influenced by the interplay between traditional Australian values and norms, the way that these norms appeared to be threatened by asylum seekers, and the way that these threats were reinforced both in media and political rhetoric.

    Synthetic Data Generation and Defense in Depth Measurement of Web Applications

    Measuring security controls across multiple layers of defense requires realistic data sets and repeatable experiments. However, data sets that are collected from real users often cannot be freely exchanged due to privacy and regulatory concerns. Synthetic datasets, which can be shared, have in the past had critical flaws or at best been one-time collections of data focusing on a single layer or type of data. We present a framework for generating synthetic datasets with normal and attack data for web applications across multiple layers simultaneously. The framework is modular and designed for data to be easily recreated in order to vary parameters and allow for inline testing. We build a prototype data generator using the framework to generate nine datasets with data logged simultaneously on four layers: network, file accesses, system calls, and database. We then test nineteen security controls spanning all four layers to determine their sensitivity to dataset changes, compare performance even across layers, compare synthetic data to real production data, and calculate combined defense in depth performance of sets of controls.

    New Criticality of 1D Fermions

    One-dimensional massive quantum particles (or 1+1-dimensional random walks) with short-ranged multi-particle interactions are studied by exact renormalization group methods. With repulsive pair forces, such particles are known to scale as free fermions. With finite m-body forces (m = 3,4,...), a critical instability is found, indicating the transition to a fermionic bound state. These unbinding transitions represent new universality classes of interacting fermions relevant to polymer and membrane systems. Implications for massless fermions, e.g. in the Hubbard model, are also noted. (to appear in Phys. Rev. Lett.) Comment: 10 pages (latex), with 2 figures (not included)

    Fluctuations and differential contraction during regeneration of Hydra vulgaris tissue toroids

    We studied regenerating bilayered tissue toroids dissected from Hydra vulgaris polyps and relate our macroscopic observations to the dynamics of force-generating mesoscopic cytoskeletal structures. Tissue fragments undergo a specific toroid-spheroid folding process leading to complete regeneration towards a new organism. The time scale of folding is too fast for biochemical signalling or morphogenetic gradients, which forced us to assume purely mechanical self-organization. The initial pattern selection dynamics was studied by embedding toroids into hydro-gels, allowing us to observe the deformation modes over longer periods of time. We found increasing mechanical fluctuations which break the toroidal symmetry and discuss the evolution of their power spectra for various gel stiffnesses. Our observations are related to single cell studies which explain the mechanical feasibility of the folding process. In addition, we observed switching of cells from a tissue bound to a migrating state after folding failure as well as in tissue injury. We found a supra-cellular actin ring assembled along the toroid's inner edge. Its contraction can lead to the observed folding dynamics as we could confirm by finite element simulations. This actin ring in the inner cell layer is assembled by myosin-driven length fluctuations of supra-cellular α-actin structures (myonemes) in the outer cell layer. Comment: 19 pages and 8 figures, submitted to New Journal of Physics

    “Of Gods and Men”: selected print media coverage of natural disasters and industrial failures in three Westminster countries

    This article examines selected print media coverage of a domestic natural disaster and domestic industrial failure in each of three Westminster countries: Australia, Canada, and the UK. It studies this coverage from several perspectives: the volume of coverage; the rate at which the articles were published; the tone of the headlines; and a content analysis of the perceived performance of key public and private institutions during and following the events. Its initial findings reveal that the natural disasters received more coverage than the industrial failures in each of the newspapers considered. There was also no significant difference in the publication rate across event type or newspaper. In each case, government was assessed at least as frequently and negatively as non-government actors, particularly during and following industrial failures. The manner in which government and non-government actors were assessed following these events suggests that, contrary to government claims that owners and operators of critical infrastructure (CI) are responsible for its successful operation, government in fact is “in the frame” as frequently as the industry owners and operators are. In addition, the negative assessments of governments following industrial failures in particular may prompt over-reaction by policy makers to industrial failures and under-reaction to natural disasters. This inconsistency is indeed ironic because the latter occur more often and cost more, both financially and socially. We reviewed 340 newspaper articles from three different newspapers: The Australian’s coverage of the Canberra bushfires and the Waterfall train accident, The Globe and Mail’s (Canada) coverage of Hurricane Juan and the de la Concorde overpass collapse, and The Daily Telegraph’s (UK) coverage of the 2007 floods and the Potters Bar train wreck. Our sample size is small; our ability to compare across newspapers and countries limited. Further research is warranted.