
417 research outputs found

Hindsight policy gradients

Author: Mutz F
Rauber P
Schmidhuber J
Ummadisingu A
Publication date: 01/01/2019

A reinforcement learning agent that needs to pursue different goals across episodes requires a goal-conditional policy. In addition to their potential to generalize desirable behavior to unseen goals, such policies may also enable higher-level planning based on subgoals. In sparse-reward environments, the capacity to exploit information about the degree to which an arbitrary goal has been achieved while another goal was intended appears crucial to enable sample-efficient learning. However, reinforcement learning agents have only recently been endowed with such capacity for hindsight. In this paper, we demonstrate how hindsight can be introduced to policy gradient methods, generalizing this idea to a broad class of successful algorithms. Our experiments on a diverse selection of sparse-reward environments show that hindsight leads to a remarkable increase in sample efficiency. Comment: Accepted to ICLR 201

arXiv.org e-Print Archive

Queen Mary Research Online

Reinforcement Learning in Sparse-Reward Environments with Hindsight Policy Gradients

Author: Mutz F
Rauber P
Schmidhuber J
Ummadisingu A
Publication venue: 'MIT Press - Journals'
Publication date: 01/05/2021

A reinforcement learning agent that needs to pursue different goals across episodes requires a goal-conditional policy. In addition to their potential to generalize desirable behavior to unseen goals, such policies may also enable higher-level planning based on subgoals. In sparse-reward environments, the capacity to exploit information about the degree to which an arbitrary goal has been achieved while another goal was intended appears crucial to enabling sample-efficient learning. However, reinforcement learning agents have only recently been endowed with such capacity for hindsight. In this letter, we demonstrate how hindsight can be introduced to policy gradient methods, generalizing this idea to a broad class of successful algorithms. Our experiments on a diverse selection of sparse-reward environments show that hindsight leads to a remarkable increase in sample efficiency.

Queen Mary Research Online

A Model-Predictive Motion Planner for the IARA Autonomous Car

Author: Badue Claudine
Cardoso Vinicius
De Souza Alberto F.
Mutz Filipe
Oliveira Josias
Oliveira-Santos Thiago
Teixeira Thomas
Veronese Lucas
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 09/11/2017

We present the Model-Predictive Motion Planner (MPMP) of the Intelligent Autonomous Robotic Automobile (IARA). IARA is a fully autonomous car that uses a path planner to compute a path from its current position to the desired destination. Using this path, the current position, a goal in the path and a map, IARA's MPMP is able to compute smooth trajectories from its current position to the goal in less than 50 ms. MPMP computes the poses of these trajectories so that they follow the path closely and, at the same time, keep a safe distance from any obstacles. Our experiments have shown that MPMP is able to compute trajectories that precisely follow a path produced by a human driver (distance of 0.15 m on average) while smoothly driving IARA at speeds of up to 32.4 km/h (9 m/s). Comment: This is a preprint. Accepted by 2017 IEEE International Conference on Robotics and Automation (ICRA)

arXiv.org e-Print Archive

Crossref

‘It would be okay if they came through the proper channels’: community perceptions and attitudes toward asylum seekers in Australia

Author: Boman
Dunn
Edwards
Every
F. H. Mckay
Gale
Hoskin
Mutz
Pickering
S. Kneebone
S. L. Thomas
Simon
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2012

Australia's humanitarian programme contributes to UNHCR's global resettlement programme and enhances Australia's international humanitarian reputation. However, as the recent tragedy on Christmas Island has shown, the arrival of asylum seekers by boat continues to stimulate debate, discussion and reaction from the Australian public and the Australian media. In this study, we used a mixed methods community survey to understand community perceptions and attitudes relating to asylum seekers. We found that while personal contact with asylum seekers was important when forming opinions about this group of immigrants, for the majority of respondents, attitudes and opinions towards asylum seekers were more influenced by the interplay between traditional Australian values and norms, the way that these norms appeared to be threatened by asylum seekers, and the way that these threats were reinforced both in media and political rhetoric.

Deakin Research Online

Crossref

Research Online

Synthetic Data Generation and Defense in Depth Measurement of Web Applications

Author: A. Shiravi
C. Dwork
D. Mutz
F. Valeur
G.F. Cretu-Ciocarlie
H. Cavusoglu
K.L. Ingham
L. Sweeney
M. Tavallaee
N. Boggs
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2014

Measuring security controls across multiple layers of defense requires realistic data sets and repeatable experiments. However, data sets that are collected from real users often cannot be freely exchanged due to privacy and regulatory concerns. Synthetic datasets, which can be shared, have in the past had critical flaws or at best been one-time collections of data focusing on a single layer or type of data. We present a framework for generating synthetic datasets with normal and attack data for web applications across multiple layers simultaneously. The framework is modular and designed for data to be easily recreated in order to vary parameters and allow for inline testing. We build a prototype data generator using the framework to generate nine datasets with data logged on four layers simultaneously: network, file accesses, system calls, and database. We then test nineteen security controls spanning all four layers to determine their sensitivity to dataset changes, compare performance even across layers, compare synthetic data to real production data, and calculate combined defense in depth performance of sets of controls.

Crossref

Columbia University Academic Commons

New Criticality of 1D Fermions

Author: A.A. Belavin
Al.B. Zamolodchikov
B. Duplantier
D.A. Kastor
F. David
G. Forgacs
H. Li
H.B. Thacker
J. Frohn
J. Villain
J.J. Rajasekaran
L. Balents
M. Kardar
M. Lässig
M. Mutz
M.E. Fisher
M.P.M. den Nijs
Michael Lässig
R. Lipowsky
R. Netz
T.W. Burkhardt
Publication venue: 'American Physical Society (APS)'
Publication date: 08/07/1994

One-dimensional massive quantum particles (or 1+1-dimensional random walks) with short-ranged multi-particle interactions are studied by exact renormalization group methods. With repulsive pair forces, such particles are known to scale as free fermions. With finite m-body forces (m = 3,4,...), a critical instability is found, indicating the transition to a fermionic bound state. These unbinding transitions represent new universality classes of interacting fermions relevant to polymer and membrane systems. Implications for massless fermions, e.g. in the Hubbard model, are also noted. (to appear in Phys. Rev. Lett.) Comment: 10 pages (latex), with 2 figures (not included)

arXiv.org e-Print Archive

Crossref

Fluctuations and differential contraction during regeneration of Hydra vulgaris tissue toroids

Author: Belytschko T
Claus Fütterer
Cook R D
Fütterer C
Holtfreter J
Iris Wenzel
Joseph Goldmann
Julia Fischer
Kao-Nung Lin
Kosevich I A
Koth S
Markus Kästner
McDowall A W
Michael Krahe
Mutz E
Orescanin M
Qayyum M A
Toohey K S
Insana M F
Wang N
Wetzel G
Whitehead J
Wolff L
Zhao R
Publication venue: 'IOP Publishing'
Publication date: 30/11/2012

We studied regenerating bilayered tissue toroids dissected from Hydra vulgaris polyps and relate our macroscopic observations to the dynamics of force-generating mesoscopic cytoskeletal structures. Tissue fragments undergo a specific toroid-spheroid folding process leading to complete regeneration towards a new organism. The time scale of folding is too fast for biochemical signalling or morphogenetic gradients, which forced us to assume purely mechanical self-organization. The initial pattern selection dynamics was studied by embedding toroids into hydro-gels, allowing us to observe the deformation modes over longer periods of time. We found increasing mechanical fluctuations which break the toroidal symmetry and discuss the evolution of their power spectra for various gel stiffnesses. Our observations are related to single cell studies which explain the mechanical feasibility of the folding process. In addition, we observed switching of cells from a tissue-bound to a migrating state after folding failure as well as in tissue injury. We found a supra-cellular actin ring assembled along the toroid's inner edge. Its contraction can lead to the observed folding dynamics, as we could confirm by finite element simulations. This actin ring in the inner cell layer is assembled by myosin-driven length fluctuations of supra-cellular α-actin structures (myonemes) in the outer cell layer. Comment: 19 pages and 8 figures, submitted to New Journal of Physics

arXiv.org e-Print Archive

Crossref

“Of Gods and Men” : selected print media coverage of natural disasters and industrial failures in three Westminster countries

Author: Alaszewski
Atwood
Bakir
Barnes
Baron
Boholm
Brun
Canada
Conver
Cottle
December
Drottz
Fischhoff
Frewer
Gaskell
Hood
Hughes
John Quigley
Johnson
Kasperson
Kevin F. Quigley
Kitzinger
Kraus
Leahy
McInerney
McLeod
Moeller
Mutz
Pidgeon
Quigley
Rowe
Rundmo
Sjoberg
Slovic
Soumerai
Steinberg
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/01/2013

This article examines selected print media coverage of a domestic natural disaster and domestic industrial failure in each of three Westminster countries: Australia, Canada, and the UK. It studies this coverage from several perspectives: the volume of coverage; the rate at which the articles were published; the tone of the headlines; and a content analysis of the perceived performance of key public and private institutions during and following the events. Its initial findings reveal that the natural disasters received more coverage than the industrial failures in each of the newspapers considered. There was also no significant difference in the publication rate across event type or newspaper. In each case, government was assessed at least as frequently and negatively as non-government actors, particularly during and following industrial failures. The manner in which government and non-government actors were assessed following these events suggests that, contrary to government claims that owners and operators of critical infrastructure (CI) are responsible for its successful operation, government in fact is “in the frame” as frequently as the industry owners and operators are. In addition, the negative assessments of governments following industrial failures in particular may prompt over-reaction by policy makers to industrial failures and under-reaction to natural disasters. This inconsistency is indeed ironic because the latter occur more often and cost more, both financially and socially. We reviewed 340 newspaper articles from three different newspapers: The Australian’s coverage of the Canberra bushfires and the Waterfall train accident, The Globe and Mail’s (Canada) coverage of Hurricane Juan and the de la Concorde overpass collapse, and The Daily Telegraph’s (UK) coverage of the 2007 floods and the Potters Bar train wreck. Our sample size is small, and our ability to compare across newspapers and countries is limited. Further research is warranted.

Crossref

University of Strathclyde Institutional Repository