Search CORE

289 research outputs found

Addressing Function Approximation Error in Actor-Critic Methods

Authors: Fujimoto Scott; Meger David; van Hoof Herke
Publication date: 01/01/2018

In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies. We show that this problem persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and the critic. Our algorithm builds on Double Q-learning by taking the minimum value between a pair of critics to limit overestimation. We draw the connection between target networks and overestimation bias, and suggest delaying policy updates to reduce per-update error and further improve performance. We evaluate our method on the suite of OpenAI gym tasks, outperforming the state of the art in every environment tested. Comment: Accepted at ICML 201
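The two mechanisms named in the abstract, taking the minimum over a pair of critics and delaying policy updates, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the discount `gamma`, and the delay `policy_delay=2` are assumptions chosen for illustration, and the critic values are passed in as plain numbers rather than produced by networks.

```python
def clipped_double_q_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Bootstrapped target built from the smaller of two critic estimates.

    Taking the minimum of the two next-state values limits the upward
    bias that a single, noisy critic would pass into the target.
    """
    min_q = min(q1_next, q2_next)
    not_done = 0.0 if done else 1.0  # no bootstrapping past a terminal state
    return reward + gamma * not_done * min_q


def should_update_policy(step, policy_delay=2):
    """Return True on the steps where the actor is updated.

    The critics are assumed to be updated on every step; the actor is
    updated only once every `policy_delay` steps (hypothetical default).
    """
    return step % policy_delay == 0


if __name__ == "__main__":
    print(clipped_double_q_target(1.0, False, q1_next=10.0, q2_next=8.0, gamma=0.5))
    print([s for s in range(1, 7) if should_update_policy(s)])
```

In this sketch the target uses 8.0 rather than 10.0 as the next-state value, so an overestimate in one critic does not propagate unless the other critic shares it.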

arXiv.org e-Print Archive

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Cost Adaptation for Robust Decentralized Swarm Behaviour

Authors: Coates Mark; Henderson Peter; Meger David; Vertescher Matthew
Publication date: 29/09/2018

Decentralized receding horizon control (D-RHC) provides a mechanism for coordination in multi-agent settings without a centralized command center. However, combining a set of different goals, costs, and constraints to form an efficient optimization objective for D-RHC can be difficult. To allay this problem, we use a meta-learning process -- cost adaptation -- which generates the optimization objective for D-RHC to solve based on a set of human-generated priors (cost and constraint functions) and an auxiliary heuristic. We use this adaptive D-RHC method for control of mesh-networked swarm agents. This formulation allows a wide range of tasks to be encoded and can account for network delays, heterogeneous capabilities, and increasingly large swarms through the adaptation mechanism. We leverage the Unity3D game engine to build a simulator capable of introducing artificial networking failures and delays in the swarm. Using the simulator we validate our method on an example coordinated exploration task. We demonstrate that cost adaptation allows for more efficient and safer task completion under varying environment conditions and increasingly large swarm sizes. We release our simulator and code to the community for future work. Comment: Accepted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 201

arXiv.org e-Print Archive

Crossref

Bayesian Policy Gradients via Alpha Divergence Dropout Inference

Authors: Henderson Peter; Doan Thang; Islam Riashat; Meger David
Publication date: 05/12/2017

Policy gradient methods have had great success in solving continuous control tasks, yet the stochastic nature of such problems makes deterministic value estimation difficult. We propose an approach which instead estimates a distribution by fitting the value function with a Bayesian Neural Network. We optimize an $\alpha$-divergence objective with Bayesian dropout approximation to learn and estimate this distribution. We show that using the Monte Carlo posterior mean of the Bayesian value function distribution, rather than a deterministic network, improves stability and performance of policy gradient methods in continuous control MuJoCo simulations. Comment: Accepted to Bayesian Deep Learning Workshop at NIPS 201
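The idea of a Monte Carlo posterior mean under dropout can be sketched as follows. This is only an illustration under strong assumptions: the value function is a hypothetical linear model, `keep_prob` and `num_samples` are made-up settings, and the $\alpha$-divergence training objective from the abstract is not implemented. The sketch shows just the prediction step, where several stochastic forward passes are averaged.

```python
import random


def dropout_forward(features, weights, keep_prob, rng):
    """One stochastic forward pass of a linear value function.

    Each input unit is kept with probability `keep_prob`; kept units are
    rescaled by 1 / keep_prob (inverted dropout).
    """
    scale = 1.0 / keep_prob
    return sum(
        w * x * scale
        for w, x in zip(weights, features)
        if rng.random() < keep_prob
    )


def mc_value_mean(features, weights, keep_prob=0.9, num_samples=100, seed=0):
    """Average `num_samples` dropout passes to estimate the value."""
    rng = random.Random(seed)
    samples = [
        dropout_forward(features, weights, keep_prob, rng)
        for _ in range(num_samples)
    ]
    return sum(samples) / num_samples


if __name__ == "__main__":
    print(mc_value_mean([1.0, 2.0], [0.5, 0.25], keep_prob=0.8, num_samples=1000))
```

With `keep_prob=1.0` no unit is ever dropped, so the sketch reduces to the deterministic value estimate that the abstract contrasts against.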

arXiv.org e-Print Archive

FigShare

Returning to the root : radical feminist thought and feminist theories of International Relations

Authors: Duriesmith D.; Meger S.
Publication venue: Cambridge University Press (CUP)
Publication date: 01/07/2020

Feminist International Relations (IR) theory is haunted by a radical feminist ghost. From Enloe's suggestion that the personal is both political and international, often seen as the foundation of feminist IR, feminist IR scholarship has been built on the intellectual contributions of a body of theory it has long left for dead. Though Enloe's sentiment directly references Hanisch's radical feminist rallying call, there is little direct engagement with the radical feminist thinkers who popularised the sentiment in IR. Rather, since its inception, the field has been built on radical feminist thought it has left for dead. This has left feminist IR troubled by its radical feminist roots and the conceptual baggage that feminist IR has unreflectively carried from second-wave feminism into its contemporary scholarship. By returning to the roots of radical feminism we believe IR can gain valuable insights regarding the system of sex-class oppression, the central role of heterosexuality in maintaining this system, and the feminist case for revolutionary political action in order to dismantle it.

White Rose Research Online

University of Melbourne Institutional Repository

The Iterative Independent Model

Authors: Meger Erin; Raz Abigail
Publication date: 02/09/2022

Deterministic complex networks that use iterative generation algorithms have been found to more closely mirror properties found in real world networks than the traditional uniform random graph models. In this paper we introduce a new, Iterative Independent Model (IIM), generalizing previously defined models. These models use ideas from Structural Balance Theory to generate edges through a notion of cloning where "the friend of my friend is my friend" and anticloning where "the enemy of my enemy is my friend". In this paper, we vastly generalize these notions by allowing each vertex added at a given time step to choose independently of the other vertices if it will be cloned or anticloned. While it may seem natural to focus on a randomized model, where we randomly determine whether or not to clone any given vertex, we found the general deterministic model exhibited certain structural properties regardless of the probabilities. This allows applications to then explore the particulars, while having the theoretical model explain the structural phenomena that occur in all possible scenarios. Throughout the paper we demonstrate that all IIM graphs have spectral gap bounded away from zero, which indicates the clustering properties also found in social networks. Furthermore, we show bounds on the diameter, domination number, and clique number further indicating the well clustered behaviour of IIM graphs. Finally, for any fixed graph $F$, all IIM graphs will eventually contain an induced copy of $F$. Comment: 25 pages, 4 figure
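A single generation step with per-vertex cloning or anticloning might look like the sketch below. The abstract does not define the two operations, so the code assumes the conventions of the earlier iterated local transitivity and anti-transitivity models: a clone of `v` is joined to `v` and to every neighbour of `v`, while an anticlone of `v` is joined to every existing vertex that is neither `v` nor a neighbour of `v`. The function name `iim_step`, the integer vertex labels, and the `clone_choice` list are hypothetical.

```python
def iim_step(adj, clone_choice):
    """Apply one generation step to a graph given as {vertex: set_of_neighbours}.

    Vertices are assumed to be labelled 0..n-1. For each existing vertex v,
    a new vertex n + v is added: a clone if clone_choice[v] is True,
    otherwise an anticlone. New vertices are not joined to each other.
    """
    n = len(adj)
    new_adj = {v: set(nbrs) for v, nbrs in adj.items()}
    for v in range(n):
        new_v = n + v
        if clone_choice[v]:
            # Clone: adjacent to v and all of v's neighbours.
            targets = adj[v] | {v}
        else:
            # Anticlone: adjacent to every old vertex outside v's closed neighbourhood.
            targets = set(range(n)) - adj[v] - {v}
        new_adj[new_v] = set(targets)
        for u in targets:
            new_adj[u].add(new_v)
    return new_adj


if __name__ == "__main__":
    path = {0: {1}, 1: {0, 2}, 2: {1}}  # path graph 0 - 1 - 2
    print(iim_step(path, [False, True, True]))
```

Passing a fixed `clone_choice` list keeps the step deterministic; a randomized variant would simply draw each entry independently before calling the function.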

arXiv.org e-Print Archive