
    A general theory for preferential sampling in environmental networks

    This paper presents a general model framework for detecting the preferential sampling of environmental monitors recording an environmental process across space and/or time. This is achieved by considering the joint distribution of an environmental process with a site-selection process that determines where and when sites are placed to measure the process. The environmental process may be spatial, temporal or spatio-temporal in nature. By sharing random effects between the two processes, the joint model can establish whether site placement was stochastically dependent on the environmental process under study. The embedding into a spatio-temporal framework also allows the dynamic site-selection process itself to be modelled. Real-world factors affecting both the size and location of the network can be easily modelled and quantified. Depending upon the choice of population of locations considered for selection across space and time under the site-selection process, different insights about the precise nature of preferential sampling can be obtained. The general framework developed in the paper is designed to be fitted easily and quickly using the R-INLA package. We apply this framework to a case study involving particulate air pollution over the UK, where a major reduction in the size of a monitoring network occurred through time. We demonstrate that a significant response-biased reduction in the air quality monitoring network occurred. We also show that the network was consistently unrepresentative of the levels of particulate matter seen across much of Great Britain (GB) throughout the operating life of the network. Finally, we show that this may have led to severe over-reporting of the population-average exposure levels experienced across GB, which could substantially affect estimates of the health effects of black smoke levels. Comment: 33 pages of main text, 48 including the supplementary material
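
    A minimal sketch of the shared-random-effect idea is given below. It is not the paper's R-INLA implementation; it is a simplified NumPy simulation in which a latent effect Z drives both the pollution process and a Bernoulli site-selection process, and all names and values (beta0, gamma, p_select, etc.) are illustrative assumptions.

        # Illustrative sketch only (the paper fits the joint model with R-INLA);
        # here an iid latent effect Z stands in for the spatial random effect that
        # is shared between the environmental process and the site-selection process.
        import numpy as np

        rng = np.random.default_rng(0)
        n_sites = 200
        Z = rng.normal(0.0, 1.0, size=n_sites)            # shared latent random effect
        beta0, sigma = 2.0, 0.3                            # intercept and noise sd (assumed values)
        Y = beta0 + Z + rng.normal(0.0, sigma, n_sites)    # environmental process at candidate sites

        # Site-selection process: a positive coefficient gamma on the shared effect
        # means monitors are preferentially placed (or retained) where levels are high.
        alpha0, gamma = -0.5, 1.5
        p_select = 1.0 / (1.0 + np.exp(-(alpha0 + gamma * Z)))
        selected = rng.binomial(1, p_select).astype(bool)

        # With gamma > 0 the naive network average over-reports the population average,
        # mirroring the over-reporting of exposure described in the abstract.
        print(f"population mean: {Y.mean():.2f}")
        print(f"network mean:    {Y[selected].mean():.2f}")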

    Inferring Smooth Control: Monte Carlo Posterior Policy Iteration with Gaussian Processes

    Monte Carlo methods have become increasingly relevant for control of non-differentiable systems, approximate dynamics models, and learning from data. These methods scale to high-dimensional spaces and are effective at the non-convex optimizations often seen in robot learning. We look at sample-based methods from the perspective of inference-based control, specifically posterior policy iteration. From this perspective, we highlight how Gaussian noise priors produce rough control actions that are unsuitable for physical robot deployment. Considering smoother Gaussian process priors, as used in episodic reinforcement learning and motion planning, we demonstrate how smoother model predictive control can be achieved using online sequential inference. This inference is realized through an efficient factorization of the action distribution and a novel means of optimizing the likelihood temperature to improve importance sampling accuracy. We evaluate this approach on several high-dimensional robot control tasks, matching the sample efficiency of prior heuristic methods while also ensuring smoothness. Simulation results can be seen at https://monte-carlo-ppi.github.io/. Comment: 43 pages, 37 figures. Conference on Robot Learning 2022
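
    The following is a minimal sketch, not the authors' implementation, of the two ingredients highlighted above: action sequences drawn from a Gaussian-process prior (temporally smooth rather than white noise) and a softmax reweighting of episodic returns whose temperature eta controls how sharply the posterior concentrates. The objective function and all parameter values are assumed for illustration.

        # Sketch under assumptions: smooth action samples from a GP prior over time,
        # reweighted by a softmax over returns (posterior policy iteration style).
        import numpy as np

        rng = np.random.default_rng(0)
        H, n_samples = 50, 64                          # horizon and number of sampled action sequences
        t = np.arange(H)[:, None]

        # Squared-exponential kernel over time: nearby timesteps get correlated actions,
        # so sampled controls are smooth rather than white noise.
        lengthscale = 5.0
        K = np.exp(-0.5 * (t - t.T) ** 2 / lengthscale ** 2) + 1e-6 * np.eye(H)
        L = np.linalg.cholesky(K)
        actions = (L @ rng.normal(size=(H, n_samples))).T   # (n_samples, H) smooth sequences

        def episode_return(a):
            # Stand-in objective; a real task would roll out the system dynamics.
            return -np.sum((a - np.sin(0.2 * np.arange(H))) ** 2)

        returns = np.array([episode_return(a) for a in actions])

        # The likelihood temperature eta controls how sharply the posterior concentrates
        # on high-return samples; the paper optimizes it to keep importance weights accurate.
        eta = 1.0
        w = np.exp((returns - returns.max()) / eta)
        w /= w.sum()

        posterior_mean_actions = w @ actions            # updated (smooth) mean action sequence
        print(posterior_mean_actions[:5])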

    Coherent Soft Imitation Learning

    Imitation learning methods seek to learn from an expert either through behavioral cloning (BC) of the policy or inverse reinforcement learning (IRL) of the reward. Such methods enable agents to learn from humans complex tasks that are difficult to capture with hand-designed reward functions. Choosing BC or IRL for imitation depends on the quality and state-action coverage of the demonstrations, as well as additional access to the Markov decision process. Hybrid strategies that combine BC and IRL are not common, as initial policy optimization against inaccurate rewards diminishes the benefit of pretraining the policy with BC. This work derives an imitation method that captures the strengths of both BC and IRL. In the entropy-regularized ('soft') reinforcement learning setting, we show that the behaviour-cloned policy can be used as both a shaped reward and a critic hypothesis space by inverting the regularized policy update. This coherence facilitates fine-tuning cloned policies using the reward estimate and additional interactions with the environment. This approach conveniently achieves imitation learning through initial behaviour cloning, followed by refinement via RL with online or offline data sources. The simplicity of the approach enables graceful scaling to high-dimensional and vision-based tasks, with stable learning and minimal hyperparameter tuning, in contrast to adversarial approaches. Comment: 51 pages, 47 figures. DeepMind internship report
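
    As a simplified illustration of the inversion described above (not the paper's exact derivation): in entropy-regularized RL the policy update is pi_new(a|s) proportional to pi_old(a|s) * exp(Q(s,a) / alpha), so a behaviour-cloned policy can define a shaped reward relative to a prior policy via the log-ratio alpha * (log pi_bc(a|s) - log pi_prior(a|s)). The tabular setup and all values below are assumed.

        # Simplified tabular illustration of a cloned policy acting as a shaped reward.
        import numpy as np

        alpha = 0.5                                    # entropy-regularization temperature (assumed)
        n_states, n_actions = 4, 3

        rng = np.random.default_rng(0)
        logits_bc = rng.normal(size=(n_states, n_actions))
        pi_bc = np.exp(logits_bc) / np.exp(logits_bc).sum(axis=1, keepdims=True)   # behaviour-cloned policy
        pi_prior = np.full((n_states, n_actions), 1.0 / n_actions)                 # uniform prior policy

        # Shaped reward from inverting the soft policy update: actions the cloned policy
        # prefers over the prior get positive reward, dispreferred actions get negative reward.
        r_shaped = alpha * (np.log(pi_bc) - np.log(pi_prior))
        print(r_shaped)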