
    Policy search via the signed derivative

    Abstract — We consider policy search for reinforcement learning: learning policy parameters, for some fixed policy class, that optimize the performance of a system. In this paper, we propose a novel policy gradient method based on an approximation we call the Signed Derivative; the approximation rests on the intuition that it is often easy to guess the direction in which control inputs affect future state variables, even without an accurate model of the system. The resulting algorithm is very simple and requires no model of the environment, and we show that it can outperform standard stochastic estimators of the gradient; indeed, we show that the Signed Derivative algorithm can perform as well as the true (model-based) policy gradient, but without knowledge of the model. We evaluate the algorithm’s performance on a simulated task and two real-world tasks — driving an RC car along a specified trajectory, and jumping onto obstacles with a quadruped robot — and in all cases achieve good performance after very little training.
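The core idea above can be sketched in a few lines. This is a minimal illustration, not the paper's algorithm or tasks: a 1-D tracking problem whose dynamics gain is unknown to the learner, with the true Jacobian of next state with respect to control replaced by its sign (+1) in the gradient update. All names and the learning-rate/horizon choices here are assumptions.

```python
import numpy as np

# Hypothetical 1-D tracking task: dynamics x' = x + a*u with unknown gain a > 0.
# Signed-derivative idea: replace the unknown Jacobian dx'/du = a with sign(a) = +1
# when forming the policy-gradient update.

a_true = 0.7          # unknown to the learner; only its sign is assumed known
target = 1.0
theta = 0.0           # linear policy u = theta * (target - x)
lr = 0.05

def rollout(theta, x0=0.0, steps=20):
    """Total squared tracking error over one episode."""
    x, cost = x0, 0.0
    for _ in range(steps):
        x = x + a_true * theta * (target - x)
        cost += (x - target) ** 2
    return cost

for _ in range(200):
    x = 0.0
    for _ in range(20):
        err = target - x
        x_next = x + a_true * theta * err
        # true gradient of the step cost wrt u is 2*(x_next - target)*a;
        # the signed-derivative approximation uses sign(a) = +1 instead of a
        g_u = 2.0 * (x_next - target) * 1.0
        # chain rule through the policy: du/dtheta = err
        theta -= lr * g_u * err
        x = x_next

print(rollout(theta), rollout(0.0))  # learned policy vs. doing nothing
```

Even with the wrong gradient magnitude, the update direction is correct, so the learned gain settles near the value that drives the tracking error to zero.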

    “Peace for Our Time”: Past and Present Receptions of Neville Chamberlain’s Speech and the Munich Agreement

    This paper covers British Prime Minister Neville Chamberlain's role in the Munich Agreement, as well as his September 30th speech in London, and explains how Chamberlain's attempt to negotiate peace with Hitler was received by the public. This paper examines three major newspapers: The London Times, The Manchester Guardian, and The New York Times, to see whether the press interpreted Chamberlain's negotiation with Hitler as a success or a failure. The paper also builds on the newspapers' coverage to explain how Chamberlain and his policy of appeasement have been perceived through to the present day.

    Sick Pay, Health and Work

    The purpose of this paper is to analyze the effects of different sickness insurance regimes on the employee's decision whether or not to report sick. We can think of the design problem as a representative employer's decision to determine the optimal relationship between the wage and the sickness pay. The employee bases her decision to work or not on this relative price and her exogenously given health status, which varies between individuals. We believe that the incentives present in the model can tell us about relevant aspects of the incentives involved in a state-managed sickness insurance system. We calculate how the control variables depend on parameters such as the average productivity of the worker, the average productivity of the substitute, the wage of the substitute, and the search cost of finding a substitute. Since we assume that the health status of the work force is heterogeneous and represented by a distribution function, we are also able to calculate the change in the work participation rate as a function of the parameters.
    Keywords: sickness insurance design; wage setting; labour force participation
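The participation mechanism described above can be illustrated numerically. This toy sketch is not the paper's model: the functional forms, the uniform health-cost distribution, and all names are assumptions. It only shows how, when each employee compares the wage net of her health cost against the sick pay, the participation rate falls as the replacement rate rises.

```python
import numpy as np

# Toy participation model (assumed forms, not the paper's): an employee with
# heterogeneous health cost h works if wage - h > sick_pay, so the share of
# the work force reporting to work depends on the sick-pay/wage ratio.

def participation_rate(wage, sick_pay, n=100_000, seed=0):
    # health costs drawn uniformly on [0, wage] across individuals
    h = np.random.default_rng(seed).uniform(0.0, wage, n)
    works = wage - h > sick_pay
    return works.mean()

print(participation_rate(1.0, 0.2))  # ≈ 0.8 with a low replacement rate
print(participation_rate(1.0, 0.8))  # ≈ 0.2 with a high replacement rate
```

With this uniform distribution the participation rate is simply (wage − sick pay) / wage, so the simulation just confirms the closed form.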

    Identifying Sources and Sinks in the Presence of Multiple Agents with Gaussian Process Vector Calculus

    In systems of multiple agents, identifying the cause of observed agent dynamics is challenging. Often, these agents operate in diverse, non-stationary environments, where models rely on hand-crafted, environment-specific features to infer influential regions in the system's surroundings. To overcome the limitations of these inflexible models, we present GP-LAPLACE, a technique for locating sources and sinks from trajectories in time-varying fields. Using Gaussian processes, we jointly infer a spatio-temporal vector field, as well as canonical vector calculus operations on that field. Notably, we do this from agent trajectories alone, without requiring knowledge of the environment, and by exploiting our probabilistic method we also obtain a metric denoting the significance of inferred causal features in the environment. To evaluate our approach, we apply it to both synthetic and real-world GPS data, demonstrating the applicability of our technique in the presence of multiple agents, as well as its superiority over existing methods.
    Comment: KDD '18 Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 1254-1262, 9 pages, 5 figures, conference submission, University of Oxford. arXiv admin note: text overlap with arXiv:1709.0235
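The vector-calculus step above can be made concrete with a much simpler stand-in. This sketch skips the paper's Gaussian-process inference entirely and uses a known analytic velocity field, only to show the principle that a sink appears as negative divergence of the field driving the agents; the field and function names are assumptions for illustration.

```python
import numpy as np

# Simplified illustration (not GP-LAPLACE itself): agents attracted to a sink
# at the origin follow the field v(x, y) = (-x, -y). A sink shows up as
# negative divergence; GP-LAPLACE would infer the field from trajectories.

def field(x, y):
    return -x, -y

def divergence(x, y, h=1e-4):
    """Central finite-difference estimate of div v = dvx/dx + dvy/dy."""
    vx1, _ = field(x + h, y); vx0, _ = field(x - h, y)
    _, vy1 = field(x, y + h); _, vy0 = field(x, y - h)
    return (vx1 - vx0) / (2 * h) + (vy1 - vy0) / (2 * h)

div = divergence(0.3, -0.2)
print(div)  # ≈ -2 everywhere for this linear field: a sink
```

In the paper's setting the field is latent and time-varying, so the differentiation is carried out analytically on the GP posterior rather than by finite differences, but the diagnostic is the same: negative divergence flags sinks, positive divergence flags sources.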

    Traditional Knowledge and Biodiversity in South Africa : CSIR case

    The focus of this paper is traditional knowledge (TK) and indigenous biological resources protection in South Africa, through analysis of the existing policies and legislation, in order to provide useful insight for a developed country such as Japan, which has recently adopted guidelines for the protection of TK and biological resources and the promotion of access and benefit sharing (ABS). South Africa is the third most diverse country in terms of natural resources, culture and traditions, languages, and geology, and its comprehensive legislative framework shows the country's seriousness about safeguarding TK and conserving biological resources for future generations. The paper uses South Africa's government-owned research and technology development institution, the Council for Scientific and Industrial Research (CSIR), as an example to demonstrate the application of TK protection and biodiversity conservation (including access and benefit sharing) laws, through a case-study approach, drawing lessons for other African countries contemplating the creation of their own TK protection and environmental conservation frameworks. Due to the repositioning of the CSIR within local and global research and development, the organisation has adopted an Industrialisation Strategy, and TK will play a significant role in technology development and new business models in rural agro-processing and production to enhance inclusive development (through benefit sharing) and support economic growth. The paper concludes that TK and indigenous biological resources protection through the relevant government laws, as well as value addition to TK and biodiversity through research and development supported by government funding, is necessary for socio-economic attainment, especially for local and indigenous communities and rural agro-processing businesses as part of benefit sharing.

    Grid infrastructures for secure access to and use of bioinformatics data: experiences from the BRIDGES project

    The BRIDGES project was funded by the UK Department of Trade and Industry (DTI) to address the needs of cardiovascular research scientists investigating the genetic causes of hypertension as part of the Wellcome Trust funded (£4.34M) cardiovascular functional genomics (CFG) project. Security was at the heart of the BRIDGES project, and an advanced data and compute grid infrastructure incorporating the latest grid authorisation technologies was developed and delivered to the scientists. We outline these grid infrastructures and describe the security requirements perceived at the project's start, including data classifications, and how these evolved throughout the lifetime of the project. The uptake and adoption of the project results are also presented, along with the challenges that must be overcome to support the secure exchange of life science data sets. We also present how we will use the BRIDGES experiences in future projects at the National e-Science Centre.

    Message Passing Algorithms for Compressed Sensing

    Compressed sensing aims to undersample certain high-dimensional signals, yet accurately reconstruct them by exploiting signal characteristics. Accurate reconstruction is possible when the object to be recovered is sufficiently sparse in a known basis. Currently, the best known sparsity-undersampling tradeoff is achieved when reconstructing by convex optimization, which is expensive in important large-scale applications. Fast iterative thresholding algorithms have been intensively studied as alternatives to convex optimization for large-scale problems. Unfortunately, known fast algorithms offer substantially worse sparsity-undersampling tradeoffs than convex optimization. We introduce a simple, costless modification to iterative thresholding that makes the sparsity-undersampling tradeoff of the new algorithms equivalent to that of the corresponding convex optimization procedures. The new iterative-thresholding algorithms are inspired by belief propagation in graphical models. Our empirical measurements of the sparsity-undersampling tradeoff for the new algorithms agree with theoretical calculations. We show that a state evolution formalism correctly derives the true sparsity-undersampling tradeoff. There is a surprising agreement between earlier calculations based on random convex polytopes and this new, apparently very different, theoretical formalism.
    Comment: 6-page paper + 9 pages of supplementary information, 13 EPS figures. Submitted to Proc. Natl. Acad. Sci. US
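The "costless modification" the abstract describes can be sketched as follows. This is an illustration of the message-passing idea, not the paper's exact algorithm or threshold tuning: relative to plain iterative soft thresholding, the only change is the extra correction term (often called the Onsager term) added to the residual, and the problem sizes and threshold schedule here are assumptions.

```python
import numpy as np

# Sketch of message-passing-style iterative soft thresholding for compressed
# sensing. Versus plain iterative thresholding, the single change is the
# Onsager correction added to the residual z each iteration.

rng = np.random.default_rng(0)
n, N, k = 250, 500, 20                  # measurements, signal length, sparsity
A = rng.standard_normal((n, N)) / np.sqrt(n)
x0 = np.zeros(N)
x0[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
y = A @ x0                              # noiseless undersampled measurements

def soft(v, t):
    """Soft thresholding, the proximal map of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x = np.zeros(N)
z = y.copy()
for _ in range(50):
    tau = 1.5 * np.linalg.norm(z) / np.sqrt(n)     # assumed threshold schedule
    x = soft(x + A.T @ z, tau)
    # Onsager correction: scale the previous residual by the fraction of
    # active coordinates; plain iterative thresholding omits this term
    z = y - A @ x + (z / n) * np.count_nonzero(x)

rel_err = np.linalg.norm(x - x0) / np.linalg.norm(x0)
print(rel_err)  # small when the sparsity is below the phase transition
```

Dropping the correction term recovers ordinary iterative soft thresholding, which at the same undersampling ratio tolerates markedly less sparsity, which is exactly the tradeoff gap the abstract refers to.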