Policy search via the signed derivative
Abstract — We consider policy search for reinforcement learning: learning policy parameters, for some fixed policy class, that optimize the performance of a system. In this paper, we propose a novel policy gradient method based on an approximation we call the Signed Derivative; the approximation is based on the intuition that it is often very easy to guess the direction in which control inputs affect future state variables, even if we do not have an accurate model of the system. The resulting algorithm is very simple, requires no model of the environment, and we show that it can outperform standard stochastic estimators of the gradient; indeed, we show that the Signed Derivative algorithm can in fact perform as well as the true (model-based) policy gradient, but without knowledge of the model. We evaluate the algorithm's performance on both a simulated task and two real-world tasks — driving an RC car along a specified trajectory, and jumping onto obstacles with a quadruped robot — and in all cases achieve good performance after very little training.
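The core idea of the abstract — replacing the true Jacobian of future state with respect to control inputs by a fixed matrix of signs — can be sketched as follows. This is an illustrative reconstruction, not the paper's exact formulation: the quadratic tracking cost, the linear-in-features policy, and all function and variable names here are assumptions.

```python
import numpy as np

def signed_derivative_update(theta, states, targets, features, S, lr=0.01):
    """One gradient step on a quadratic tracking cost using a signed derivative.

    theta    : (k, m) policy parameters; controls are u_t = features[t] @ theta
    states   : (T, n) observed states
    targets  : (T, n) desired states
    features : (T, k) policy features at each time step
    S        : (n, m) signed derivative: S[i, j] is the guessed sign (+1/-1/0)
               of d(state_i)/d(control_j), standing in for the true Jacobian
    """
    grad = np.zeros_like(theta)
    for t in range(len(states)):
        err = states[t] - targets[t]            # tracking error, shape (n,)
        # chain rule with the true model Jacobian replaced by its sign pattern:
        # d(cost)/d(theta) ~= features_t^T (err^T S)
        grad += np.outer(features[t], err @ S)  # shape (k, m)
    return theta - lr * grad / len(states)
```

Because only the sign pattern of the system's response is needed, no dynamics model is ever queried — matching the abstract's claim of model-free operation.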
“Peace for Our Time”: Past and Present Receptions of Neville Chamberlain’s Speech and the Munich Agreement
This paper covers British Prime Minister Neville Chamberlain's role in the Munich Agreement, as well as his September 30th speech in London, and explains how Chamberlain's attempt to negotiate peace with Hitler was received by the public. This paper examines three major newspapers — The London Times, The Manchester Guardian, and The New York Times — to see whether the press interpreted Chamberlain's negotiation with Hitler as a success or a failure. The paper also builds on the newspapers' coverage to explain how Chamberlain and his policy of appeasement have been perceived up to the present day.
Sick Pay, Health and Work
The purpose of this paper is to analyze the effects of different sickness insurance regimes on the employee's decision to report sick or not. We can think of the design problem as a representative employer's decision to determine the optimal relationship between the wage and the sickness pay. The employee bases her decision to work or not on this relative price and her exogenously given health status, which varies between individuals. We believe that the incentives present in the model are able to tell us about relevant aspects of the incentives involved in a state-managed sickness insurance system. We calculate how the control variables depend on parameters such as the average productivity of the worker, the average productivity of the substitute, the wage of the substitute, and the search cost to find a substitute. Since we assume that the health status of the work force is heterogeneous and represented by a distribution function, we are also able to calculate the change in the work participation rate as a function of the parameters.
Keywords: sickness insurance design; wage setting; labour force participation
Identifying Sources and Sinks in the Presence of Multiple Agents with Gaussian Process Vector Calculus
In systems of multiple agents, identifying the cause of observed agent
dynamics is challenging. Often, these agents operate in diverse, non-stationary
environments, where models rely on hand-crafted environment-specific features
to infer influential regions in the system's surroundings. To overcome the
limitations of these inflexible models, we present GP-LAPLACE, a technique for
locating sources and sinks from trajectories in time-varying fields. Using
Gaussian processes, we jointly infer a spatio-temporal vector field, as well as
canonical vector calculus operations on that field. Notably, we do this from
only agent trajectories without requiring knowledge of the environment, and
also obtain a metric for denoting the significance of inferred causal features
in the environment by exploiting our probabilistic method. To evaluate our
approach, we apply it to both synthetic and real-world GPS data, demonstrating
the applicability of our technique in the presence of multiple agents, as well
as its superiority over existing methods.
Comment: KDD '18 Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 1254-1262; 9 pages, 5 figures, conference submission, University of Oxford. arXiv admin note: text overlap with arXiv:1709.0235
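The abstract's central vector-calculus step — locating sources and sinks via the divergence of an inferred velocity field — can be illustrated without the GP machinery. In the sketch below, a known analytic field and finite differences stand in for the GP posterior and its derivative operators; the field, grid, and function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def divergence(vx, vy, dx, dy):
    """div v = d(vx)/dx + d(vy)/dy on a regular grid, via finite differences.

    vx, vy : 2D arrays of field components, indexed [y, x]
    dx, dy : grid spacings along x and y
    Regions of negative divergence are sinks; positive divergence marks sources.
    """
    dvx_dx = np.gradient(vx, dx, axis=1)  # axis=1 indexes x
    dvy_dy = np.gradient(vy, dy, axis=0)  # axis=0 indexes y
    return dvx_dx + dvy_dy

# Example: the field v(x, y) = (-x, -y) is a pure sink at the origin,
# with constant divergence -2 everywhere.
xs = np.linspace(-1.0, 1.0, 41)
X, Y = np.meshgrid(xs, xs)
div = divergence(-X, -Y, xs[1] - xs[0], xs[1] - xs[0])
```

In the paper's probabilistic setting, the same divergence operator is applied to the GP posterior over the field, which additionally yields the uncertainty-based significance metric the abstract mentions.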
Traditional Knowledge and Biodiversity in South Africa : CSIR case
The focus of this paper is traditional knowledge (TK) and indigenous biological resources protection in South Africa, through the analysis of the existing policies and legislation, in order to provide useful insight for a developed country such as Japan, which has recently adopted guidelines for the protection of TK and biological resources and the promotion of access and benefit sharing (ABS). South Africa is the third most diverse country in terms of natural resources, culture and traditions, languages and geology, and its comprehensive legislative framework shows the country's seriousness about safeguarding TK and conserving biological resources for future generations. The paper uses South Africa's government-owned research and technology development institution, the Council for Scientific and Industrial Research (CSIR), as an example to demonstrate the application of the TK protection and biodiversity conservation (including access and benefit sharing) laws, through a case-study approach, drawing lessons for other African countries contemplating the creation of their own TK protection and environmental conservation frameworks. Due to the repositioning of CSIR within the local and global research and development landscape, the organisation has adopted an Industrialisation Strategy, and TK will play a significant role in technology development and new business models in rural agroprocessing and production to enhance inclusive development (through benefit sharing) and support economic growth. The paper concludes that TK and indigenous biological resources protection through the relevant government laws, as well as value addition to TK and biodiversity through research and development supported by government funding, is necessary for socioeconomic attainment, especially for local and indigenous communities and rural agroprocessing businesses as part of benefit sharing.
Grid infrastructures for secure access to and use of bioinformatics data: experiences from the BRIDGES project
The BRIDGES project was funded by the UK Department of Trade and Industry (DTI) to address the needs of cardiovascular research scientists investigating the genetic causes of hypertension as part of the Wellcome Trust-funded (£4.34M) cardiovascular functional genomics (CFG) project. Security was at the heart of the BRIDGES project, and an advanced data and compute grid infrastructure incorporating the latest grid authorisation technologies was developed and delivered to the scientists. We outline these grid infrastructures and describe the perceived security requirements at the project start, including data classifications, and how these evolved throughout the lifetime of the project. The uptake and adoption of the project results are also presented, along with the challenges that must be overcome to support the secure exchange of life science data sets. We also present how we will use the BRIDGES experiences in future projects at the National e-Science Centre.
Message Passing Algorithms for Compressed Sensing
Compressed sensing aims to undersample certain high-dimensional signals, yet
accurately reconstruct them by exploiting signal characteristics. Accurate
reconstruction is possible when the object to be recovered is sufficiently
sparse in a known basis. Currently, the best known sparsity-undersampling
tradeoff is achieved when reconstructing by convex optimization -- which is
expensive in important large-scale applications. Fast iterative thresholding
algorithms have been intensively studied as alternatives to convex optimization
for large-scale problems. Unfortunately known fast algorithms offer
substantially worse sparsity-undersampling tradeoffs than convex optimization.
We introduce a simple costless modification to iterative thresholding making
the sparsity-undersampling tradeoff of the new algorithms equivalent to that of
the corresponding convex optimization procedures. The new
iterative-thresholding algorithms are inspired by belief propagation in
graphical models. Our empirical measurements of the sparsity-undersampling
tradeoff for the new algorithms agree with theoretical calculations. We show
that a state evolution formalism correctly derives the true
sparsity-undersampling tradeoff. There is a surprising agreement between
earlier calculations based on random convex polytopes and this new, apparently
very different theoretical formalism.
Comment: 6 pages paper + 9 pages supplementary information, 13 eps figures. Submitted to Proc. Natl. Acad. Sci. USA