Simultaneous Perturbation Algorithms for Batch Off-Policy Search
We propose novel policy search algorithms in the context of off-policy, batch
mode reinforcement learning (RL) with continuous state and action spaces. Given
a batch collection of trajectories, we perform off-line policy evaluation using
an algorithm similar to that by [Fonteneau et al., 2010]. Using this
Monte-Carlo like policy evaluator, we perform policy search in a class of
parameterized policies. We propose both first order policy gradient and second
order policy Newton algorithms. All our algorithms incorporate simultaneous
perturbation estimates for the gradient as well as the Hessian of the
cost-to-go vector, since the latter is unknown and only biased estimates are
available. We demonstrate their practicality on a simple 1-dimensional
continuous state space problem.
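The core of the simultaneous perturbation idea can be sketched as follows. This is a generic one-sample SPSA gradient estimate, shown against an illustrative quadratic objective; the actual batch policy evaluator of the paper is more involved, and `J`, `theta`, and `c` here are placeholders:

```python
import numpy as np

def spsa_gradient(J, theta, c=0.1, rng=None):
    """One-sample simultaneous perturbation estimate of grad J(theta).

    J     : evaluation oracle (stand-in for a Monte-Carlo policy evaluator)
    theta : current policy parameters
    c     : perturbation size
    """
    rng = rng or np.random.default_rng()
    # Rademacher (+/-1) perturbation of every coordinate at once:
    delta = rng.choice([-1.0, 1.0], size=theta.shape)
    # Two evaluations suffice regardless of the dimension of theta.
    j_plus, j_minus = J(theta + c * delta), J(theta - c * delta)
    return (j_plus - j_minus) / (2.0 * c) * (1.0 / delta)

# Illustrative quadratic objective (true gradient at theta is 2 * theta):
J = lambda th: float(np.sum(th ** 2))
theta = np.array([1.0, -2.0])
g = spsa_gradient(J, theta)
```

Each single estimate is noisy, but averaging over perturbations recovers the gradient in expectation, which is what makes the estimator usable inside the first- and second-order search loops the abstract describes.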
ES Is More Than Just a Traditional Finite-Difference Approximator
An evolution strategy (ES) variant based on a simplification of a natural
evolution strategy recently attracted attention because it performs
surprisingly well in challenging deep reinforcement learning domains. It
searches for neural network parameters by generating perturbations to the
current set of parameters, checking their performance, and moving in the
aggregate direction of higher reward. Because it resembles a traditional
finite-difference approximation of the reward gradient, it can naturally be
confused with one. However, this ES optimizes for a different gradient than
just reward: It optimizes for the average reward of the entire population,
thereby seeking parameters that are robust to perturbation. This difference can
channel ES into distinct areas of the search space relative to gradient
descent, and also consequently to networks with distinct properties. This
unique robustness-seeking property, and its consequences for optimization, are
demonstrated in several domains. They include humanoid locomotion, where
networks from policy gradient-based reinforcement learning are significantly
less robust to parameter perturbation than ES-based policies solving the same
task. While the implications of such robustness and robustness-seeking remain
open to further study, this work's main contribution is to highlight such
differences and their potential importance.
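The distinction the abstract draws can be made concrete with a minimal sketch of this style of ES update: it ascends the *Gaussian-smoothed* objective (the expected reward of the perturbed population), not the point reward. The toy reward surface and all parameter values below are illustrative assumptions, not the paper's experimental setup:

```python
import numpy as np

def es_step(f, theta, sigma=0.1, npop=50, lr=0.05, rng=None):
    """One ES step: estimate the gradient of E_eps[f(theta + sigma*eps)]
    and move theta in that direction.

    Because the estimate averages reward over the whole perturbed
    population, it favours parameters that remain good under perturbation.
    """
    rng = rng or np.random.default_rng()
    eps = rng.standard_normal((npop, theta.size))            # population noise
    rewards = np.array([f(theta + sigma * e) for e in eps])
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)  # fitness shaping
    grad = eps.T @ rewards / (npop * sigma)                  # score-function estimate
    return theta + lr * grad

# Toy demonstration: climb a smooth reward surface toward `target`.
rng = np.random.default_rng(1)
target = np.array([1.0, 2.0])
reward = lambda x: -float(np.sum((x - target) ** 2))
theta = np.zeros(2)
for _ in range(300):
    theta = es_step(reward, theta, rng=rng)
```

On a surface with sharp, narrow optima the smoothed objective differs from the point reward, which is exactly the mechanism by which ES can land in flatter, perturbation-robust regions than gradient descent.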
Metascheduling of HPC Jobs in Day-Ahead Electricity Markets
High performance grid computing is a key enabler of large scale collaborative
computational science. With the promise of exascale computing, high performance
grid systems are expected to incur electricity bills that grow super-linearly
over time. In order to achieve cost effectiveness in these systems, it is
essential for the scheduling algorithms to exploit electricity price
variations, both in space and time, that are prevalent in the dynamic
electricity price markets. In this paper, we present a metascheduling algorithm
to optimize the placement of jobs in a compute grid which consumes electricity
from the day-ahead wholesale market. We formulate the scheduling problem as a
Minimum Cost Maximum Flow problem and leverage queue waiting time and
electricity price predictions to accurately estimate the cost of job execution
at a system. Using trace based simulation with real and synthetic workload
traces, and real electricity price data sets, we demonstrate our approach on
two currently operational grids, XSEDE and NorduGrid, which together comprise
more than 433K processors spread across 58 compute systems in 17
geographically distributed locations. Experiments show that our
approach simultaneously optimizes the total electricity cost and the average
response time of the grid, without being unfair to users of the local batch
systems.
Comment: Appears in IEEE Transactions on Parallel and Distributed Systems.
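The Minimum Cost Maximum Flow formulation can be sketched on a tiny instance: jobs connect to a source with unit capacity, to candidate systems with edges whose cost stands in for the predicted queue wait plus electricity price, and systems connect to a sink with capacity equal to their free slots. The solver below is a standard successive-shortest-paths implementation, and the job/system names and cost values are invented for illustration, not taken from the paper:

```python
from collections import defaultdict

class MinCostMaxFlow:
    """Min-cost max-flow via successive shortest paths (Bellman-Ford)."""

    def __init__(self):
        self.adj = defaultdict(list)   # node -> indices into self.edges
        self.edges = []                # each edge: [to, capacity, cost]

    def add_edge(self, u, v, cap, cost):
        self.adj[u].append(len(self.edges)); self.edges.append([v, cap, cost])
        self.adj[v].append(len(self.edges)); self.edges.append([u, 0, -cost])

    def solve(self, s, t):
        flow = cost = 0
        while True:
            dist = {s: 0}; parent = {}
            # Bellman-Ford over the residual graph (residual costs may be negative)
            for _ in range(len(self.adj)):
                updated = False
                for u in list(dist):
                    for ei in self.adj[u]:
                        v, cap, c = self.edges[ei]
                        if cap > 0 and dist[u] + c < dist.get(v, float("inf")):
                            dist[v] = dist[u] + c; parent[v] = ei; updated = True
                if not updated:
                    break
            if t not in dist:
                return flow, cost
            # Augment along the cheapest residual path.
            push, v = float("inf"), t
            while v != s:
                ei = parent[v]; push = min(push, self.edges[ei][1])
                v = self.edges[ei ^ 1][0]
            v = t
            while v != s:
                ei = parent[v]
                self.edges[ei][1] -= push; self.edges[ei ^ 1][1] += push
                v = self.edges[ei ^ 1][0]
            flow += push; cost += push * dist[t]

# Two jobs, two systems; edge costs model predicted wait + electricity price.
mcmf = MinCostMaxFlow()
mcmf.add_edge("S", "jobA", 1, 0); mcmf.add_edge("S", "jobB", 1, 0)
mcmf.add_edge("jobA", "sys1", 1, 5); mcmf.add_edge("jobA", "sys2", 1, 9)
mcmf.add_edge("jobB", "sys1", 1, 6); mcmf.add_edge("jobB", "sys2", 1, 7)
mcmf.add_edge("sys1", "T", 1, 0);    mcmf.add_edge("sys2", "T", 1, 0)
flow, cost = mcmf.solve("S", "T")    # places both jobs at total cost 12
```

The greedy assignment (jobA to sys1, jobB to sys1) is infeasible once capacities bind; the flow formulation finds the globally cheapest feasible placement, which is why it suits the paper's joint cost/response-time objective.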
A robust sustainable optimization & control strategy (RSOCS) for (fed-)batch processes towards the low-cost reduction of utilities consumption
The need for the development of clean but still profitable processes, and the study of low-environmental-impact and economically convenient management policies for them, are two challenges for the years to come. This paper gives a first answer to the second of these needs, limited to the area of discontinuous productions. It develops a robust methodology for the profitable and clean management of (fed-)batch units under uncertainty, which can be referred to as a robust sustainability-oriented model-based optimization & control strategy. The procedure is specifically designed to ensure high process performance along with low-cost reduction of utilities consumption in real time, while simultaneously accounting for the effect of any external perturbation. In this way, conventional offline methods for sustainable process optimization can easily be surpassed, since the most suitable management policy, aimed at process sustainability, can be dynamically determined and applied under any operating condition. This represents a significant step forward with respect to current options for sustainable process management, driving towards a cleaner and more energy-efficient future. The proposed theoretical framework is validated and tested on a case study based on the well-known fed-batch version of the Williams-Otto process to demonstrate its tangible benefits. The results of this case study are promising: the framework is very effective under typical process operation, while only partially effective under unusual or unlikely critical process disturbances. Future work will address this weakness and further improve the robustness of the algorithm.
Adaptive traffic signal control using approximate dynamic programming
This paper presents a study on an adaptive traffic signal controller for real-time operation. The controller aims for three operational objectives: dynamic allocation of green time, automatic adjustment to control parameters, and fast revision of signal plans. The control algorithm is built on approximate dynamic programming (ADP). This approach substantially reduces computational burden by using an approximation to the value function of the dynamic programming and reinforcement learning to update the approximation. We investigate temporal-difference learning and perturbation learning as specific learning techniques for the ADP approach. We find in computer simulation that the ADP controllers achieve substantial reduction in vehicle delays in comparison with optimised fixed-time plans. Our results show that substantial benefits can be gained by increasing the frequency at which the signal plans are revised, which can be achieved conveniently using the ADP approach.
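The temporal-difference component of such an ADP scheme can be sketched in a few lines: the exact DP value function is replaced by a parametric approximation that is refined from observed transitions. This is a generic linear TD(0) update on a toy two-state chain; the feature vectors are placeholders for traffic-state features (queue lengths, signal phase, and so on), not the paper's actual state representation:

```python
import numpy as np

def td0_update(w, phi_s, reward, phi_next, alpha=0.1, gamma=0.95):
    """One TD(0) step on a linear value approximation V(s) = w @ phi(s)."""
    td_error = reward + gamma * (w @ phi_next) - (w @ phi_s)
    return w + alpha * td_error * phi_s

# Toy chain: s0 -(reward 0)-> s1 -(reward 1)-> terminal (zero features).
phi = {"s0": np.array([1.0, 0.0]), "s1": np.array([0.0, 1.0])}
w = np.zeros(2)
for _ in range(500):
    w = td0_update(w, phi["s0"], 0.0, phi["s1"])
    w = td0_update(w, phi["s1"], 1.0, np.zeros(2))
# w now approximates the true values [gamma * 1, 1] = [0.95, 1.0]
```

Because each update costs only a feature-vector dot product, the approximation can be revised every few seconds, which is what makes the frequent signal-plan revision highlighted in the abstract computationally feasible.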