3,530 research outputs found

    Simultaneous Perturbation Algorithms for Batch Off-Policy Search

    Full text link
    We propose novel policy search algorithms in the context of off-policy, batch mode reinforcement learning (RL) with continuous state and action spaces. Given a batch collection of trajectories, we perform off-line policy evaluation using an algorithm similar to that by [Fonteneau et al., 2010]. Using this Monte-Carlo like policy evaluator, we perform policy search in a class of parameterized policies. We propose both first order policy gradient and second order policy Newton algorithms. All our algorithms incorporate simultaneous perturbation estimates for the gradient as well as the Hessian of the cost-to-go vector, since the latter is unknown and only biased estimates are available. We demonstrate their practicality on a simple 1-dimensional continuous state space problem

    ES Is More Than Just a Traditional Finite-Difference Approximator

    Full text link
    An evolution strategy (ES) variant based on a simplification of a natural evolution strategy recently attracted attention because it performs surprisingly well in challenging deep reinforcement learning domains. It searches for neural network parameters by generating perturbations to the current set of parameters, checking their performance, and moving in the aggregate direction of higher reward. Because it resembles a traditional finite-difference approximation of the reward gradient, it can naturally be confused with one. However, this ES optimizes for a different gradient than just reward: It optimizes for the average reward of the entire population, thereby seeking parameters that are robust to perturbation. This difference can channel ES into distinct areas of the search space relative to gradient descent, and also consequently to networks with distinct properties. This unique robustness-seeking property, and its consequences for optimization, are demonstrated in several domains. They include humanoid locomotion, where networks from policy gradient-based reinforcement learning are significantly less robust to parameter perturbation than ES-based policies solving the same task. While the implications of such robustness and robustness-seeking remain open to further study, this work's main contribution is to highlight such differences and their potential importance

    Metascheduling of HPC Jobs in Day-Ahead Electricity Markets

    Full text link
    High performance grid computing is a key enabler of large scale collaborative computational science. With the promise of exascale computing, high performance grid systems are expected to incur electricity bills that grow super-linearly over time. In order to achieve cost effectiveness in these systems, it is essential for the scheduling algorithms to exploit electricity price variations, both in space and time, that are prevalent in the dynamic electricity price markets. In this paper, we present a metascheduling algorithm to optimize the placement of jobs in a compute grid which consumes electricity from the day-ahead wholesale market. We formulate the scheduling problem as a Minimum Cost Maximum Flow problem and leverage queue waiting time and electricity price predictions to accurately estimate the cost of job execution at a system. Using trace based simulation with real and synthetic workload traces, and real electricity price data sets, we demonstrate our approach on two currently operational grids, XSEDE and NorduGrid. Our experimental setup collectively constitute more than 433K processors spread across 58 compute systems in 17 geographically distributed locations. Experiments show that our approach simultaneously optimizes the total electricity cost and the average response time of the grid, without being unfair to users of the local batch systems.Comment: Appears in IEEE Transactions on Parallel and Distributed System

    A robust sustainable optimization & control strategy (RSOCS) for (fed-)batch processes towards the low-cost reduction of utilities consumption

    Get PDF
    YesThe need for the development of clean but still profitable processes and the study of low environmental impact and economically convenient management policies for them are two challenges for the years to come. This paper tries to give a first answer to the second of these needs, limited to the area of discontinuous productions. It deals with the development of a robust methodology for the profitable and clean management of (fed-)batch units under uncertainty, which can be referred to as a robust sustainability-oriented model-based optimization & control strategy. This procedure is specifically designed to ensure elevated process performances along with low-cost utilities usage reduction in real-time, simultaneously allowing for the effect of any external perturbation. In this way, conventional offline methods for process sustainable optimization can be easily overcome since the most suitable management policy, aimed at process sustainability, can be dynamically determined and applied in any operating condition. This leads to a significant step forward with respect to the nowadays options in terms of sustainable process management, that drives towards a cleaner and more energy-efficient future. The proposed theoretical framework is validated and tested on a case study based on the well-known fed-batch version of the Williams-Otto process to demonstrate its tangible benefits. The results achieved in this case study are promising and show that the framework is very effective in case of typical process operation while it is partially effective in case of unusual/unlikely critical process disturbances. Future works will go towards the removal of this weakness and further improvement in the algorithm robustness

    Adaptive traffic signal control using approximate dynamic programming

    Get PDF
    This paper presents a study on an adaptive traffic signal controller for real-time operation. The controller aims for three operational objectives: dynamic allocation of green time, automatic adjustment to control parameters, and fast revision of signal plans. The control algorithm is built on approximate dynamic programming (ADP). This approach substantially reduces computational burden by using an approximation to the value function of the dynamic programming and reinforcement learning to update the approximation. We investigate temporal-difference learning and perturbation learning as specific learning techniques for the ADP approach. We find in computer simulation that the ADP controllers achieve substantial reduction in vehicle delays in comparison with optimised fixed-time plans. Our results show that substantial benefits can be gained by increasing the frequency at which the signal plans are revised, which can be achieved conveniently using the ADP approach
    • …
    corecore