Average-reward reinforcement learning for product delivery by multiple vehicles
Real-time delivery of products in the context of stochastic demands and multiple vehicles is a difficult problem, as it requires jointly addressing inventory control and vehicle routing. We model this problem in the framework of average-reward reinforcement learning (ARL) and present experimental results on several ARL algorithms, including a novel model-free algorithm called AR learning that automatically explores the state space while always choosing the greedy action with respect to the current approximate value function. Another contribution is a hybrid of linear and feature-based function approximation that yields superior performance to either method alone.
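The AR-learning algorithm itself is not specified in this abstract. As a point of reference, a minimal sketch of the classic model-free average-reward method it is closest in spirit to, tabular R-learning (Schwartz, 1993), which likewise learns a relative value function and an average-reward estimate simultaneously, might look like the following; the toy two-state task is purely illustrative:

```python
import random

def r_learning(step, n_states, n_actions, steps=5000,
               alpha=0.1, beta=0.01, epsilon=0.1, seed=0):
    """Tabular R-learning: learns relative action values Q and an
    estimate rho of the average reward per step."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    rho, s = 0.0, 0
    for _ in range(steps):
        a_greedy = max(range(n_actions), key=lambda x: Q[s][x])
        a = rng.randrange(n_actions) if rng.random() < epsilon else a_greedy
        r, s2 = step(s, a, rng)
        # average-adjusted temporal-difference update of the relative values
        Q[s][a] += alpha * (r - rho + max(Q[s2]) - Q[s][a])
        if a == a_greedy:
            # the average-reward estimate is updated only on greedy steps
            rho += beta * (r - rho + max(Q[s2]) - max(Q[s]))
        s = s2
    return Q, rho

# Hypothetical toy task: two states in a cycle; action 1 always pays reward 1,
# action 0 pays nothing, so the optimal average reward per step is 1.
def toy_step(s, a, rng):
    return (1.0 if a == 1 else 0.0), (s + 1) % 2
```

Unlike R-learning's exploration schedule, the abstract's AR learning is described as always acting greedily while still exploring automatically; the update structure above is only the shared average-reward scaffolding, not the paper's algorithm.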
Using a reinforcement learning approach in a discrete event manufacturing system
Supplement to the printed edition; this supplement is available online only.
Enhancement of Industrial Energy Efficiency and Sustainability
Industrial energy efficiency has been recognized as a major contributor, within the broader set of industrial resources, to improved sustainability and the circular economy. Nevertheless, the uptake of energy efficiency measures and practices is still quite low, due to several barriers. Research has discussed these barriers broadly, together with their drivers. More recently, many researchers have highlighted several benefits beyond mere energy savings that stem from adopting such measures, accruing to the various stakeholders in the value chain of energy efficiency solutions. Still, a deep understanding of the relationships between the use of energy and other resources in industry, together with the most important factors for the uptake of such measures, also in light of their implications for industrial operations, is lacking. Yet such an understanding could further stimulate the adoption of solutions for improved industrial energy efficiency and sustainability.
Advances in Condition Monitoring, Optimization and Control for Complex Industrial Processes
The book documents 25 papers collected from the Special Issue “Advances in Condition Monitoring, Optimization and Control for Complex Industrial Processes”, highlighting recent research trends in complex industrial processes. It aims to stimulate the research field and to benefit readers from both academic institutions and industrial sectors.
Hierarchical Average Reward Reinforcement Learning
Hierarchical reinforcement learning (HRL) is a general framework for scaling reinforcement learning (RL) to problems with large state and action spaces by using the task (or action) structure to restrict the space of policies. Prior work in HRL, including HAMs, options, MAXQ, and PHAMs, has been limited to the discrete-time discounted reward semi-Markov decision process (SMDP) model. The average reward optimality criterion has been recognized as more appropriate for a wide class of continuing tasks than the discounted framework. Although average reward RL has been studied for decades, prior work has been largely limited to flat policy representations. In this paper, we develop a framework for HRL based on the average reward optimality criterion. We investigate two formulations of HRL based on the average reward SMDP model, for both discrete time and continuous time. These formulations correspond to two notions of optimality that have been previously explored in HRL: hierarchical optimality and recursive optimality. We present algorithms that learn to find hierarchically and recursively optimal average reward policies under discrete-time and continuous-time average reward SMDP models.
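The hierarchical algorithms are not reproduced in this abstract, but the flat average-reward SMDP temporal-difference step that such methods generalize can be sketched as a single update rule. Here the average reward rho is charged for the sojourn time tau that a temporally extended action occupies; the function name and signature are illustrative, not taken from the paper:

```python
def smdp_r_update(q_sa, rho, reward, tau, max_q_next, alpha=0.1):
    """One average-reward SMDP temporal-difference update:
    Q(s,a) <- Q(s,a) + alpha * [r - rho*tau + max_a' Q(s',a') - Q(s,a)].
    The gain rho is charged per unit of time the action consumed."""
    return q_sa + alpha * (reward - rho * tau + max_q_next - q_sa)
```

With tau fixed at 1 this reduces to the discrete-time average-reward (R-learning style) update; allowing variable tau is what extends the criterion to the continuous-time SMDP setting the paper builds its hierarchical formulations on.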
Hierarchical Average Reward Reinforcement Learning
Hierarchical reinforcement learning (HRL) is the study of mechanisms for exploiting the structure of tasks in order to learn more quickly. By decomposing tasks into subtasks, fully or partially specified subtask solutions can be reused in solving tasks at higher levels of abstraction. The theory of semi-Markov decision processes provides a theoretical basis for HRL. Several variant representational schemes based on SMDP models have been studied in previous work, all of which are based on the discrete-time discounted SMDP model. In this approach, policies are learned that maximize the long-term discounted sum of rewards. In this paper we investigate two formulations of HRL based on the average-reward SMDP model, for both discrete time and continuous time. In the average-reward model, policies are sought that maximize the expected reward per step. The two formulations correspond to two different notions of optimality that have been explored in previous work on HRL: hierarchical optimality, which corresponds to the set of optimal policies in the space defined by a task hierarchy, and a weaker local model called recursive optimality. What distinguishes the two models in the average reward framework is the optimization objective of each subtask: under hierarchical optimality, subtask policies are optimized with respect to the gain of the overall hierarchy, whereas under recursive optimality each subtask maximizes its own local gain.