Safe Optimization of Steel Manufacturing with Reinforcement Learning

Abstract

Steel production is a complex problem, and little has been done to improve it with reinforcement learning techniques. Most studies decompose it into sub-problems instead of tackling it as a whole. Research has shown promising results in the area of safe policy improvement on toy problems: these algorithms are not only computationally tractable but also do not compromise the agent's safety during learning. This thesis investigates how they perform on the real-world problem of improving steel production logistics. We take a simulation of a steel plant that uses hand-crafted heuristics for scheduling tasks and model it as a Markov Decision Process. We experiment with safe policy improvement algorithms using different baseline policies. The problem suffers from the well-known "curse of dimensionality", so the algorithms are adjusted to cope with the fast-growing complexity. The methods learn from fewer samples than exploration-based methods. The results are especially promising with a highly stochastic baseline policy, as the agent then obtains better coverage of the large environment. Finally, we turn to a factored representation, which has the advantage of better exploiting the problem's structure; in our setting, however, the resulting algorithms become too computationally expensive.
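The core idea behind safe policy improvement with a baseline can be sketched as follows. This is an illustrative, SPIBB-style sketch under assumed inputs (a tabular Q-estimate, a baseline policy, and state-action visit counts), not the thesis's actual implementation; the function name and the `n_min` threshold are hypothetical.

```python
import numpy as np

def spibb_policy(q, baseline, counts, n_min):
    """SPIBB-style safe policy improvement (illustrative sketch).

    For state-action pairs observed fewer than n_min times, keep the
    baseline's probability (bootstrap on the baseline); assign the
    remaining probability mass greedily among well-observed actions.
    """
    n_states, n_actions = q.shape
    policy = np.zeros_like(baseline)
    for s in range(n_states):
        uncertain = counts[s] < n_min          # under-observed actions
        policy[s, uncertain] = baseline[s, uncertain]
        free_mass = 1.0 - policy[s].sum()
        if (~uncertain).any():
            # put the remaining mass on the best well-observed action
            best = np.argmax(np.where(uncertain, -np.inf, q[s]))
            policy[s, best] += free_mass
        else:
            policy[s] = baseline[s]            # no safe deviation possible
    return policy
```

Constraining the improved policy to match the baseline wherever data is scarce is what makes the update "safe": the new policy can only deviate where the Q-estimates are trustworthy, which is also why a more stochastic baseline, covering more of the state-action space, leaves more room for improvement.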
