Deep Reinforcement Learning (DRL) is employed to develop autonomously
optimized, custom-designed heat-treatment processes that are both
microstructure-sensitive and energy-efficient. Unlike conventional
supervised machine learning, DRL does not rely solely on training a neural
network on a static dataset; instead, a learning agent autonomously develops
optimal solutions, guided by rewards and penalties, with reduced or no
supervision. In our approach, a temperature-dependent Allen-Cahn model for
phase transformation is used as the environment for the DRL agent, serving as
the model world in which it gains experience and makes autonomous decisions.
The DRL agent controls the temperature of the system, which acts as
a model furnace for the heat treatment of alloys. Microstructure goals are defined
for the agent based on the desired microstructure of the phases. After
training, the agent can generate temperature-time profiles for a variety of
initial microstructure states to reach the final desired microstructure state.
The agent's performance and the physical meaning of the heat-treatment profiles
generated are investigated in detail. In particular, the agent is capable of
controlling the temperature to reach the desired microstructure starting from a
variety of initial conditions. This ability to handle
varied conditions paves the way for applying the approach to
recycling-oriented heat-treatment process design, where the initial composition
can vary from batch to batch due to impurity intrusion, as well as to the
design of energy-efficient heat treatments. To test this hypothesis, an
agent with no penalty on total energy consumption is compared with one that
accounts for energy costs. The energy-cost penalty is imposed as an additional
criterion on the agent for finding the optimal temperature-time profile.
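
As a rough illustration of the setup described above, the following minimal sketch shows how a temperature-controlled Allen-Cahn environment with an optional energy penalty could be organized. It is not the model used in this work: the class name `ToyAllenCahnFurnace`, the 1D periodic grid, the linear driving force, the reward form, and all parameter values are assumptions chosen only to keep the example small.

```python
import numpy as np


class ToyAllenCahnFurnace:
    """Illustrative 1D stand-in for the environment; all names and values are assumptions."""

    def __init__(self, n=64, dt=0.05, mobility=1.0, kappa=4.0, barrier=1.0,
                 latent=1.0, t_eq=1.0, target_fraction=0.8, energy_weight=0.0):
        self.n, self.dt = n, dt
        self.mobility, self.kappa, self.barrier = mobility, kappa, barrier
        self.latent, self.t_eq = latent, t_eq
        self.target_fraction = target_fraction  # desired fraction of the phi = 1 phase
        self.energy_weight = energy_weight      # 0.0 disables the energy penalty
        self.reset()

    def reset(self, initial_fraction=0.5):
        # Initial microstructure: one slab of phase phi = 1 inside a phi = 0 matrix.
        self.phi = np.zeros(self.n)
        self.phi[: int(round(initial_fraction * self.n))] = 1.0
        return self.phi.copy()

    def step(self, temperature):
        phi = self.phi
        # Assumed linear driving force: negative below t_eq, so phi = 1 grows on cooling.
        driving = self.latent * (temperature - self.t_eq) / self.t_eq
        # Derivative of barrier*phi^2*(1-phi)^2 + driving*phi^2*(3-2*phi) w.r.t. phi.
        dfdphi = (2.0 * self.barrier * phi * (1.0 - phi) * (1.0 - 2.0 * phi)
                  + 6.0 * driving * phi * (1.0 - phi))
        # Periodic finite-difference Laplacian with unit grid spacing.
        lap = np.roll(phi, 1) + np.roll(phi, -1) - 2.0 * phi
        # Explicit Euler update of d(phi)/dt = M * (kappa * lap - df/dphi).
        update = self.dt * self.mobility * (self.kappa * lap - dfdphi)
        self.phi = np.clip(phi + update, 0.0, 1.0)
        fraction = float(self.phi.mean())
        # Reward: closeness to the microstructure goal minus an assumed energy cost
        # taken to be proportional to the furnace temperature.
        reward = -abs(fraction - self.target_fraction) - self.energy_weight * temperature
        return self.phi.copy(), reward, fraction


# Roll out a fixed-temperature placeholder policy; a trained agent would
# instead choose the temperature from the observed state at every step.
env = ToyAllenCahnFurnace(energy_weight=0.1)
env.reset(initial_fraction=0.3)
for _ in range(50):
    state, reward, fraction = env.step(temperature=0.8)
```

Setting `energy_weight` to zero corresponds to the agent without an energy penalty, while a positive value lowers the reward for hotter actions, so the two cases can be compared on the same initial microstructure.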