Reinforcement Learning (RL) can be used to fit a mapping from patient state
to a medication regimen. Prior studies have used deterministic and value-based
tabular learning to learn a propofol dose from an observed anesthetic state.
Deep RL replaces the table with a deep neural network and has been used to
learn medication regimens from registry databases. Here we perform the first
application of deep RL to closed-loop control of anesthetic dosing in a
simulated environment. We use the cross-entropy method to train a deep neural
network to map an observed anesthetic state to a probability of infusing a
fixed propofol dosage. During testing, we implement a deterministic policy that
transforms the probability of infusion into a continuous infusion rate. The model
is trained and tested on simulated pharmacokinetic/pharmacodynamic models with
randomized parameters to ensure robustness to patient variability. The deep RL
agent significantly outperformed a proportional-integral-derivative controller
(median absolute performance error 1.7% +/- 0.6 versus 3.4% +/- 1.2,
respectively). Modeling the input variables as continuous rather than as a
table affords more robust pattern recognition and makes use of our prior
domain knowledge. Deep RL learned a smooth
policy with a natural interpretation to data scientists and anesthesia care
providers alike.

Comment: International Conference on Artificial Intelligence in Medicine 202
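
The abstract reports median absolute performance error without defining it. As a minimal sketch, assuming the conventional definition (the median over time of the absolute deviation from the target, expressed as a percentage of the target), the metric could be computed as below. The function name, the readings, and the target value of 50 are hypothetical and are not taken from the paper.

```python
from statistics import median


def median_absolute_performance_error(measured, target):
    """Return the median of |measured - target| / |target|, in percent.

    Assumed (conventional) definition; the abstract does not spell it out.
    """
    if not measured:
        raise ValueError("need at least one measurement")
    if target == 0:
        raise ValueError("target must be nonzero")
    # Absolute performance error of each sample, as a percentage of the target.
    errors = [100.0 * abs(m - target) / abs(target) for m in measured]
    return median(errors)


# Hypothetical readings of the controlled variable around a hypothetical target.
readings = [49.0, 50.5, 51.0, 48.0, 50.0]
print(median_absolute_performance_error(readings, 50.0))
```

Because the median is used rather than the mean, a brief excursion from the target (for example during induction) has little effect on the reported value.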